From: Kenny McCormack on 2 Apr 2010 08:14

In article <2056697.Hz00ifERbk(a)xkzjympik>, pk <pk(a)pk.invalid> wrote:
>Hongyi Zhao wrote:
>
>> 2- If do the the following things:
>>
>> $ echo aa > file1
>> $ echo bb > file2
>> $ awk 'NR==FNR{a[$0]++} NR>FNR&&!a[$0]' file1 file2
>> bb
>>
>> $ awk 'NR==FNR{a[$0]++} NR>FNR&&!a[$0]' file1 file2 | sort -u > file2
>> $ cat file2
>> bb
>>
>> This time, the operation will be finished successfully.
>>
>> Any hints on this issue?
>
>Luck.

Exactly. And this is the real reason why the CLC guys get so uppity
about "UB" (undefined behavior). This is the sort of situation where
something that works most of the time (just because of luck) is assumed
to be working by design.

I've also seen posts where people put a 'sleep' command in there, in
order to get the delay needed (see below). Again, this is something
that works most of the time, but is never guaranteed to work. This is
not, of course, to say that it isn't a clever hack.

Something like: ... | (sleep 5;cat > oneofmyinputfiles)

The problem, of course, is that there's no way to be sure of what number
to put in (for the sleep duration).

--
(This discussion group is about C, ...)

Wrong. It is only OCCASIONALLY a discussion group about C; mostly,
like most "discussion" groups, it is off-topic Rorschach revelations of
the childhood traumas of the participants...
From: pk on 2 Apr 2010 08:28

Kenny McCormack wrote:

> In article <2056697.Hz00ifERbk(a)xkzjympik>, pk <pk(a)pk.invalid> wrote:
>>Hongyi Zhao wrote:
>>
>>> 2- If do the the following things:
>>>
>>> $ echo aa > file1
>>> $ echo bb > file2
>>> $ awk 'NR==FNR{a[$0]++} NR>FNR&&!a[$0]' file1 file2
>>> bb
>>>
>>> $ awk 'NR==FNR{a[$0]++} NR>FNR&&!a[$0]' file1 file2 | sort -u > file2
>>> $ cat file2
>>> bb
>>>
>>> This time, the operation will be finished successfully.
>>>
>>> Any hints on this issue?
>>
>>Luck.
>
> Exactly. And this is the true idea of why the CLC guys get so uppity
> about "UB" (undefined behavior). This is the sort of situation where
> something that works most of the time (just because of luck), is assumed
> to be working by design.
>
> I've also seen posts where people put a 'sleep' command in there, in
> order to get the delay needed (see below). Again, this is something
> that works most of the time, but is never guaranteed to work. This is
> not, of course, to say that it isn't a clever hack.
>
> Something like: ... | (sleep 5;cat > oneofmyinputfiles)
> The problem, of course, is that there's no way to be sure of what number
> to put in (for the sleep duration).

The problem with using sleep and a pipe is that it can still go wrong, no
matter how many seconds you specify.

Let's assume you're trying to do something like

somecommand < file | ( sleep 10; cat > file )

Now, the pipe can only contain so much data, 64K bytes in many cases. Now if
"somecommand" isn't particularly smart, and "file" is bigger than 64K, what
may happen is that the pipe gets full (because sleep is still running), and
thus writes performed by "somecommand" block, which in turn block the whole
command and prevent it from reading further lines from "file".

The whole thing stays in that state until the sleep ends, at which point
anything can happen, depending on what kicks in first. I suppose you might
either end up with writing only a pipe's worth of data to the file, or
starting a self-feeding endless loop.
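A minimal sketch of the stall described in the post above, assuming a POSIX sh with dd and /dev/zero available. The block size, block count, and sleep times are arbitrary illustration values, and the number of blocks that fit before the writer stalls depends on the system's pipe capacity, so treat whatever it prints as system-specific rather than as a guarantee.

```shell
#!/bin/sh
# Sketch: a producer stalls once the pipe is full while the consumer sleeps.
# Block count, block size, and sleep times are arbitrary illustration values.
workdir=$(mktemp -d) || exit 1
progress="$workdir/progress"
: > "$progress"

# Producer: tries to write 200 blocks of 1 KiB and logs one line per block
# that made it into the pipe. Consumer: sleeps for 2 seconds before reading.
(
    i=0
    while [ "$i" -lt 200 ]; do
        dd if=/dev/zero bs=1024 count=1 2>/dev/null
        printf 'x\n' >> "$progress"
        i=$((i + 1))
    done
) | ( sleep 2; cat > /dev/null ) &

sleep 1
blocks_before=$(($(wc -l < "$progress")))   # written while the consumer slept
wait
blocks_after=$(($(wc -l < "$progress")))    # written once the pipe was drained

echo "blocks written before the consumer woke up: $blocks_before"
echo "blocks written by the end: $blocks_after"
rm -rf "$workdir"
```

On a system whose pipes hold less than 200 KiB, the first number should come out smaller than the second: the producer was stuck in a write for as long as the consumer slept, which is exactly the state in which "somecommand" would have stopped reading its input.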
From: Mark Hobley on 2 Apr 2010 08:52

Hongyi Zhao <hongyi.zhao(a)gmail.com> wrote:
> Any hints on this issue?

You cannot use a round bobbin in a Unix shell. If the input file goes
above a certain size (which is quite small), it will be truncated
before it is read.

http://markhobley.yi.org/shell/solutions/bobbin.html

I am told that there is a tool called "buffer", which is part of the
brlcad suite, that can be added as a bobbin between the input and the
output, allowing this to be done.

Someone once offered to repackage this as a separate tool, but I never
got round to following that through. It would be useful for this to be
split off from the main suite though. I think I raised a request with
the upstream package maintainers to split this off, but they would not
do it. (It might be worth trying again though. There seem to be problems
getting brlcad into mainstream distros, and I think the maintainers may
have changed since I made the original request. Maybe the new ones are
more cooperative. If not, you can always take the source code and split
it yourself :)

I am always interested in seeing bundles becoming split.

Mark.

--
Mark Hobley
Linux User: #370818 http://markhobley.yi.org/
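The "bobbin" idea can be approximated in plain shell. The sketch below is not the brlcad "buffer" tool; soak is a hypothetical helper name invented here. It assumes mktemp is available and that the target's directory is writable. It collects all of standard input in a temporary file and only replaces the target once input has ended.

```shell
#!/bin/sh
# soak: hypothetical helper, not the brlcad "buffer" tool.
# Reads all of stdin into a temporary file next to the target, then
# renames it over the target only after stdin has reached end of file.
soak() {
    soak_target=$1
    soak_tmp=$(mktemp "${soak_target}.XXXXXX") || return 1
    if cat > "$soak_tmp"; then
        mv "$soak_tmp" "$soak_target"
    else
        rm -f "$soak_tmp"
        return 1
    fi
}

# Demo in a scratch directory.
cd "$(mktemp -d)" || exit 1
printf 'aa\n' > file1
printf 'aa\nbb\ncc\n' > file2
awk 'NR==FNR{a[$0]++;next} !a[$0]' file1 file2 | soak file2
cat file2
```

Two caveats with this sketch: the replaced file takes on the temporary file's permissions, and soak cannot see whether the command feeding it failed, so a broken producer still overwrites the target with whatever it managed to emit.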
From: Ed Morton on 2 Apr 2010 09:33

On 4/1/2010 10:23 PM, Hongyi Zhao wrote:
> Hi all,
>
> I use the following code to obtain the lines existing file2 but not in
> file1, and then store the results into file2 as follows:
>
> awk 'NR==FNR{a[$0]++} NR>FNR&&!a[$0]' file1 file2> file2
>
> I've a question about the above operation: does the file2 will be
> exposed to read/write conflict issue in this case? In detail, when we
> redirect the result into file2, it also as the input file for the
> awk's manipulation.
>
> Any hints on this issue?

You got your answer, so hopefully it's clear now that you don't ever want
to direct your output to the same file you're reading. You can do this
instead:

cmd file > tmp && mv tmp file

W.r.t. your script, though, it'd more commonly be written using "next"
rather than comparing line numbers twice, e.g.:

awk 'NR==FNR{a[$0]++;next} !a[$0]' file1 file2 > tmp && mv tmp file2

Regards,

Ed.
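For illustration, here is the temp-file pattern from the post above run end to end in a scratch directory. The use of mktemp for the temporary name and the deliberately failing awk call are assumptions added for the demo, not part of the original advice.

```shell
#!/bin/sh
# Demo of "cmd file > tmp && mv tmp file" in a scratch directory.
cd "$(mktemp -d)" || exit 1
printf 'aa\n' > file1
printf 'aa\nbb\n' > file2

# Success case: awk finishes reading file2 before mv replaces it.
tmp=$(mktemp file2.XXXXXX) || exit 1
awk 'NR==FNR{a[$0]++;next} !a[$0]' file1 file2 > "$tmp" && mv "$tmp" file2
cat file2

# Failure case: the command exits non-zero, so && skips the mv and the
# original file2 is left untouched.
tmp=$(mktemp file2.XXXXXX) || exit 1
awk 'BEGIN{exit 1}' file2 > "$tmp" && mv "$tmp" file2
rm -f "$tmp"
cat file2
```

The && matters as much as the temporary file: with a plain ";" a failed command would still replace the input with partial or empty output.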
From: Eric on 2 Apr 2010 09:26

On 2010-04-02, pk <pk(a)pk.invalid> wrote:
> Kenny McCormack wrote:
>
>> In article <2056697.Hz00ifERbk(a)xkzjympik>, pk <pk(a)pk.invalid> wrote:
>>>Hongyi Zhao wrote:
>>>
>>>> 2- If do the the following things:
>>>>
>>>> $ echo aa > file1
>>>> $ echo bb > file2
>>>> $ awk 'NR==FNR{a[$0]++} NR>FNR&&!a[$0]' file1 file2
>>>> bb
>>>>
>>>> $ awk 'NR==FNR{a[$0]++} NR>FNR&&!a[$0]' file1 file2 | sort -u > file2
>>>> $ cat file2
>>>> bb
>>>>
>>>> This time, the operation will be finished successfully.
>>>>
>>>> Any hints on this issue?
>>>
>>>Luck.
>>
>> Exactly. And this is the true idea of why the CLC guys get so uppity
>> about "UB" (undefined behavior). This is the sort of situation where
>> something that works most of the time (just because of luck), is assumed
>> to be working by design.
>>
>> I've also seen posts where people put a 'sleep' command in there, in
>> order to get the delay needed (see below). Again, this is something
>> that works most of the time, but is never guaranteed to work. This is
>> not, of course, to say that it isn't a clever hack.
>>
>> Something like: ... | (sleep 5;cat > oneofmyinputfiles)
>> The problem, of course, is that there's no way to be sure of what number
>> to put in (for the sleep duration).
>
> The problem with using sleep and a pipe is that it can still go wrong, no
> matter how many seconds you specify.
>
> Let's assume you're trying to do something like
>
> somecommand < file | ( sleep 10; cat > file )
>
> Now, the pipe can only contain so much data, 64K bytes in many cases. Now if
> "somecommand" isn't particularly smart, and "file" is bigger than 64K, what
> may happen is that the pipe gets full (because sleep is still running), and
> thus writes performed by "somecommand" block, which in turn block the whole
> command and prevent it from reading further lines from "file".
>
> The whole thing stays in that state until the sleep ends, at which point
> anything can happen, depending on what kicks in first. I suppose you might
> either end up with writing only a pipe's worth of data to the file, or
> starting a self-feeding endless loop.

The basic answer is "Don't do that". Write to file3 instead of file2,
then mv file3 file2 as the next (separate) command.

The problem is that the final outcome depends on the timing of starting
new processes, opening files, and opening files for redirection. The
order in which the various steps are done depends on which shell you are
using, and the timing depends on how the OS kernel handles context
switching as well as how long each step takes. The size of the file will
have an impact, as will the way the OS treats a "truncate-and-write"
open for a file that another process has open for reading (Unixes don't
care, in general).

The last sample

somecommand < file | ( sleep 10; cat > file )

will be different from

somecommand file | ( sleep 10; cat > file )

and will also depend on whether the shell starts a subprocess for the
bracketed commands (which then starts processes for sleep and cat) or
just sets up an internal context.

Too many variations: even if you _know_ how your shell and OS behave,
there will still be timing variations. So, once again:

**Don't do that!**

Eric
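One case in this thread does not depend on timing: the original single command with no pipe. For a simple command the shell performs the redirection, and therefore the truncating open, before the command starts, so awk finds file2 already empty. A minimal sketch in a scratch directory (the directory setup is an assumption added for the demo):

```shell
#!/bin/sh
# Demo: "cmd file > file" with no pipe truncates the input before it is read.
cd "$(mktemp -d)" || exit 1
printf 'aa\n' > file1
printf 'bb\n' > file2

# The shell opens file2 for writing (truncating it) before awk is started,
# so awk reads an empty file2 and prints nothing.
awk 'NR==FNR{a[$0]++} NR>FNR&&!a[$0]' file1 file2 > file2

if [ -s file2 ]; then
    echo "file2 still has data"
else
    echo "file2 is now empty"
fi
```

The pipeline variants are different: there the truncating open happens in another process, and whether it runs before or after the reader has consumed the file is exactly the timing question described above.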