From: Sven Mascheck on 24 Feb 2010 14:25
Janis wrote:

> >> cat file | cmd
> >> A workaround for commands which have been compiled without
> >> largefile support but accept a pipe, e.g. compressing
> >> utilities.
>
> I am puzzled about this one. Why is it a problem for (some?)
> compression programs to read the file from a non-pipe stdin
> channel

The problem occurs e.g. if the program detects an ordinary file
and wants to seek() it (without having large file support).
Ironically the program might be able to handle a stream -
a real world example is gzip 1.2.4:

  $ dd if=/dev/zero of=large seek=2G count=0 bs=1
  $ ls -l large
  -rw-r--r-- 1 mascheck users 2147483648 2010-02-24 20:16 large

  $ gzip-1.2.4 ./large   # _llseek() returns "illegal seek"
  ./large: Value too large for defined data type

  $ gzip-1.2.4 < ./large > large.gz
  gzip-1.2.4: stdin: fstat(stdin)

  $ cat ./large|gzip-1.2.4 > large.gz
  [... ok]

From: Sven Mascheck on 24 Feb 2010 17:37
Janis Papanagnou wrote:

>> $ gzip-1.2.4 < ./large > large.gz
>
> (No option -c required?)

Doesn't help (and newer releases don't).
But I don't really understand it, either.

>> $ cat ./large|gzip-1.2.4 > large.gz
>> [... ok]
>
> Does gzip differentiate between stdin from pipe and stdin from shell
> redirection? And why?

I stumbled over the failing lseek(), but in fact this seems to be
a subsequent error only. It is not gzip itself that is failing;
rather, compiling without large file support causes libc to refuse
to handle this file:

Although a syscall trace in all cases shows a succeeding stat() syscall,
ltrace shows that the surrounding library call fails, probably because
struct stat.st_size is just too big.

"Outsmarting" libc by using a pipe and thus hiding the actual file size
leads to stat() reporting st_size=0, and success.
This suggests that the actual file size (used by libc internally)
is just not strictly necessary in this case.

[guessing, on my system, only]

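The difference between the two invocations (redirection hands the program the real file, a pipe hides it) can be checked from the shell. This is only a minimal sketch: it assumes a system that provides /dev/stdin (common on Linux, not guaranteed everywhere), and the function name stdin_kind is made up for illustration.

```shell
# Report what kind of object is attached to standard input.
# Assumes /dev/stdin exists and refers to file descriptor 0.
stdin_kind() {
    if [ -p /dev/stdin ]; then
        echo pipe        # a FIFO: the size of the original file is hidden
    elif [ -f /dev/stdin ]; then
        echo regular     # the real file, including its size, is visible
    else
        echo other       # terminal, device, socket, ...
    fi
}
```

Called as `stdin_kind < ./large` this should report a regular file, and as `cat ./large | stdin_kind` a pipe, which is the distinction a program built without large file support trips over.
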
From: pk on 24 Feb 2010 18:31
Alan Curry wrote:

> |
> |> gzip-1.2.4: stdin: fstat(stdin)
> |
> |What's the meaning of that message? What is gzip trying to achieve?
>
> It's a message that really should have included strerror(errno). What it's
> trying to achieve by fstat'ing the input file is to get the modification
> timestamp, so it can put that into the gzip header.
>
> But it failed because it used the 32-bit fstat call, which refused to
> return a struct stat with an incorrect st_size. It has no way to know that
> the caller is only interested in st_mtime.

That seems a bit odd to me. While what you say makes sense, does that mean
that in such a system, even doing a simple "ls -l" or any other command that
calls stat()/fstat() on that big file would then fail?

From: Alan Curry on 24 Feb 2010 18:47
In article <1600502.3Lj2Plt8kZ(a)xkzjympik>, pk <pk(a)pk.invalid> wrote:
|Alan Curry wrote:
|>
|> But it failed because it used the 32-bit fstat call, which refused to
|> return a struct stat with an incorrect st_size. It has no way to know that
|> the caller is only interested in st_mtime.
|
|That seems a bit odd to me. While what you say makes sense, does that mean
|that in such a system, even doing a simple "ls -l" or any other command that
|calls stat()/fstat() on that big file would then fail?
|

If ls is compiled without large file support, yes it does fail on
  ls -l largefile

If you ls -l a directory with some large files in it, it complains about
them and lists the ones that it can.

--
Alan Curry

From: pk on 25 Feb 2010 10:41
Alan Curry wrote:

> In article <1600502.3Lj2Plt8kZ(a)xkzjympik>, pk <pk(a)pk.invalid> wrote:
> |Alan Curry wrote:
> |>
> |> But it failed because it used the 32-bit fstat call, which refused to
> |> return a struct stat with an incorrect st_size. It has no way to know
> |> that the caller is only interested in st_mtime.
> |
> |That seems a bit odd to me. While what you say makes sense, does that
> |mean that in such a system, even doing a simple "ls -l" or any other
> |command that calls stat()/fstat() on that big file would then fail?
> |
>
> If ls is compiled without large file support, yes it does fail on
> ls -l largefile
>
> If you ls -l a directory with some large files in it, it complains about
> them and lists the ones that it can.

Thanks. But then, leaving aside ls now, I don't see how replacing

  command bigfile

with

  cat bigfile | command

would then help if cat itself is using the 32-bit syscalls? I suppose in
that case the open() performed by cat would fail anyway. I mean, if the
system does not have large file support, and thus "command bigfile" fails,
then "cat bigfile" would be likely to fail as well. Or is this so unlikely
these days that it can be ignored?

If that is the case, then the only circumstance I can see where that may be
useful is where the system in its entirety (kernel, libc, utilities) does
have large file support, and "command" is some third-party software that was
built without large file support, so it would fail when operating on the
file directly (and "command < bigfile" wouldn't help either, as it would
still recognize its input as being a true file and try to stat()/lseek()
etc.). Then "cat bigfile | command" would be the only way to make it work
(without rebuilding it, that is). Apparently "gzip" is (or was) such
software on some systems.
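
For the one case where the workaround seems to apply (a system cat with large file support feeding a command built without it), the pattern could be wrapped up roughly like this. It is only a sketch: via_pipe is a hypothetical helper name, and it does nothing useful if cat itself cannot open the file.

```shell
# via_pipe FILE COMMAND [ARGS...]
# Feed FILE to COMMAND through a pipe, so COMMAND only ever sees a
# stream on stdin and cannot stat() or lseek() the underlying file.
# The body runs in a subshell, so "file" does not leak to the caller.
via_pipe() (
    file=$1
    shift
    cat -- "$file" | "$@"
)
```

With this, `via_pipe ./large gzip-1.2.4 > large.gz` does the same thing as `cat ./large | gzip-1.2.4 > large.gz`, and the exit status is that of the command at the end of the pipe.
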