From: Sven Mascheck on
Janis wrote:

> >> cat file | cmd
> >> A workaround for commands which have been compiled without
> >> largefile support but accept a pipe, e.g. compressing
> >> utilities.
>
> I am puzzled about this one. Why is it a problem for (some?)
> compression programs to read the file from a non-pipe stdin
> channel?

The problem occurs, e.g., if the program detects a regular file
and wants to lseek() it (without having large file support).
Ironically, the program might be able to handle a stream; a
real-world example is gzip 1.2.4:

$ dd if=/dev/zero of=large seek=2G count=0 bs=1
$ ls -l large
-rw-r--r-- 1 mascheck users 2147483648 2010-02-24 20:16 large

$ gzip-1.2.4 ./large # _llseek() returns "illegal seek"
./large: Value too large for defined data type
$ gzip-1.2.4 < ./large > large.gz
gzip-1.2.4: stdin: fstat(stdin)
$ cat ./large|gzip-1.2.4 > large.gz
[... ok]
From: Sven Mascheck on
Janis Papanagnou wrote:

>> $ gzip-1.2.4 < ./large > large.gz
>
> (No option -c required?)

Doesn't help (and newer releases don't).
But I don't really understand it, either.

>> $ cat ./large|gzip-1.2.4 > large.gz
>> [... ok]
>
> Does gzip differentiate between stdin from pipe and stdin from shell
> redirection? And why?

I stumbled over the failing lseek(), but in fact that seems to be
only a follow-on error. It is not gzip itself that fails; rather,
compiling without large file support causes libc to refuse to
handle this file:

Although a syscall trace shows a successful stat() system call in all
cases, ltrace shows that the surrounding library call fails, probably
because the file size is simply too big for struct stat's st_size.

"Outsmarting" libc by using a pipe, and thus hiding the actual file
size, makes stat() report st_size=0, and the call succeeds. This
suggests that the actual file size (used by libc internally) is just
not strictly necessary in this case.

[guessing, on my system, only]
From: pk on
Alan Curry wrote:

> |
> |> gzip-1.2.4: stdin: fstat(stdin)
> |
> |What's the meaning of that message? What is gzip trying to achieve?
>
> It's a message that really should have included strerror(errno). What it's
> trying to achieve by fstat'ing the input file is to get the modification
> timestamp, so it can put that into the gzip header.
>
> But it failed because it used the 32-bit fstat call, which refused to
> return a struct stat with an incorrect st_size. It has no way to know that
> the caller is only interested in st_mtime.

That seems a bit odd to me. While what you say makes sense, does that mean
that in such a system, even doing a simple "ls -l" or any other command that
calls stat()/fstat() on that big file would then fail?

From: Alan Curry on
In article <1600502.3Lj2Plt8kZ(a)xkzjympik>, pk <pk(a)pk.invalid> wrote:
|Alan Curry wrote:
|>
|> But it failed because it used the 32-bit fstat call, which refused to
|> return a struct stat with an incorrect st_size. It has no way to know that
|> the caller is only interested in st_mtime.
|
|That seems a bit odd to me. While what you say makes sense, does that mean
|that in such a system, even doing a simple "ls -l" or any other command that
|calls stat()/fstat() on that big file would then fail?
|

If ls is compiled without large file support, then yes, it does fail on
ls -l largefile

If you ls -l a directory with some large files in it, it complains about them
and lists the ones that it can.

--
Alan Curry
From: pk on
Alan Curry wrote:

> In article <1600502.3Lj2Plt8kZ(a)xkzjympik>, pk <pk(a)pk.invalid> wrote:
> |Alan Curry wrote:
> |>
> |> But it failed because it used the 32-bit fstat call, which refused to
> |> return a struct stat with an incorrect st_size. It has no way to know
> |> that the caller is only interested in st_mtime.
> |
> |That seems a bit odd to me. While what you say makes sense, does that
> |mean that in such a system, even doing a simple "ls -l" or any other
> |command that calls stat()/fstat() on that big file would then fail?
> |
>
> If ls is compiled without large file support, yes it does fail on
> ls -l largefile
>
> If you ls -l a directory with some large files in it, it complains about
> them and lists the ones that it can.

Thanks. But then, leaving ls aside, how would replacing

command bigfile

with

cat bigfile | command

help if cat itself uses the 32-bit syscalls? I suppose that in
that case the open() performed by cat would fail anyway.
I mean, if the system does not have large file support, and thus "command
bigfile" fails, then "cat bigfile" is likely to fail as well.
Or is this so unlikely these days that it can be ignored?

If that is the case, then the only situation I can see where it would be
useful is when the system as a whole (kernel, libc, utilities) does
have large file support, and "command" is some third-party software that was
built without it. Such a program fails when operating on the file directly,
and "command < bigfile" doesn't help either, because it still recognizes its
input as a regular file and tries to stat()/lseek() it. Then
"cat bigfile | command" would be the only way to make it work (short of
rebuilding it).
Apparently "gzip" is (or was) such a program on some systems.