unexplained binary "diff" [General Linux]

From: Grant Edwards on 2 Feb 2010 18:43

On 2010-02-02, The Natural Philosopher <tnp(a)invalid.invalid> wrote:
> Greg Russell wrote:
>> On Tue, 02 Feb 2010 22:40:15 +0000, The Natural Philosopher wrote:
>>> Greg Russell wrote:
>>>> On Tue, 02 Feb 2010 22:12:28 +0000, Grant Edwards wrote:
>>>>
>>>>>> Will someone explain please why these files differ after I just cp'd
>>>>>> them? The file sizes are about 4.4 GB:
>>>>>>
>>>>>> $ diff ./ssc_1992_highlights.iso /tmp/ssc_1992/dvd.iso
>>>>>> Binary files ./ssc_1992_highlights.iso and /tmp/ssc_1992/dvd.iso differ
>>>>>>
>>>>>> $ cp -v /tmp/ssc_1992/dvd.iso ./ssc_1992_highlights.iso
>>>>>> `/tmp/ssc_1992/dvd.iso' -> `./ssc_1992_highlights.iso'
>>>>>>
>>>>>> $ diff ./ssc_1992_highlights.iso /tmp/ssc_1992/dvd.iso
>>>>>> Binary files ./ssc_1992_highlights.iso and /tmp/ssc_1992/dvd.iso differ
>>>>>>
>>>>>> It's very confusing to me, and doesn't help the script execution
>>>>>> we're trying to employ.
>>>>> Either the "cp" didn't work or the "diff" didn't work.
>>>>>
>>>>> What does "cmp" say about the files?
>>>> $ cmp -l ./ssc_1992_highlights.iso /tmp/ssc_1992/dvd.iso
>>>> 1827673645 207 217
>>>> 4243191341 263 273
>>>>
>>>> ... so apparently the "diff" output was correct. I can't imagine why
>>>> the cp would fail with no stderr.
>>> I don't think it has.
>>
>> Would you qualify that statement with your reasoning, please.
>>
> I'm not sure, but I am wondering about sectors full of trash, i.e. the
> actual linked list of data stops short of what the file size is marked
> in the header block. Cos ISOs need to end on some kind of boundary.
> So, cp might finish, then update the file header block with the size
> larger than the actual file data.

Huh? "cp" doesn't know an ISO from a JPG of your aunt LuLu's
favorite cat -- it has no concept of an ISO image header.

It's just bytes.

Besides which, cp will copy sparse files intelligently, and the
files will compare the same using either diff or cmp:

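$ cat testit.py
#!/usr/bin/env python3
import os
import sys

# Open (creating if needed) the file named on the command line.
fd = os.open(sys.argv[1], os.O_WRONLY | os.O_CREAT, 0o644)
# Seek 1 MiB in without writing anything; the skipped range is a hole.
os.lseek(fd, 1024 * 1024, os.SEEK_SET)
# Only the block holding these 8 bytes gets allocated.
os.write(fd, b"hi there")
os.close(fd)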

$ ./testit.py foo

$ cp -v foo bar
`foo' -> `bar'

$ ls -l foo bar
-rw-r--r-- 1 grante users 1048586 Feb 2 17:38 bar
-rw-r--r-- 1 grante users 1048586 Feb 2 17:38 foo

$ du -h foo bar
8.0K foo
8.0K bar

$ cmp foo bar

$ diff foo bar
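The `ls -l` versus `du -h` contrast above can also be checked from Python: `st_size` is the apparent length that `ls -l` reports, while `st_blocks` counts allocated 512-byte units on Linux, roughly what `du` reports. A minimal sketch, assuming a filesystem that supports holes (ext4, XFS, tmpfs and the like); `make_sparse` and `sizes` are hypothetical helper names:

```python
import os
import shutil
import tempfile


def make_sparse(path, gap=1024 * 1024, payload=b"hi there"):
    """Write `payload` after a `gap`-byte hole, as testit.py does."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.lseek(fd, gap, os.SEEK_SET)
        os.write(fd, payload)
    finally:
        os.close(fd)


def sizes(path):
    """Return (apparent size as ls -l shows it, bytes actually allocated)."""
    st = os.stat(path)
    # On Linux, st_blocks counts 512-byte units regardless of the fs block size.
    return st.st_size, st.st_blocks * 512


with tempfile.TemporaryDirectory() as d:
    foo, bar = os.path.join(d, "foo"), os.path.join(d, "bar")
    make_sparse(foo)
    # Plain copy; whether bar keeps the hole depends on Python version and fs.
    shutil.copyfile(foo, bar)
    report = {name: sizes(os.path.join(d, name)) for name in ("foo", "bar")}
    with open(foo, "rb") as f1, open(bar, "rb") as f2:
        identical = f1.read() == f2.read()

for name, (apparent, allocated) in report.items():
    print(f"{name}: {apparent} bytes long, {allocated} bytes allocated")
print("contents identical:", identical)
```

On a hole-aware filesystem `foo` should show far fewer allocated bytes than its length, while the contents compare equal either way.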

Even if cp didn't know about sparse files, the holes are
guaranteed to read as zeros, so the sparse version of the file
and the "full" version are guaranteed to compare the same:

$ cat foo >baz

$ ls -l foo baz
-rw-r--r-- 1 grante users 1048586 Feb 2 17:39 baz
-rw-r--r-- 1 grante users 1048586 Feb 2 17:38 foo

$ du -h foo baz
8.0K foo
1.1M baz

$ cmp foo baz

$ diff foo baz
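`cat foo >baz` never learns that `foo` has a hole: it reads 0x00 bytes and writes them out as ordinary data, so `baz` ends up fully allocated yet byte-for-byte identical. A rough Python equivalent of that copy plus a `cmp`-style check, as a sketch (`dense_copy` and `first_difference` are illustrative names, not existing tools):

```python
import os
import tempfile

CHUNK = 64 * 1024


def dense_copy(src, dst):
    """Copy like `cat src >dst`: read bytes and write them all back out."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while chunk := fin.read(CHUNK):
            fout.write(chunk)  # zeros read from a hole are written as real data


def first_difference(a, b):
    """Return the 1-based offset of the first differing byte (as cmp counts), or None."""
    offset = 0
    with open(a, "rb") as fa, open(b, "rb") as fb:
        while True:
            ca, cb = fa.read(CHUNK), fb.read(CHUNK)
            if ca != cb:
                # Locate the first mismatching byte within this pair of chunks.
                for i, (x, y) in enumerate(zip(ca, cb)):
                    if x != y:
                        return offset + i + 1
                return offset + min(len(ca), len(cb)) + 1  # one file is shorter
            if not ca:
                return None  # both files reached EOF together
            offset += len(ca)


with tempfile.TemporaryDirectory() as d:
    foo, baz = os.path.join(d, "foo"), os.path.join(d, "baz")
    with open(foo, "wb") as f:
        f.seek(1024 * 1024)  # leave a 1 MiB hole
        f.write(b"hi there")
    dense_copy(foo, baz)
    result = first_difference(foo, baz)
    print("first difference:", result)
```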

> Then depending on what is in the unwritten over sectors, you will get
> different results with diff or cmp.

No, you won't.

> Cos they might actually treat the file in a slightly different
> way.

No, they won't.

> I think you can do the same thing at some levels by seeking
> past the end of a file and writing..what's in between is
> 'indeterminate'.

No, it's not. It's guaranteed to read as zeros.
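The zero fill is easy to observe directly: write a few bytes, seek well past the end of the file, write again, then read the gap back with `os.pread` (Unix only). A minimal sketch; the 4096-byte offset is an arbitrary choice:

```python
import os
import tempfile

GAP_END = 4096  # arbitrary: where the second write lands

with tempfile.TemporaryFile() as f:
    fd = f.fileno()
    os.write(fd, b"head")               # 4 bytes of real data at the start
    os.lseek(fd, GAP_END, os.SEEK_SET)  # seek well past end-of-file
    os.write(fd, b"tail")               # this write creates the gap
    gap = os.pread(fd, GAP_END - 4, 4)  # read everything between the writes
    print(len(gap), "bytes in the gap; all zero:", gap == bytes(len(gap)))
```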

> The disk subsystem in SOME systems will allocate blocks, but
> not clean them up.
>
> This is all garnered from fragments of things I have noticed
> over the years.
>
> At some subconscious level it prompted me to make that
> statement.

--
Grant Edwards grante Yow! It's a hole all the
at way to downtown Burbank!
visi.com

From: Grant Edwards on 2 Feb 2010 18:48

On 2010-02-02, The Natural Philosopher <tnp(a)invalid.invalid> wrote:
> Nuno J. Silva wrote:
>> Greg Russell <me(a)invalid.com> writes:
>>
>>> On Tue, 02 Feb 2010 22:12:28 +0000, Grant Edwards wrote:
>>>
>>>>> Will someone explain please why these files differ after I just cp'd
>>>>> them? The file sizes are about 4.4 GB:
>> [...]
>>>> Either the "cp" didn't work or the "diff" didn't work.
>>>>
>>>> What does "cmp" say about the files?
>>> $ cmp -l ./ssc_1992_highlights.iso /tmp/ssc_1992/dvd.iso
>>> 1827673645 207 217
>>> 4243191341 263 273
>>>
>>> ... so apparently the "diff" output was correct. I can't imagine why the
>>> cp would fail with no stderr.
>>
>> It should catch errors it knows about. If anything bad happens in memory
>> or in the storage device, it won't be detected.
>>
>> You might have bad memory modules (which can corrupt data stored on
>> them), a bad disk (optical, magnetic, flash, ...), or even a bad data
>> cable.
>>
> Further to my last post
>
> http://en.wikipedia.org/wiki/ISO_9660
>
> " * Level 1: File names are limited to eight characters with a
> three-character extension, using upper case letters, numbers and
> underscore only. The maximum depth of directories is eight.
[...]
> If this is similarly represented at the Linux level in some
> way, it implies that two perfectly valid ISO images could
> indeed have different bytes in the 'gaps'

Two perfectly valid ISO images can have completely different
contents, but the technical details of an ISO filesystem are
irrelevant.

We're simply dealing with files of bytes. It doesn't matter
whether those bytes can be interpreted as an ISO filesystem or
as an Esperanto version of Chaucer's Canterbury Tales.

One file was copied to a second file, and the files don't
compare. Something has failed rather badly. I'm guessing bad
RAM.
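The `cmp -l` output quoted at the top of this post lists each differing byte's 1-based offset followed by the two byte values in octal. Decoding those two lines shows that in both cases the bytes differ in exactly one bit (0x08), and both offsets fall at byte 1580 within a 4096-byte page. That does not prove a memory fault, but a repeated single-bit flip is at least consistent with the bad-RAM guess. The 4096-byte page size is an assumption (typical for x86 Linux):

```python
# The two lines `cmp -l` printed in the thread: 1-based byte offset,
# then the differing byte from each file, both in octal.
CMP_OUTPUT = """\
1827673645 207 217
4243191341 263 273
"""

PAGE_SIZE = 4096  # assumption: typical x86 Linux page size


def analyse(line):
    """Return (0-based offset, XOR of the two bytes, offset within a page)."""
    offset_str, a_str, b_str = line.split()
    offset = int(offset_str) - 1         # cmp numbers bytes from 1
    a, b = int(a_str, 8), int(b_str, 8)  # cmp -l prints byte values in octal
    return offset, a ^ b, offset % PAGE_SIZE


results = [analyse(line) for line in CMP_OUTPUT.splitlines()]
for offset, flipped, in_page in results:
    print(f"offset {offset:#x}: xor {flipped:#04x} "
          f"({bin(flipped).count('1')} bit(s)), page offset {in_page}")
```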

--
Grant Edwards grante Yow! FOOLED you! Absorb
at EGO SHATTERING impulse
visi.com rays, polyester poltroon!!

From: Grant Edwards on 2 Feb 2010 18:49

On 2010-02-02, Nuno J. Silva <nunojsilva(a)invalid.invalid> wrote:
> "David W. Hodgins" <dwhodgins(a)nomail.afraid.org> writes:
>
>> On Tue, 02 Feb 2010 17:31:45 -0500, Greg Russell <me(a)invalid.com> wrote:
>>
>>> ... so apparently the "diff" output was correct. I can't imagine why the
>>> cp would fail with no stderr.
>>
>> Is the dest filesystem full? Check df.
>
> In that case, won't cp say something like "No space left on device"?

Yes, in my experience.

>> Is the dest filesystem fat32, which has small file size limits?

Good suggestion.

>> Check mount.
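Both of the quoted checks (free space via `df`, filesystem type via `mount`) can be scripted before the copy. A rough, Linux-only sketch: `shutil.disk_usage` gives the free space, and the filesystem type is looked up in `/proc/mounts` by longest matching mount point. FAT stores file sizes in a 32-bit field, so a FAT32 file tops out at 4 GiB minus one byte (4294967295 bytes), which a 4.4 GB image exceeds. The helper names `fs_type` and `preflight` are hypothetical, and octal escapes such as `\040` in `/proc/mounts` are not decoded:

```python
import os
import shutil

FAT32_MAX_FILE = 2**32 - 1  # FAT stores file sizes in a 32-bit field


def fs_type(path):
    """Return the filesystem type of the mount holding `path`, or None."""
    path = os.path.realpath(path)
    best, best_type = "", None
    try:
        with open("/proc/mounts") as mounts:
            for line in mounts:
                fields = line.split()
                if len(fields) < 3:
                    continue
                mnt, fstype = fields[1], fields[2]
                # Longest mount point that contains `path` wins; later
                # entries for the same point shadow earlier ones.
                inside = path == mnt or path.startswith(mnt.rstrip("/") + "/")
                if inside and len(mnt) >= len(best):
                    best, best_type = mnt, fstype
    except OSError:
        return None  # not Linux, or /proc not mounted
    return best_type


def preflight(src, dst_dir):
    """List reasons a copy of `src` into `dst_dir` might come out truncated."""
    size = os.path.getsize(src)
    problems = []
    free = shutil.disk_usage(dst_dir).free
    if size > free:
        problems.append(f"only {free} bytes free, need {size}")
    # FAT32 is normally mounted with type "vfat" on Linux.
    if fs_type(dst_dir) == "vfat" and size > FAT32_MAX_FILE:
        problems.append("destination is FAT, which caps files at 4 GiB - 1 byte")
    return problems
```

For example, `preflight("/tmp/ssc_1992/dvd.iso", ".")` would be run just before the `cp`; an empty list means neither check found a problem.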

--
Grant Edwards grante Yow! I want to so HAPPY,
at the VEINS in my neck STAND
visi.com OUT!!

From: The Natural Philosopher on 2 Feb 2010 19:50

Grant Edwards wrote:

>
> Even if cp didn't know about sparse files, the holes are
> guaranteed to read as zeros, so the sparse version of the file
> and the "full" version are guaranteed to compare the same:
>

where is that written?

From: Grant Edwards on 2 Feb 2010 20:50

On 2010-02-03, The Natural Philosopher <tnp(a)invalid.invalid> wrote:
> Grant Edwards wrote:
>
>>
>> Even if cp didn't know about sparse files, the holes are
>> guaranteed to read as zeros, so the sparse version of the file
>> and the "full" version are guaranteed to compare the same:
>
> where is that written?

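It's based on the guarantee that holes read as 0x00 bytes. The
POSIX spec for lseek() says that if data is written past the end
of a file, subsequent reads of the gap return bytes with the value
0 until data is actually written into it. If you're always
guaranteed to read the same bytes from two files, then they're
going to compare the same.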

--
Grant
