From: Roedy Green on
On Tue, 26 Jan 2010 22:17:57 -0800, "Mike Schilling"
<mscottschilling(a)hotmail.com> wrote, quoted or indirectly quoted
someone who said :

>I'm confused. How is "+" a weird character that can't be stored as
>ASCII?

+ is an odd character for filenames. It usually means concatenation.
Perhaps Phil Katz originally used some simple compression on ASCII
filenames. It has been a long time since I studied the file format.

Remember that PkZip started out on DOS, with its 8.3, case-insensitive
file system.

The way to answer these questions:

1. read spec at PkZip.com
2. read docs at WinZip.com
3. create some sample zip files and look at them with a hex editor.
4. compress and decompress ("fluff") some sample files and compare
attributes/timestamps.

See http://mindprod.com/jgloss/zip.html
http://mindprod.com/jgloss/pkzip.html
http://mindprod.com/jgloss/winzip.html
http://mindprod.com/jgloss/hex.html
--
Roedy Green Canadian Mind Products
http://mindprod.com
Computers are useless. They can only give you answers.
~ Pablo Picasso (born: 1881-10-25 died: 1973-04-08 at age: 91)
From: Tom Anderson on
On Tue, 26 Jan 2010, Arne Vajhøj wrote:

> On 26-01-2010 05:04, Roedy Green wrote:
>> On Mon, 25 Jan 2010 23:41:35 +0100, Erik<et57(a)hotmail.com> wrote,
>> quoted or indirectly quoted someone who said :
>>> file last modified on (0x00003c39 0x0000b52a): 2010-01-25
>>
>> One problem with ZIP format that bedevils me is that when you put a
>> file into a zip, then restore it, the timestamp can be out by up to 2
>> seconds! The restored file looks like a DIFFERENT version of the file.
>
> The format only has 5 bits for seconds.
>
> No surprise that it can be off.
>
>> Further the timestamps are in local timezone rather than GMT, and the
>> timezone is not recorded. Arrgh. I have been bugging the Winzip and
>> the Truezip people to fix this.
>>
>> Vendors are reluctant, I think, primarily because an upward compatible
>> solution would make files fatter. Archivers compete ferociously.
>
> The ZIP format is a well-defined format (defined in APPNOTE).
>
> Picking a new time format would make it not zip.
>
> And would make it unreadable by all other zip tools out there.
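
For reference, the 5-bit seconds field mentioned above sits in the 16-bit MS-DOS time word that ZIP headers store next to a 16-bit date word. Five bits can only count 0..31, so seconds are stored halved, which is where the 2-second granularity comes from. A minimal sketch of the decoding, assuming the two values Erik quoted (0x3c39 and 0xb52a) are the date and time words in that order; the class and method names are made up for illustration:

```java
import java.time.LocalDateTime;

final class DosTimestamp {

    /**
     * Decodes an MS-DOS date word and time word into a LocalDateTime.
     * The result has no time zone: the format does not record one.
     * Out-of-range fields (e.g. month 0) make LocalDateTime.of throw.
     */
    static LocalDateTime decode(int dosDate, int dosTime) {
        int year   = 1980 + ((dosDate >> 9) & 0x7F); // 7 bits, offset from 1980
        int month  = (dosDate >> 5) & 0x0F;          // 4 bits, 1..12
        int day    = dosDate & 0x1F;                 // 5 bits, 1..31
        int hour   = (dosTime >> 11) & 0x1F;         // 5 bits, 0..23
        int minute = (dosTime >> 5) & 0x3F;          // 6 bits, 0..59
        int second = (dosTime & 0x1F) * 2;           // 5 bits, stored as seconds / 2
        return LocalDateTime.of(year, month, day, hour, minute, second);
    }

    public static void main(String[] args) {
        // Values taken from the WinZip output quoted earlier in the thread.
        System.out.println(decode(0x3c39, 0xb52a)); // prints 2010-01-25T22:41:20
    }
}
```

The decoded date matches the 2010-01-25 shown in the quoted output. An odd number of seconds cannot be represented at all, so a round trip through this field changes the timestamp of roughly half of all files.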

There is an 'extra field' in the file header record. It's structured into
tag-length-value chunks which can hold arbitrary extra metadata. Tag
0x5455 is not formally standardised, but is one of the listed "third party
mappings commonly used", and is described as "extended timestamp". You
will note that taken as a two-character ASCII string, 0x5455 is "UT". It
seems to be defined and quasi-standardised by InfoZIP; see this file from
InfoZIP hosted by your new favourite microchip manufacturer:

http://www.opensource.apple.com/source/zip/zip-6/unzip/unzip/proginfo/extra.fld

Which explains that it can contain any combination of modification,
access, and creation times, described by a bitfield, and that:

The time values are in standard Unix signed-long format, indicating the
number of seconds since 1 January 1970 00:00:00. The times are relative
to Coordinated Universal Time (UTC), also sometimes referred to as
Greenwich Mean Time (GMT).

Although looking at the InfoZIP source code, there seems to be a lot of
special-casing which suggests to me that not all tools follow those rules
to the letter.

There are also a variety of more formally standardised OS-specific
metainfo blocks, which can contain timestamps. A polyglot tool which could
read all these could provide better timestamps on extracted files even in
the absence of a 0x5455 header.
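
A rough sketch of what reading that block could look like in Java, assuming the layout described in extra.fld: each chunk is a little-endian 16-bit tag, a 16-bit length, then the data, and the 0x5455 data starts with a flags byte whose bit 0 says a 4-byte modification time follows. The class and method names are invented for illustration, and given the special-casing in the InfoZIP source this should not be taken as handling every archive in the wild. The input would typically come from java.util.zip.ZipEntry.getExtra().

```java
import java.util.OptionalLong;

final class ExtendedTimestamp {

    /** Tag for the InfoZIP "extended timestamp" block ("UT" in ASCII). */
    static final int TAG_UT = 0x5455;

    /**
     * Walks the tag-length-value chunks of a ZIP extra field and returns the
     * modification time (seconds since 1970-01-01 00:00:00 UTC) from the
     * 0x5455 block, or empty if the block or the time is absent.
     */
    static OptionalLong modificationTime(byte[] extra) {
        if (extra == null) {
            return OptionalLong.empty();
        }
        int pos = 0;
        while (pos + 4 <= extra.length) {
            int tag = u16(extra, pos);
            int len = u16(extra, pos + 2);
            int data = pos + 4;
            if (data + len > extra.length) {
                break; // truncated chunk: stop rather than read past the end
            }
            // Bit 0 of the flags byte means a modification time is present.
            if (tag == TAG_UT && len >= 5 && (extra[data] & 0x01) != 0) {
                return OptionalLong.of(s32(extra, data + 1));
            }
            pos = data + len;
        }
        return OptionalLong.empty();
    }

    /** Reads an unsigned little-endian 16-bit value. */
    private static int u16(byte[] b, int i) {
        return (b[i] & 0xFF) | ((b[i + 1] & 0xFF) << 8);
    }

    /** Reads a signed little-endian 32-bit value. */
    private static int s32(byte[] b, int i) {
        return (b[i] & 0xFF)
                | ((b[i + 1] & 0xFF) << 8)
                | ((b[i + 2] & 0xFF) << 16)
                | ((b[i + 3] & 0xFF) << 24);
    }
}
```

A caller would fall back to the DOS date/time fields when the result is empty, which keeps the code working on archives written by tools that never emit the block.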

tom

--
I never meant to say that the Conservatives are generally stupid. I
meant to say that stupid people are generally Conservative. I believe
that is so obviously and universally admitted a principle that I hardly
think any gentleman will deny it. -- John Stuart Mill
From: Arne Vajhøj on
On 27-01-2010 13:39, Tom Anderson wrote:
> On Tue, 26 Jan 2010, Arne Vajhøj wrote:
>
>> On 26-01-2010 05:04, Roedy Green wrote:
>>> On Mon, 25 Jan 2010 23:41:35 +0100, Erik<et57(a)hotmail.com> wrote,
>>> quoted or indirectly quoted someone who said :
>>>> file last modified on (0x00003c39 0x0000b52a): 2010-01-25
>>>
>>> One problem with ZIP format that bedevils me is that when you put a
>>> file into a zip, then restore it, the timestamp can be out by up to 2
>>> seconds! The restored file looks like a DIFFERENT version of the file.
>>
>> The format only has 5 bits for seconds.
>>
>> No surprise that it can be off.
>>
>>> Further the timestamps are in local timezone rather than GMT, and the
>>> timezone is not recorded. Arrgh. I have been bugging the Winzip and
>>> the Truezip people to fix this.
>>>
>>> Vendors are reluctant, I think, primarily because an upward compatible
>>> solution would make files fatter. Archivers compete ferociously.
>>
>> The ZIP format is a well-defined format (defined in APPNOTE).
>>
>> Picking a new time format would make it not zip.
>>
>> And would make it unreadable by all other zip tools out there.
>
> There is an 'extra field' in the file header record. It's structured
> into tag-length-value chunks which can hold arbitrary extra metadata.
> Tag 0x5455 is not formally standardised, but is one of the listed "third
> party mappings commonly used", and is described as "extended timestamp".
> You will note that taken as a two-character ASCII string, 0x5455 is
> "UT". It seems to be defined and quasi-standardised by InfoZIP; see this
> file from InfoZIP hosted by your new favourite microchip manufacturer:
>
> http://www.opensource.apple.com/source/zip/zip-6/unzip/unzip/proginfo/extra.fld
>
>
> Which explains that it can contain any combination of modification,
> access, and creation times, described by a bitfield, and that:
>
> The time values are in standard Unix signed-long format, indicating the
> number of seconds since 1 January 1970 00:00:00. The times are relative
> to Coordinated Universal Time (UTC), also sometimes referred to as
> Greenwich Mean Time (GMT).
>
> Although looking at the InfoZIP source code, there seems to be a lot of
> special-casing which suggests to me that not all tools follow those
> rules to the letter.
>
> There are also a variety of more formally standardised OS-specific
> metainfo blocks, which can contain timestamps. A polyglot tool which
> could read all these could provide better timestamps on extracted files
> even in the absence of a 0x5455 header.

You are correct.

An extension would not break anything.

And if implementations could actually start agreeing on
using it, then it could become very useful.

Arne
From: Roedy Green on
On Wed, 27 Jan 2010 18:39:09 +0000, Tom Anderson
<twic(a)urchin.earth.li> wrote, quoted or indirectly quoted someone who
said :

> The time values are in standard Unix signed-long format, indicating the
> number of seconds since 1 January 1970 00:00:00. The times are relative
> to Coordinated Universal Time (UTC), also sometimes referred to as
> Greenwich Mean Time (GMT).

Finally, some progress. The thing that is so funny about these
problems is any one solution is trivial. The difficulty is introducing
it in a way that does not trip up other users of the files, and
persuading people to converge on a common solution. The precise
details of how it works are almost irrelevant since only a very few
programmers ever have to deal with it. Everyone else will deal with
it via a simple API.

The other problem is trying to persuade some vendor to pioneer the
feature. Vendors are reluctant to do so, even if they see the need,
because soon afterwards a slightly different consensus scheme may be
introduced, leaving them with an incompatible legacy.

I hope someone does a thesis on these sorts of problems, researching
the politics involved and how successful consensuses are reached
quickly.

Maybe the game theorists could explain the behaviours.
From: Erik on
WinZIP itself provides this info on any zip file opened.
File->Properties->details


On Wed, 27 Jan 2010 00:58:49 -0800, Roedy Green
<see_website(a)mindprod.com.invalid> wrote:

>On Mon, 25 Jan 2010 23:41:35 +0100, Erik <et57(a)hotmail.com> wrote,
>quoted or indirectly quoted someone who said :
>
>>Some additional info from WinZIP about the Java-generated zip file:
>
>I looked all over their site but could not find that info. Did you get
>it in email?
>