From: Joseph M. Newcomer on
See below...
On Thu, 14 Jan 2010 23:21:57 -0500, Hector Santos <sant9442(a)nospam.gmail.com> wrote:

>Joseph M. Newcomer wrote:
>
>> Note that fread, called thousands of times, is amazingly slow in
>
> > comparison to a single ReadFile.
>
>Hmmm, both are limited to the packet size the backend protocol driver
>is using.
>
>I would never recommend a single ReadFile call with a large block
>such as 100MB. Now if the file were local as a memory map, that would
>be a different story, and even though we know the OS itself (by
>default) actually uses memory maps when opening files, a large single
>read like 100MB IMO definitely adds design pressures. What if there
>is a failure, or some condition he wants to detect, during the very
>large block read?
****
In the cases I deal with, either the file works or it doesn't. If there is an error
anywhere, for any reason, the integrity of the file is suspect, and it needs to be
reconsidered.
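
To make the fread-vs-ReadFile point concrete, here is a minimal sketch of the two
patterns being compared (the 4KB chunk size, the file name handling, and the omission
of error checks are all illustrative, not a recommendation):

#include <stdio.h>
#include <windows.h>

/* Pattern 1: thousands of small reads through the CRT's buffered stream. */
void read_small(const char *path, char *dst, size_t total)
{
    FILE *f = fopen(path, "rb");
    size_t done = 0, n;
    while (done < total && (n = fread(dst + done, 1, 4096, f)) > 0)
        done += n;                       /* one CRT call per 4KB */
    fclose(f);
}

/* Pattern 2: one ReadFile for the whole block. */
void read_large(const char *path, char *dst, DWORD total)
{
    HANDLE h = CreateFileA(path, GENERIC_READ, FILE_SHARE_READ, NULL,
                           OPEN_EXISTING, FILE_FLAG_SEQUENTIAL_SCAN, NULL);
    DWORD got = 0;
    ReadFile(h, dst, total, &got, NULL); /* one call for the whole block */
    CloseHandle(h);
}

Roughly speaking, the first pattern pays a library call and a buffer copy every 4KB,
with ReadFile still being called underneath; the second pays once for the whole block.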
****
>
>IOW, he now needs to make it asynchronous anyway!
****
Any time performance matters, asynchrony is good. But in the case of file systems,
reads are almost always synchronous even when asynchronous I/O is requested (there's a
KB article on this, KB156932; it takes serious effort to defeat this behavior).
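
For reference, the overlapped pattern looks roughly like this (a sketch only; it
assumes the handle was opened with FILE_FLAG_OVERLAPPED, and, per KB156932, the read
may still complete synchronously anyway):

#include <windows.h>

/* Issue one read asynchronously and harvest the result later. */
BOOL async_read(HANDLE h, void *buf, DWORD len, DWORD *got)
{
    OVERLAPPED ov = {0};
    ov.hEvent = CreateEvent(NULL, TRUE, FALSE, NULL);
    BOOL ok = ReadFile(h, buf, len, NULL, &ov);
    if (!ok && GetLastError() == ERROR_IO_PENDING)
        ok = GetOverlappedResult(h, &ov, got, TRUE);  /* truly async: wait */
    else if (ok)
        ok = GetOverlappedResult(h, &ov, got, FALSE); /* completed synchronously */
    CloseHandle(ov.hEvent);
    return ok;
}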
****
>
>Again, I can see your design point if the file were LOCAL, knowing
>that Windows itself actually opens files in page mode. But not over a
>network. I would naturally suspect engineering issues there.
>
>> By failing to supply all the critical information, you essentially asked "Why is it that I
>> can get from city A to city B in 20 minutes, but my friend takes two hours?" and neglected
>> to mention you took the high-speed train while your friend went by bicycle.
>
>
>I didn't have a problem myself. It was obvious what the issue was
>when he stated that a DOS file copy was faster than his code -
>although it did take a few rounds of mail tag.
****
It would have been easier to answer if the correct data had been given in the original
question, which stated

>I can copy a file to and from my Samba network in 20
>seconds, yet it takes 80 seconds to compute its MD5 value.
>The delay is not related to the MD5 processing time because
>it can compute this in 12 seconds from the local drive.
>
>What is going on here?

Note that at no point does it suggest a DOS Copy command is being used; it could have been
copy/paste in Windows Explorer. It could have been CopyFile.
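
For what it's worth, the usual cure for the OP's 80-second case is to hash in large
sequential blocks rather than many small reads. A minimal sketch using CryptoAPI (the
1MB block size is just a plausible value; error handling omitted):

#include <windows.h>
#include <wincrypt.h>
#pragma comment(lib, "advapi32.lib")

BOOL md5_file(const char *path, BYTE md5[16])
{
    HCRYPTPROV prov; HCRYPTHASH hash;
    static BYTE buf[1 << 20];            /* 1MB reads, not 4KB freads */
    DWORD got, cb = 16;
    HANDLE h = CreateFileA(path, GENERIC_READ, FILE_SHARE_READ, NULL,
                           OPEN_EXISTING, FILE_FLAG_SEQUENTIAL_SCAN, NULL);
    if (h == INVALID_HANDLE_VALUE) return FALSE;
    CryptAcquireContext(&prov, NULL, NULL, PROV_RSA_FULL, CRYPT_VERIFYCONTEXT);
    CryptCreateHash(prov, CALG_MD5, 0, 0, &hash);
    while (ReadFile(h, buf, sizeof buf, &got, NULL) && got > 0)
        CryptHashData(hash, buf, got, 0);
    CryptGetHashParam(hash, HP_HASHVAL, md5, &cb, 0);
    CryptDestroyHash(hash); CryptReleaseContext(prov, 0);
    CloseHandle(h);
    return TRUE;
}

Reading this way lets the redirector stream large transfers over the share, which is
presumably what the 20-second copy was already doing.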
joe
****
Joseph M. Newcomer [MVP]
email: newcomer(a)flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
From: Alexander Grigoriev on

"Joseph M. Newcomer" <newcomer(a)flounder.com> wrote in message
news:gvb1l5p3v1tsntfeg9fttm1ks8s1rb4a73(a)4ax.com...
>
> Illiac IV had a 2GB hard drive, the largest ever made in that era,
> mid-1960s. They wanted more, but that was the largest that could be
> produced. So for over 40 years, it has been known that 2GB is at
> best a "modest" size.
>

Wasn't that mid-70s?


From: Hector Santos on
Hi Joe,

I don't wish to spend much time on this, but my only point was that
there is a major difference when going over the network or wire with a
single "huge buffer" read call. You have greater reliability,
performance, and scalability issues to address.
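
For example, a middle ground between thousands of tiny reads and one giant read is a
moderate chunk loop, which at least lets you detect a failure partway through and
report progress (a sketch; the 1MB chunk size is illustrative):

#include <windows.h>

BOOL read_chunked(HANDLE h, BYTE *dst, DWORD total)
{
    DWORD done = 0, got, want;
    while (done < total) {
        want = (total - done > (1 << 20)) ? (1 << 20) : (total - done);
        if (!ReadFile(h, dst + done, want, &got, NULL) || got == 0)
            return FALSE;  /* failure detected mid-transfer, 'done' bytes salvaged */
        done += got;       /* report progress here if desired */
    }
    return TRUE;
}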

--
HLS



From: Joseph M. Newcomer on
See below...
On Fri, 15 Jan 2010 15:55:08 -0500, Hector Santos <sant9442(a)nospam.gmail.com> wrote:

>Joseph M. Newcomer wrote:
>
>> Back In The Day, late 1980s, a friend was hired by Schlumberger to modify Unix to support
>> files > 4GB. It turns out that they were the world's largest geological survey company,
>> and would do things like blow up a truckload of explosives in the middle of the Texas
>> desert, analyze the signals picked up by sensors covering several hundred square miles,
>> and come back and say "drill here for oil". As my friend told me, "you may think of 4GB
>> as a whopping lot of data; I think of it as a 3-minute sample". 2GB has been an
>> unreasonably small limit since the 1970s (ask anyone at GM who maintained the databases of
>> every car ever sold, including their manufacturing history; they were running
>> multi-gigabyte databases in the era when this involved millions of dollars of hard
>> drives). I'm amazed anyone would think 4GB as a reasonable file limit, let alone a tiny
>> value like 2GB. We live in an era of TB databases.
>
>
>
>That is a niche situation. It is not the common need. Also it was
>what it was. The limits were based on the technology of the day, and
>the practical market requirements.
>
>>
>> Illiac IV had a 2GB hard drive, the largest ever made in that era, mid-1960s. They wanted
>> more, but that was the largest that could be produced. So for over 40 years, it has been
>> known that 2GB is at best a "modest" size.
>
>
>Limits were generally based on the natural word size of the chips used.
****
Sometimes. Sometimes it was based on the size of two words. When physical drives
held fewer bytes than a word or double-word could record, it didn't matter. By the late
1980s, this was no longer true. Disks could be truly massive, and software could
automatically make multiple physical drives look like a single drive, with files
fragmented across drives. TOPS-10, ca. 1970, could already do this. However, eight 40MB
drives were only 320MB, needing only 29 bits or so to represent file sizes on a 36-bit
machine. But by the time MS-DOS rolled around, it was not unreasonable to have massive
disks (even though they were outrageously expensive), which quickly outstripped the 16-bit
limit. This is why Windows was created with a 64-bit file length, even if they screwed it
up totally with those stupid DWORD-pair hacks.
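
For anyone who has not run into them, the DWORD-pair hack versus the 64-bit interface
looks like this (a minimal sketch):

#include <windows.h>

/* Old style: the 64-bit size comes back as two 32-bit halves that the
   caller must reassemble; worse, the error return INVALID_FILE_SIZE is
   also a legal low half, so a GetLastError check is required. */
ULONGLONG size_old_style(HANDLE h)
{
    DWORD hi = 0;
    DWORD lo = GetFileSize(h, &hi);
    return ((ULONGLONG)hi << 32) | lo;
}

/* Newer style: one 64-bit value. */
ULONGLONG size_new_style(HANDLE h)
{
    LARGE_INTEGER li;
    GetFileSizeEx(h, &li);
    return (ULONGLONG)li.QuadPart;
}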
****
>
>> The real annoyance is that you end up having to do a lot of gratuitous casts, e.g.,
>
>
>Right, which also brings up the point that OOP and VARIANT types
>were not the thinking across the board. To satisfy upward
>compatibility, the programming world would have to be basically
>managed: totally symbolic, no pointers, no need to know anything
>about "SIZE".
>
>That was certainly unrealistic back then, and still is today.
****
Unfortunately, wanting to write more than 4GB is not unreasonable today, certainly not
in Win64. Retaining 32-bit lengths in a 64-bit world was a complete failure.
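
Concretely, even in Win64 the ReadFile/WriteFile byte count is still a DWORD, so moving
more than 4GB-1 in a single call is impossible and you must loop. A sketch (the 1GB cap
per call is arbitrary):

#include <windows.h>

BOOL write_all(HANDLE h, const BYTE *p, ULONGLONG total)
{
    while (total > 0) {
        DWORD chunk = (DWORD)(total > (1u << 30) ? (1u << 30) : total);
        DWORD written = 0;
        if (!WriteFile(h, p, chunk, &written, NULL))
            return FALSE;
        p += written;
        total -= written;  /* a 64-bit length needs multiple 32-bit calls */
    }
    return TRUE;
}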
joe
****
Joseph M. Newcomer [MVP]
email: newcomer(a)flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
From: Joseph M. Newcomer on
The Illiac-IV was a mid-1960s machine. I remember learning about it in 1967, when people
who were developing the software came to CMU to give talks about what it would be like,
and by 1970 the software was already running. A friend of mine became the Illiac-IV
project manager in the early 1970s, I think around 1973, when it was already a
well-established entity and had been for several years. But I distinctly remember Dan
Slotnick giving a talk about the disk drive around 1968 or 1969.
joe

On Fri, 15 Jan 2010 19:27:26 -0800, "Alexander Grigoriev" <alegr(a)earthlink.net> wrote:

>
>"Joseph M. Newcomer" <newcomer(a)flounder.com> wrote in message
>news:gvb1l5p3v1tsntfeg9fttm1ks8s1rb4a73(a)4ax.com...
>>
>> Illiac IV had a 2GB hard drive, the largest ever made in that era,
>> mid-1960s. They wanted more, but that was the largest that could be
>> produced. So for over 40 years, it has been known that 2GB is at
>> best a "modest" size.
>>
>
>Wasn't that mid-70s?
>
Joseph M. Newcomer [MVP]
email: newcomer(a)flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm