Trying to design low level hard disk manipulation program [Computer Architecture]

From: Jan Vorbrüggen on 25 Sep 2006 10:41

> Case-blind file systems are a pox, if you've ever had to share code across
> filesystems, and your colleagues insist on saving headers in files with one
> case, but some different case appears in the source of the include
> statement...

Say what? That is just the situation a case-blind system is designed to
handle gracefully!

Jan

From: Jan Vorbrüggen on 25 Sep 2006 10:42

> Yes. There's also annoying things like ligatures and diacritics. And
> perhaps many different codepoints that (more or less) share a glyph.

How are those in any way relevant?

Jan

From: Anne & Lynn Wheeler on 25 Sep 2006 11:52

Bill Todd <billtodd(a)metrocast.net> writes:
> But it is indeed a gray area as soon as one introduces the idea of a
> CopyFile() operation (that clearly needs to include network copying
> to be of general use). The recent introduction of 'bundles'
> ('files' that are actually more like directories in terms of
> containing a hierarchical multitude of parts - considerably richer
> IIRC than IBM's old 'partitioned data sets') as a means of handling
> multi-'fork' and/or attribute-enriched files in a manner that simple
> file systems can at least store (though applications then need to
> understand that form of storage to handle it effectively) may be
> applicable here.

we had somewhat stumbled across file bundles (based on use, not
necessarily file structure organization) in the work that started out
doing traces of all record accesses for i/o cache simulation (circa
1980).

the strict cache simulation work showed that partitioned caches (aka
"local LRU") always had lower performance than a global cache (aka
"global LRU"). for a fixed amount of electronic storage, a single
global system i/o cache always had better thruput than partitioning
the same amount of electronic storage between i/o channels, disk
controllers, and/or individual disks (modulo a track cache for
rotational delay compensation).
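
A toy illustration of the global vs. partitioned comparison, as a minimal sketch only: the trace, the device names and the cache sizes are made up for the example, and one hand-built trace obviously does not reproduce the original simulation work or prove the "always" claim. It just shows the mechanism: a partition that is too small for its device's working set thrashes, while the same total storage in one pool holds everything.

```python
from collections import OrderedDict

def lru_hits(trace, capacity):
    """Count hits for an LRU cache of the given capacity over a trace of keys."""
    cache = OrderedDict()
    hits = 0
    for key in trace:
        if key in cache:
            hits += 1
            cache.move_to_end(key)         # mark as most recently used
        else:
            cache[key] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used
    return hits

def partitioned_hits(trace, capacity_per_device):
    """Give each device its own private LRU cache ("local LRU")."""
    per_device = {}
    for device, record in trace:
        per_device.setdefault(device, []).append((device, record))
    return sum(lru_hits(sub, capacity_per_device) for sub in per_device.values())

# Hypothetical trace: device A cycles over 3 records, device B reuses 1 record.
trace = [("A", 1), ("A", 2), ("A", 3), ("B", 1)] * 5

global_hits = lru_hits(trace, 4)        # one shared cache of 4 records
local_hits = partitioned_hits(trace, 2)  # same storage, split 2 + 2

print(global_hits, local_hits)  # prints: 16 4
```

Device A's cyclic pattern never fits in its 2-record partition, so it misses every time, while device B leaves half of its partition unused.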

further work on the full record access traces started to show up some
amount of repeated patterns that tended to access the same collection
of files. for this collection of data access patterns, rather than
disk arm motion with various kinds of distribution ... there was very
strong bursty locality. this led down the path of maintaining more
detailed information about files and their usage for optimizing
thruput (and layout).
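
One way the "same collection of files in a burst" idea could be sketched, with the caveat that this is not the method used in the original trace work: the trace format, the file names and the `max_gap` threshold are all assumptions made for illustration. The sketch splits a time-ordered trace into bursts wherever the gap between accesses exceeds a threshold, then counts how often each pair of files shows up in the same burst.

```python
from collections import Counter
from itertools import combinations

def bursts(trace, max_gap):
    """Split a time-ordered trace of (time, name) into sets of files per burst."""
    groups, current, last_time = [], set(), None
    for time, name in trace:
        if last_time is not None and time - last_time > max_gap:
            groups.append(current)  # gap too long: close the current burst
            current = set()
        current.add(name)
        last_time = time
    if current:
        groups.append(current)
    return groups

def co_access_counts(trace, max_gap):
    """Count how many bursts each pair of files shares."""
    counts = Counter()
    for group in bursts(trace, max_gap):
        counts.update(combinations(sorted(group), 2))
    return counts

# Hypothetical trace: (timestamp, file name)
trace = [(0, "a.c"), (1, "a.h"), (50, "b.c"), (51, "a.h"),
         (100, "a.c"), (101, "a.h")]

counts = co_access_counts(trace, max_gap=5)
print(counts.most_common(1))  # prints: [(('a.c', 'a.h'), 2)]
```

Pairs with high counts would be the candidates for storing together as a bundle.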

earlier at the science center
http://www.garlic.com/~lynn/subtopic.html#545

we had done detailed page reference traces and cluster analysis in
support of semi-automated program reorganization ... which was
eventually released as the VS/REPACK product. the disk record i/o traces
started down the path of doing something similar for filesystem
organization/optimization.

i had done a backup/archive system that was used internally at a
number of locations. this eventually morphed into a product called
workstation datasave facility and then adsm. it was later renamed tsm
(tivoli storage manager). this now supports bundles/containers for
file storage management (i.e. collections of files that tend to have
bursty locality of reference patterns)
http://www.garlic.com/~lynn/subtopic.html#backup

some number of other backup/archive and/or (hierarchical) storage
management systems now also have similar constructs.

some recent posts that mention that i/o cache simulation work
http://www.garlic.com/~lynn/2006e.html#45 using 3390 mod-9s
http://www.garlic.com/~lynn/2006f.html#0 using 3390 mod-9s
http://www.garlic.com/~lynn/2006f.html#18 how much swap size did you take?
http://www.garlic.com/~lynn/2006i.html#36 virtual memory
http://www.garlic.com/~lynn/2006i.html#41 virtual memory
http://www.garlic.com/~lynn/2006j.html#7 virtual memory
http://www.garlic.com/~lynn/2006j.html#14 virtual memory
http://www.garlic.com/~lynn/2006j.html#27 virtual memory
http://www.garlic.com/~lynn/2006l.html#43 One or two CPUs - the pros & cons
http://www.garlic.com/~lynn/2006o.html#27 oops
http://www.garlic.com/~lynn/2006o.html#68 DASD Response Time (on antique 3390?)
http://www.garlic.com/~lynn/2006p.html#0 DASD Response Time (on antique 3390?)

some recent posts mentioning vs/repack activity
http://www.garlic.com/~lynn/2006b.html#15 {SPAM?} Re: Expanded Storage
http://www.garlic.com/~lynn/2006b.html#23 Seeking Info on XDS Sigma 7 APL
http://www.garlic.com/~lynn/2006e.html#20 About TLB in lower-level caches
http://www.garlic.com/~lynn/2006e.html#46 using 3390 mod-9s
http://www.garlic.com/~lynn/2006i.html#37 virtual memory
http://www.garlic.com/~lynn/2006j.html#18 virtual memory
http://www.garlic.com/~lynn/2006j.html#22 virtual memory
http://www.garlic.com/~lynn/2006j.html#24 virtual memory
http://www.garlic.com/~lynn/2006l.html#11 virtual memory
http://www.garlic.com/~lynn/2006o.html#23 Strobe equivalents
http://www.garlic.com/~lynn/2006o.html#26 Cache-Size vs Performance

From: Terje Mathisen on 25 Sep 2006 12:06

Jan Vorbrüggen wrote:
>> BT> Out of curiosity, does anyone know of a good reason why file names
>> BT> should *ever* be case-sensitive (aside from the fact that Unix
>> BT> users and applications have become used to this)?
>>
>> Which language do you want to be case-insensitive in? What if two
>> users of the same file system disagree on the choice?
>
> That is not a matter of language. Or is there a character encoding that
> says for language A, "X" and "x" are a pair while for language B, "X" and
> "y" are a pair?

Yes, afaik:

The German 'double-s' (ß) is two letters in uppercase ("SS") and a single
letter in lowercase.
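
A quick check of this using Python's built-in Unicode case mappings (the sample word is just an illustration): uppercasing and then lowercasing does not give back the original name, so a simple "compare the uppercased forms" rule and a "compare the lowercased forms" rule can disagree.

```python
lower = "straße"
upper = lower.upper()       # ß has no single-letter uppercase in this mapping
round_trip = upper.lower()  # the two S's do not fold back into ß

print(upper)                # prints: STRASSE
print(round_trip)           # prints: strasse
print(lower == round_trip)  # prints: False

# Full case folding treats both spellings as equal.
print(lower.casefold() == upper.casefold())  # prints: True
```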

> Case-blind case-preserving is the only variant which is acceptable from the
> point of view of ergonomics, IMNSHO.

There I agree. This obeys the principle of least surprise, but as noted
above, it does still have drawbacks.

Terje

--
- <Terje.Mathisen(a)hda.hydro.com>
"almost all programming can be viewed as an exercise in caching"

From: Terje Mathisen on 25 Sep 2006 12:14

Andrew Reilly wrote:
> On Mon, 25 Sep 2006 12:52:48 +0200, Terje Mathisen wrote:
>
>> For every N MB of contiguous disk space, use an extra MB to store ECC
>> info for the current block. The block size needs to be large enough that
>> a local soft spot which straddles two sectors cannot overwhelm the ECC
>> coding.
>
> Isn't that just the same as having the drive manufacturer use longer
> Reed-Solomon (forward error correcting) codes? Errors at that level are

No, because you need _huge_ lengths to avoid the problem where an area
of the disk is going bad. This really interferes with random access
benchmark numbers. :-(
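
The "one extra block per N blocks" layout can be sketched with plain XOR parity. This is a stand-in, not the ECC proposed in this thread: XOR parity can only rebuild a single block whose position is already known (for example because the drive reported it unreadable), and a real scheme would want a stronger code. The block contents and sizes are made up for the example.

```python
def xor_blocks(blocks):
    """XOR equal-length blocks together, byte by byte."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            parity[i] ^= b
    return bytes(parity)

def recover(blocks, parity, lost_index):
    """Rebuild the block at lost_index from the survivors plus the parity block."""
    survivors = [b for i, b in enumerate(blocks) if i != lost_index]
    return xor_blocks(survivors + [parity])

# Hypothetical data: N = 3 data blocks protected by 1 parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

restored = recover(data, parity, lost_index=1)
print(restored)  # prints: b'BBBB'
```

The larger each protected block is relative to a sector, the less likely it is that one local soft spot takes out more than the code can repair, which is the sizing concern raised in the quoted proposal.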

> something that can be dialed-in or out by the manufacturer. If it's too
> high for comfort, they'll start to lose sales, won't they?
>
> Alternative approach to ECC sectors: store files in a fountain code
> pattern?

Pointers?

OK, I found some papers, but they really didn't tell me how/why they
would be suited to disk sector recovery. :-(

Terje

--
- <Terje.Mathisen(a)hda.hydro.com>
"almost all programming can be viewed as an exercise in caching"
