From: David W. Fenton on 25 Feb 2010 13:40

"Allen Browne" <AllenBrowne(a)SeeSig.Invalid> wrote in
news:uSKvaydtKHA.4492(a)TK2MSFTNGP05.phx.gbl:

> While there's some truth in that simple view, the reality is much
> more complex than that. Access uses pointers to store the data. If
> you define a Text field as 255 characters but enter 2 characters
> in it, it does not use up 510 bytes (unicode) of disk space to
> store those 2 characters. Internally, the database keeps track of
> where each field starts in each record. Some fields (memo, OLE,
> attachments, MVFs) are not actually stored in-line with the other
> data in the record either. So it's not a matter of a single record
> being faster to retrieve. There are many other factors at work
> here, including whether the database has been compacted recently,
> whether you are using page- or record-locking, whether the disk is
> badly fragmented, how intelligent the various layers of caching
> are (physical on the drive, or the controller, or the operating
> system, or in Access), the Rushmore algorithms in JET, and so on.
> Then we may have to start another mindset to handle SSD storage
> issues as well.

And none of this considers the issue of disk caching. There's not
that much difference between data loaded into RAM and data being
read from or written to disk, since the latter usually goes through
the caching mechanism and is limited not by the speed of the actual
storage medium but by RAM.

This has been the case in Windows since at least c. 1991-92, when
Windows 3.1 was released with disk caching turned on by default.
Caching was essential for decent performance in Windows, but it also
meant that your databases were sped up by the disk cache, too
(although back then it was largely a read-only improvement, as lazy
writes and the like had not been implemented in the DOS disk cache;
any modern version of Windows, though, caches both reads and
writes).

--
David W. Fenton                  http://www.dfenton.com/
usenet at dfenton dot com    http://www.dfenton.com/DFA/
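A minimal Access/JET SQL sketch of Browne's point about declared
size versus stored size, with hypothetical table and field names
(nothing here comes from the thread itself): declaring a Text field
at 255 characters only sets an upper bound, and the engine stores
variable-length data by pointer rather than padding it out.

    -- Hypothetical table. TEXT(255) is only an upper bound on length:
    -- per the discussion above, JET stores variable-length Text data by
    -- pointer, so a 2-character value does not occupy 510 bytes just
    -- because the field was declared at its maximum size. MEMO data is
    -- stored out of line with the rest of the record.
    CREATE TABLE tblContact (
        ContactID  AUTOINCREMENT,
        LastName   TEXT(255),
        Notes      MEMO,
        CONSTRAINT pkContact PRIMARY KEY (ContactID)
    );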
From: David W. Fenton on 25 Feb 2010 13:45

Dennis <Dennis(a)discussions.microsoft.com> wrote in
news:E7D7C241-A566-4E28-8857-E6744C3C82F2(a)microsoft.com:

> My background is with large commercial insurance companies. All
> of the companies have had relational databases designed to the 3rd
> normal form.

Whether those tables were actually normalized depends on the chosen
definition of the entity being modelled in the table. I would
consider the 200-field table you mention later to be unquestionably
denormalized, even though I know nothing at all about the
content/function of those fields.

That sounds like a table with a bunch of fields that are used only
for a single record type, so that an auto insurance policy has one
set of fields, but a renter's insurance policy has a different set
of fields. Any time you're using some fields for some records and
not for others, it's an indication to me that the entity has been
misdefined and should probably be broken into at least two tables,
with a narrow header table and a long child table, where each row
stores what was formerly a field in the wide table.

All that said, my conclusion could be wrong for any particular
application. But "fields are expensive, rows are cheap" is a
generalized rule of thumb, not a hard-and-fast law of nature. It
allows for exceptions for certain purposes, but it is a starting
point for evaluating a schema design.

--
David W. Fenton                  http://www.dfenton.com/
usenet at dfenton dot com    http://www.dfenton.com/DFA/
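A minimal JET SQL sketch of the narrow-header / long-child shape
described above, assuming a hypothetical insurance-policy example
(all table and field names are invented): the header table holds
only the fields common to every policy, and each row of the child
table stores what was formerly a column in the wide 200-field
table, so type-specific fields simply have no rows for policies of
other types.

    -- Hypothetical narrow header table: one row per policy, holding
    -- only the fields that apply to every policy type.
    CREATE TABLE tblPolicy (
        PolicyID      AUTOINCREMENT,
        PolicyType    TEXT(20),
        HolderName    TEXT(100),
        EffectiveDate DATETIME,
        CONSTRAINT pkPolicy PRIMARY KEY (PolicyID)
    );

    -- Hypothetical long child table: one row per attribute that used
    -- to be a column in the wide table. Auto-only or renter-only
    -- attributes simply have no rows for policies of the other type.
    CREATE TABLE tblPolicyAttribute (
        PolicyAttributeID AUTOINCREMENT,
        PolicyID          LONG NOT NULL,
        AttributeName     TEXT(50),
        AttributeValue    TEXT(255),
        CONSTRAINT pkPolicyAttribute PRIMARY KEY (PolicyAttributeID),
        CONSTRAINT fkPolicyAttribute FOREIGN KEY (PolicyID)
            REFERENCES tblPolicy (PolicyID)
    );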
From: David W. Fenton on 25 Feb 2010 13:49

Dennis <Dennis(a)discussions.microsoft.com> wrote in
news:A72A1230-A337-4483-AB36-2E8892D7E19B(a)microsoft.com:

> I agree with the comment about a field being expensive in an
> improperly normalized table structure resulting in all sorts of
> workarounds. 10 of the 30 new fields had to do with funeral
> information for a deceased member. Since all of the information
> is specific to the individual member, I included it on the member
> table. You might be able to argue that it should go in its own
> table, but for data entry and reporting simplicity I included it
> in my member table.

That sounds like the type of data I'd put in a separate 1:1 table,
as it only applies once a particular threshold has been crossed. A
record in that table also means the person is deceased (though you
may not have the information yet and might still need to store a
date of death in the main record).

I wouldn't call that denormalized, and I have always found a certain
utility in using a 1:1 table for things that apply only after the
entity has reached a certain milestone. However, I would likely
avoid having multiple 1:1 records, as it then becomes complicated to
get a single editable row, unless the 1:1 records are related in a
logical chain that is modelled in the database in that way.

--
David W. Fenton                  http://www.dfenton.com/
usenet at dfenton dot com    http://www.dfenton.com/DFA/
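A minimal JET SQL sketch of the 1:1 extension table described above,
with hypothetical table and field names: the extension table's
primary key doubles as its foreign key back to the member table, so
at most one funeral record can exist per member, and the row's very
existence signals that the member is deceased.

    -- Hypothetical 1:1 extension table: MemberID is both the primary
    -- key and the foreign key to tblMember, so each member can have at
    -- most one funeral record, created only once that milestone is
    -- reached.
    CREATE TABLE tblMemberFuneral (
        MemberID     LONG NOT NULL,
        DateOfDeath  DATETIME,
        FuneralHome  TEXT(100),
        ServiceDate  DATETIME,
        CONSTRAINT pkMemberFuneral PRIMARY KEY (MemberID),
        CONSTRAINT fkMemberFuneral FOREIGN KEY (MemberID)
            REFERENCES tblMember (MemberID)
    );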
From: Dennis on 25 Feb 2010 23:23

David,

I can see where disk caching would help in a sequential process, but
does disk caching really help in a randomly accessed database during
data entry? The first data entry might access the first record, the
next the 1,000th record, the next the 5,000th record, and so on. So,
unless the entire table is cached, does it really help?

Dennis
From: Dennis on 25 Feb 2010 23:27

David,

Hmmm, I see your point and kind of agree with it. My background is
on large and midrange computers, where it is nothing to have a
200-field, 30K record. However, I realize that Access is a different
beast and I'm having to learn to adjust for its restrictions.

Thanks for the insight. Just more to think about. But then I learn
something new also.

Thanks,
Dennis