Restated: "Fields are expensive, records are cheap" [Tables]

Prev: header full line but page 2 rows
Next: Restated: "Fields are expensive, records are cheap"

From: David W. Fenton on 28 Feb 2010 21:37

=?Utf-8?B?RGVubmlz?= <Dennis(a)discussions.microsoft.com> wrote in
news:557B9E51-F983-483F-91D9-DFA2B8FEB700(a)microsoft.com:

> Also, as I stated in the other discussion on this subject (which I
> surprised you missed as you are commenting in that discussion
> also), I� Tve worked on 12 different vendor� Ts insurance systems
> over the years. Those system have been written DECADES apart with
> totally different technology and totally different development
> groups

I have looked at the data structures that have become common
practice for a lot of different application types and have
repeatedly seen bad designs become the norm for those creating those
types of applications.

That many people have used the same data structure doesn't make it a
good one.

Likewise, lots of applications have to support legacy applications
that can't easily handle more normalized relational structures and
thus a lot of structures that were required 15-20 years ago by the
available technology are still in use. This is quite understandable,
of course, but if you're developing a new application using modern
technology, there's no reason to maintain the old data structures
unless they are really the best model of the entities involved.

I see this kind of thing all the time. A 200-field table is just not
a good model and for Access is way too close to the 255-field limit
for me to be comfortable with it.

While it's theoretically possible that the structure is not
denormalized, I think it's extremely unlikely, and that there are
better structures to store and manipulate the very same data.

--
David W. Fenton http://www.dfenton.com/
usenet at dfenton dot com http://www.dfenton.com/DFA/

From: David W. Fenton on 28 Feb 2010 21:44

=?Utf-8?B?RGVubmlz?= <Dennis(a)discussions.microsoft.com> wrote in
news:0112D48C-5157-4FB6-937B-AAE2CB87A260(a)microsoft.com:

> So once again, how does all this caching and b-tree traversal
> speed up the physical reading of a record that is not in memory.
> the database engine still has to go to disk or worse yet - over
> the network to get the desired record.

Well, it doesn't, but I thought your question implied that what I
was proposing was somehow going to require reading the entire table
into memory, which is patently not the case. Only as much as is
needed will be retrieved, and once retrieved that will be cached (so
that if it's needed again, there won't be another trip to
disk/server).

This is EXACTLY the way any alternative is going to work, so I just
don't see your point in disputing the question. Jet is a
random-access engine, in that it doesn't have to read the file
sequentially to get to the data, nor does it use record numbers and
offsets (as in a fixed-length record). It retrieves the minimum data
pages needed to fulfill the request, having looked up in the indexes
which are the needed the data pages. It will make the trip to disk
only once and cache the result, and go back to disk for a read only
when the record has been updated (according to the locking data,
rather than checking the data page itself).

Your question seems to me to be based on assumptions that don't take
account of how modern random-access database access works, or how
modern OS's and database engines cache and lock data.

But maybe I'm just misunderstanding your basic question.

--
David W. Fenton http://www.dfenton.com/
usenet at dfenton dot com http://www.dfenton.com/DFA/

From: David W. Fenton on 28 Feb 2010 22:16

=?Utf-8?B?RGVubmlz?= <Dennis(a)discussions.microsoft.com> wrote in
news:72685400-255F-46E9-B79F-0BE0CBBAAE30(a)microsoft.com:

> Your comment: Schema design should be as independent of the
> database engine as possible, so Access is *not* different in any
> way, shape or form. I would recommend *as a starting point* the
> same normalized design for any database engine.
>
> Response: Your comment is self contradictory. Instead of saying
> “Schema design should be independent of the database engine”
> you stated “Schema design should be as independent of the
> database engine as possible.” The “as possible” statement
> by definition states that thing will be different between database
> engines. Which, throws you whole argument out the door.

Only in edge cases, which you don't encounter until implementation
time.

> Yes, Access is different from DB/400, and Oracle, and D3. It is
> very similar, but it is not the same.

Well, 99% of the differences between Jet/ACE (Access is not a
database) and the mentioned databases is that Jet/ACE is a
file-based database engine, and the ones I recognize (D3 is unknown
to me) are server-based databases.

> From what I've read Access had not
> implemented the entire SQL language.

There is no such thing as "entire SQL language." There are only SQL
standards, e.g., SQL 89, SQL 92, etc., and each establishes a
baseline meaning for particular SQL commands. No database engine is
required to implement 100% of a SQL spec, though from reading the
fanboy arguments about which db engine is better you'd think that
was the only thing that mattered.

Jet/ACE being a relatively old SQL implementation and predating the
first major SQL standard (SQL 89) is rather idiosyncratic in its SQL
implementation, and has a lot of legacy aspects (some of which would
be quite nice if they were implemented in other SQL dialects, e.g.,
DISTINCTROW). Jet/ACE SQL is also different in that it tends not to
support DDL as well as most other SQL dialects, but I would argue
that this is because Jet/ACE very early on got a very fine
high-level database interface, i.e., DAO, that was closely tied to
Jet/ACE and allowed full control of the database engine without
needing to write DDL SQL. There are still many things that you can
do in DAO that you can't do in Jet/ACE DDL, and not just things that
are Jet/ACE-specific.

> Also, Access does not support blobs
> very well where Oracle does.

Oh, come on. Who cares? Secondly, Oracle didn't always support BLOB
data well -- it's something that has been added into the engine as
its users needed it. Jet/ACE's BLOB support has not been as
important to its users so it doesn't perhaps support the same level
of features as Oracle's implementation.

But really, there about a million things that Oracle offers that
Jet/ACE lacks, but that's because the purposes of the two database
engines are completely different.

> From what I've read, it is highly recommended
> that we not store blobs in Access database.

Yes. And I think it's mostly advisable in other database engines,
too.

> Rather, we store the path and
> file name to the blob and let DOS/Windows store the binary file in
> the specified directory.

I think you're actually confusing the advice regarding a specific
type of BLOB field, i.e., OLE fields, and BLOB fields in general.
OLE fields have a lot of overhead that makes them problematic, and
BLOB fields lack that overhead (because they are just a bucket in
which to store binary data). OLE fields are easy to use because of
the wrapper, but are inefficient because of it. BLOB fields are more
efficient but harder to use.

And Oracle doesn't support OLE fields at all.

This is because it has a different intended user than Jet/ACE, and
that's as it should be.

> From what I've read Oracle has no problems storing
> blobs in their records..

Nor does Jet/ACE.

> I don't know if DB/400 stores blobs. I know D3
> does not store blobs, but jBase might. I know D3 support
> multi-valued list and I think Oracle does also, which are very
> useful in exploding part on a fully assembled unit (ie car door).
> Access does not support multi-valued list. So much for “Access
> *not* different in any way, shape, or form”.

You don't seem to be paying close attention, as multi-vield fields
were added in Access 2007 in the ACE in ACCDB format. This was done
for Sharepoint comatibility, not because multi-value fields are a
good idea -- they most categorically are *not* a good idea, in fact,
and support for them is not an indication of a good database engine.

> Even as a newbie, I knew that statement was false.

You're arguing against something I never asserted. You seem to have
some sort of reading comprehension problem, as it is patently
obvious that when I said this:

Schema design should be as independent of the database engine as
possible, so Access is *not* different in any way, shape or form.

....the second phrase applies to the first. That is, in regard to the
ideal independence of schema design and database engine
capabilities, Access is not different from other databases.

It is entirely perverse to divorce the second clause from the first
and insist that I was saying there were no differences of any kind
between Access/Jet/ACE and other databases. No reasonable person
would count that as a valid or good-faith interpretation of what was
written.

> Even a prima facia
> examination of that statement indicates it is false. Are you
> saying Access is not different in any way, shape of from DB/400,
> Oracle, My SQL or SQL server?

No, I didn't say that. And you know perfectly well that I didn't,
unless you are blazingly stupid.

> I give you one difference. ....

Who cares? You're off on a crazy tangent that has nothing to do with
what I wrote. I didn't make any assertion that all databases were
alike in all respects, and you know I didn't make that assertion.
Yet, took several paragraphs to provide evidence to contradict
something that was never asserted.

I deleted it. I didn't read it. I didn't need to, as it doesn't have
anything to do with what I actually assered.

> Each database engine has different capabilities, enhancements,
> different levels of SQL implementation, and limitations than the
> next. What it appears that you are saying is we should design our
> schema to the lowest common denominator and ignore any additional
> capability offered by the particular database.

No, I'm not saying that at all.

I'm saying that schema design is a logical operation, not a physical
one. You model the entities according to their characteristics and
logical relationships and then you design tables that implement
those relationships and that can store the characteristics and do
all of that in a way that maximizes data integrity and consistency.

Minor details may differ (you may choose different data types in
different engines to store a particular attribute), but the logical
structure and relationships is determined not by the database
engine, but by the characteristics and structure of the real-world
entities being modelled for the purpose of the application at hand.

The last statement has a couple of corollaries:

1. the same entity may be modelled differently in different
applications because the purposes of the applications and role of
the entity in each application may be different.

2. two different structures can be logically equivalent, but in
terms of database performance one or the other may be more efficient
because of the real-world realities of the way database engines
store and retrieve data. In some cases, the particular features of a
particular database engine may make a difference as to which
structure you choose. For instance, if you are putting business
rules in the database schema, an engine that supports triggers is
likely to be a better choice, and this will have an impact on how
you implement the structure.

Too often people think of data normalization as a process of taking
a flat table and breaking it into constituent parts. But that gets
things backwards, because you're starting from the denormalized
structure and working backwards to the normalized structure. In my
opinion, you should begin with the normalized structure and only
denormalize when the application and/or database engine forces it on
you. The design of an application is often going to force you to
denormalize some things for practical reasons (usually performance),
but that doesn't mean you start from the denormalized structure
(that would be a form of premature optimization).

> As soon as you
> move away from this position, you then have to design different
> schema for different engines.

Different in details, but usually not in terms of entity definitions
and logical structure. That is, you might have different data types
for certain fields, but the tables and their relationships will be
more or less the same. Now, I already outlined above a case where
this might not be the case, and that's the one where business rules
are incorporated into the schema design, which very often
necessitates the use of triggers. In that case, your design will
likely be very different structurally for a database that lacks
triggers as opposed to one that supports them.

Of course, modern practices deprecate the whole idea of
incorporating business rules at the engine level. The usual modern
preference is to put those in a layer between the application and
the database so that the business rules can be altered independent
of both the database structure and the application implementation.

Historically this has mostly not been possible with Access, as it's
designed around direct communcation between Access/Jet/ACE and the
database. This changes in A2010 with Sharepoint integration, because
Sharepoint segregates certain things into is own separate layers so
that things you'd normally put in the application or in the database
end up in the layer in between.

> Granted, those changes might be
> slight. But as soon you design something different for the
> different engines, you violated your statement that Access is not
> different.

This is not a statement that I made, so I really see no reason that
I should have to defend it.

--
David W. Fenton http://www.dfenton.com/
usenet at dfenton dot com http://www.dfenton.com/DFA/

First | Prev |
Pages: 1 2 3 4
Prev: header full line but page 2 rows
Next: Restated: "Fields are expensive, records are cheap"