From: Lew on
Lew wrote:
>> The Developerworks article at
>> <http://www.ibm.com/developerworks/web/library/wa-dbdsgn1.html>
>> discusses how to pick keys and the motivations to use surrogate keys
>> responsibly. They provide a balanced viewpoint, but are careful to point
>> out, "As part of the plumbing, the surrogate key has no need to ever be
>> visible outside the DB. In particular, it should never be revealed to
>> the user. ... If a business need arises for providing the user with a
>> unique identifier to a particular dataset, this identifier should be
>> considered real /business/ data and kept separate from the plumbing."
>> [emph. original]

Arne Vajhøj wrote:
> That is one of the main arguments for surrogate keys.
>
> The problem with real keys is if it need to change.
>
> Surrogate keys that are not exposed outside the database
> never change.

I have not spoken against surrogate keys. You appear to think that I have.
What I said was that they are not part of the logical analysis but of the
physical implementation. Surrogate keys properly are applied as described in
the cited Developerworks article, as a *surrogate* for the natural key to
handle certain implementation issues and to optimize access.

The problem with not understanding the natural keys is that surrogate keys are
misapplied and screw up normalization. I've seen examples in the field where
misunderstanding surrogate key usage led to requiring 3- or more-way joins
where a two-way join would have sufficed with proper key analysis. I see it
misapplied in the OP's case, where he has no clue as to the identity that he's
modeling with his cargo-cult "ID" columns. You have to understand the
identity being modeled in the business domain to understand for what the key
is a surrogate.

There are other techniques to deal with slowly-changing natural keys than
internal surrogate keys. Like most things, there are tradeoffs involved in
the choices.

Another mistake with surrogate keys is that they get applied to many-to-many
linking tables where they are a problem. Not every table needs them.

I would appreciate you providing a reference that states "the exact opposite"
of my points. You yourself haven't even formulated a statement in this
discussion that is the "opposite" of anything I've said, other than to vaguely
claim that somehow I'm wrong. I'm not able so far to find a reference
claiming the opposite of my points here, nor have I seen one in nearly thirty
years of studying, implementing and using databases. Perhaps I missed it, in
which case your citation will enlighten me.

--
Lew
From: vlado on
On 9.7.2010 16:32, Lew wrote:
> vlado wrote:
>
>>> So can you please give me suggestion how the datatable(s) should look
>>> like to save this form, and how
>>> should entites look like ?
>>>
>>
> Lew wrote:
>
>> I already gave one answer to that question, but you didn't answer
>> mine.
>>
>>
> You need to answer the questions we ask, and research the things we
> suggest, else why do you bother to ask us for help?
>
>
>> -- PostgreSQL
>> CREATE TABLE garage
>> (
>> garage VARCHAR(32) PRIMARY KEY,
>> address TEXT,
>> info TEXT
>> );
>>
>> CREATE TABE make
>> (
>> make VARCHAR(32) PRIMARY KEY
>> );
>>
>> CREATE TABLE car
>> (
>> car VARCHAR(32) PRIMARY KEY,
>> make VARCHAR(32) NOT NULL FOREIGN KEY REFERENCES make (make)
>> garage VARCHAR(32) FOREIGN KEY REFERENCES garage (garage),
>> whenin DATE,
>>
> -- oops, I forgot
> price NUMERIC (10, 2),
>
>> info TEXT
>> );
>>
>> -- no surrogate key ID needed or wanted
>>
>>
> --
> Lew
>
And I wan't to answer to your questions, but with lot's of replies
I can't find what the questions are, so can I write them down please ?

Thanks
From: Lew on
vlado wrote:
> And I wan't to answer to your questions, but with lot's of replies I
> can't find what the questions are, so can I write them down please ?

I asked, "What does 'ID' represent in the problem domain?"

To rephrase and perhaps make the concept clearer, what is it in the problem
domain that makes one entity distinguishable from another?

If there is an "ID" in the problem domain, where does it come from?

If there is not an "ID" in the problem domain, for what entity attribute or
collection of attributes is it a surrogate?

One more question: Have you studied or even read yet any of the links
regarding key selection and normalization that have been referenced in this
discussion?

--
Lew
From: vlado on
On 10.7.2010 18:53, Lew wrote:
> vlado wrote:
>> And I wan't to answer to your questions, but with lot's of replies I
>> can't find what the questions are, so can I write them down please ?
>
> I asked, "What does 'ID' represent in the problem domain?"
>
> To rephrase and perhaps make the concept clearer, what is it in the
> problem domain that makes one entity distinguishable from another?
>
> If there is an "ID" in the problem domain, where does it come from?
>
> If there is not an "ID" in the problem domain, for what entity
> attribute or collection of attributes is it a surrogate?
>
> One more question: Have you studied or even read yet any of the links
> regarding key selection and normalization that have been referenced in
> this discussion?
>

To be sensire ID in the problem domain represents nothing. It's
just and indentifier to make one entity unique form another. When I
persist entity in DB I'm fetching it by his database generated(surogate)
ID. The problem is that my current DBA is poorly designing
our database beacause we have over 500 tables and every table has
ID(LONG-surogate), and niether of them doesn't use some natural primary
key like you set for Car table that it will be car field, and I don't
know why. That's the rason why
I would use ID(surogate) in every new table that I need beacuse it is
the common practise for my DBA, but it seems that he isn't doing it
right :-(


From: Lew on
vlado wrote:
> To be sensire ID in the problem domain represents nothing. It's just and
> indentifier to make one entity unique form another. When I persist
> entity in DB I'm fetching it by his database generated(surogate) ID. The
> problem is that my current DBA is poorly designing
> our database beacause we have over 500 tables and every table has
> ID(LONG-surogate), and niether of them doesn't use some natural primary
> key like you set for Car table that it will be car field, and I don't
> know why. That's the rason why
> I would use ID(surogate) in every new table that I need beacuse it is
> the common practise for my DBA, but it seems that he isn't doing it
> right :-(

As Arne pointed out earlier, there's a good argument for LONG id fields to
serve as the primary key for entity tables.

That doesn't mean every table should have one.

Certain tables represent entities, like CAR in your example. (I took the
liberty of picking my own table names here.) I'll use LONG ID columns for
entities here though I won't always choose to do that every time.

CREATE TABLE CAR
(
ID SERIAL NOT NULL PRIMARY KEY,
VIN VARCHAR (17) UNIQUE,
MAKE VARCHAR (60),
MODEL VARCHAR (60),
YEAR INTEGER
);

"MAKE" and "MODEL" can be foreign keys to like-named tables. Those referenced
tables are usually lookups for valid values rather than entities in their own
right. They can have ID fields, but the short ones (and most are short) can
also have values directly in the columns, denorming the value into the
referencing table. This saves on JOIN overhead in SELECTS that use that column.

Obviously VIN is the natural key. The ID surrogate key allows changes to the
natural key with less stress to inter-table relationships, if done correctly.

Some SQL dialects support an enum type similar to the Java enum. Then it's a
choice of how you enforce the values and whether you need extensibility.

Point being that a lookup table can have a serial ID but probably doesn't need
one.

A linking table should not have a surrogate key. A linking table enforces
structure - many-to-many relantionships between tables. It rarely needs or
has more than the foreign key columns that tie other tables together. The
primary key is the multi=column concatenation of the foreign key columns. The
table's purpose is to support JOIN clauses, and extra ID columns interfere
with that.

ID columns in entity tables keep join-table keys relatively compact:

CREATE TABLE cardriver -- many cars to many drivers
(
car INTEGER FOREIGN KEY REFERENCES car (id),
driver INTEGER FOREIGN KEY REFERENCES driver (id),
PRIMARY KEY (car, driver)
);

Multi-way JOIN SELECTs that go from, say, car to residence via driver but
displaying columns only from car and residence can skip the driver entity
table altogether and use just linking tables in the middle.

SELECT car.*, res.* FROM car
JOIN cardriver ON car.id = cardriver.car
JOIN driverresidence dres ON cardriver.driver = dres.driver
JOIN residence res on dres.residence = res.id
WHERE ...

--
Lew