From: Lew on 10 Jul 2010 11:41 Lew wrote: >> The Developerworks article at >> <http://www.ibm.com/developerworks/web/library/wa-dbdsgn1.html> >> discusses how to pick keys and the motivations to use surrogate keys >> responsibly. They provide a balanced viewpoint, but are careful to point >> out, "As part of the plumbing, the surrogate key has no need to ever be >> visible outside the DB. In particular, it should never be revealed to >> the user. ... If a business need arises for providing the user with a >> unique identifier to a particular dataset, this identifier should be >> considered real /business/ data and kept separate from the plumbing." >> [emph. original] Arne Vajhøj wrote: > That is one of the main arguments for surrogate keys. > > The problem with real keys is if it need to change. > > Surrogate keys that are not exposed outside the database > never change. I have not spoken against surrogate keys. You appear to think that I have. What I said was that they are not part of the logical analysis but of the physical implementation. Surrogate keys properly are applied as described in the cited Developerworks article, as a *surrogate* for the natural key to handle certain implementation issues and to optimize access. The problem with not understanding the natural keys is that surrogate keys are misapplied and screw up normalization. I've seen examples in the field where misunderstanding surrogate key usage led to requiring 3- or more-way joins where a two-way join would have sufficed with proper key analysis. I see it misapplied in the OP's case, where he has no clue as to the identity that he's modeling with his cargo-cult "ID" columns. You have to understand the identity being modeled in the business domain to understand for what the key is a surrogate. There are other techniques to deal with slowly-changing natural keys than internal surrogate keys. Like most things, there are tradeoffs involved in the choices. Another mistake with surrogate keys is that they get applied to many-to-many linking tables where they are a problem. Not every table needs them. I would appreciate you providing a reference that states "the exact opposite" of my points. You yourself haven't even formulated a statement in this discussion that is the "opposite" of anything I've said, other than to vaguely claim that somehow I'm wrong. I'm not able so far to find a reference claiming the opposite of my points here, nor have I seen one in nearly thirty years of studying, implementing and using databases. Perhaps I missed it, in which case your citation will enlighten me. -- Lew
From: vlado on 10 Jul 2010 12:00 On 9.7.2010 16:32, Lew wrote: > vlado wrote: > >>> So can you please give me suggestion how the datatable(s) should look >>> like to save this form, and how >>> should entites look like ? >>> >> > Lew wrote: > >> I already gave one answer to that question, but you didn't answer >> mine. >> >> > You need to answer the questions we ask, and research the things we > suggest, else why do you bother to ask us for help? > > >> -- PostgreSQL >> CREATE TABLE garage >> ( >> garage VARCHAR(32) PRIMARY KEY, >> address TEXT, >> info TEXT >> ); >> >> CREATE TABE make >> ( >> make VARCHAR(32) PRIMARY KEY >> ); >> >> CREATE TABLE car >> ( >> car VARCHAR(32) PRIMARY KEY, >> make VARCHAR(32) NOT NULL FOREIGN KEY REFERENCES make (make) >> garage VARCHAR(32) FOREIGN KEY REFERENCES garage (garage), >> whenin DATE, >> > -- oops, I forgot > price NUMERIC (10, 2), > >> info TEXT >> ); >> >> -- no surrogate key ID needed or wanted >> >> > -- > Lew > And I wan't to answer to your questions, but with lot's of replies I can't find what the questions are, so can I write them down please ? Thanks
From: Lew on 10 Jul 2010 12:53 vlado wrote: > And I wan't to answer to your questions, but with lot's of replies I > can't find what the questions are, so can I write them down please ? I asked, "What does 'ID' represent in the problem domain?" To rephrase and perhaps make the concept clearer, what is it in the problem domain that makes one entity distinguishable from another? If there is an "ID" in the problem domain, where does it come from? If there is not an "ID" in the problem domain, for what entity attribute or collection of attributes is it a surrogate? One more question: Have you studied or even read yet any of the links regarding key selection and normalization that have been referenced in this discussion? -- Lew
From: vlado on 12 Jul 2010 02:30 On 10.7.2010 18:53, Lew wrote: > vlado wrote: >> And I wan't to answer to your questions, but with lot's of replies I >> can't find what the questions are, so can I write them down please ? > > I asked, "What does 'ID' represent in the problem domain?" > > To rephrase and perhaps make the concept clearer, what is it in the > problem domain that makes one entity distinguishable from another? > > If there is an "ID" in the problem domain, where does it come from? > > If there is not an "ID" in the problem domain, for what entity > attribute or collection of attributes is it a surrogate? > > One more question: Have you studied or even read yet any of the links > regarding key selection and normalization that have been referenced in > this discussion? > To be sensire ID in the problem domain represents nothing. It's just and indentifier to make one entity unique form another. When I persist entity in DB I'm fetching it by his database generated(surogate) ID. The problem is that my current DBA is poorly designing our database beacause we have over 500 tables and every table has ID(LONG-surogate), and niether of them doesn't use some natural primary key like you set for Car table that it will be car field, and I don't know why. That's the rason why I would use ID(surogate) in every new table that I need beacuse it is the common practise for my DBA, but it seems that he isn't doing it right :-(
From: Lew on 12 Jul 2010 03:10
vlado wrote: > To be sensire ID in the problem domain represents nothing. It's just and > indentifier to make one entity unique form another. When I persist > entity in DB I'm fetching it by his database generated(surogate) ID. The > problem is that my current DBA is poorly designing > our database beacause we have over 500 tables and every table has > ID(LONG-surogate), and niether of them doesn't use some natural primary > key like you set for Car table that it will be car field, and I don't > know why. That's the rason why > I would use ID(surogate) in every new table that I need beacuse it is > the common practise for my DBA, but it seems that he isn't doing it > right :-( As Arne pointed out earlier, there's a good argument for LONG id fields to serve as the primary key for entity tables. That doesn't mean every table should have one. Certain tables represent entities, like CAR in your example. (I took the liberty of picking my own table names here.) I'll use LONG ID columns for entities here though I won't always choose to do that every time. CREATE TABLE CAR ( ID SERIAL NOT NULL PRIMARY KEY, VIN VARCHAR (17) UNIQUE, MAKE VARCHAR (60), MODEL VARCHAR (60), YEAR INTEGER ); "MAKE" and "MODEL" can be foreign keys to like-named tables. Those referenced tables are usually lookups for valid values rather than entities in their own right. They can have ID fields, but the short ones (and most are short) can also have values directly in the columns, denorming the value into the referencing table. This saves on JOIN overhead in SELECTS that use that column. Obviously VIN is the natural key. The ID surrogate key allows changes to the natural key with less stress to inter-table relationships, if done correctly. Some SQL dialects support an enum type similar to the Java enum. Then it's a choice of how you enforce the values and whether you need extensibility. Point being that a lookup table can have a serial ID but probably doesn't need one. A linking table should not have a surrogate key. A linking table enforces structure - many-to-many relantionships between tables. It rarely needs or has more than the foreign key columns that tie other tables together. The primary key is the multi=column concatenation of the foreign key columns. The table's purpose is to support JOIN clauses, and extra ID columns interfere with that. ID columns in entity tables keep join-table keys relatively compact: CREATE TABLE cardriver -- many cars to many drivers ( car INTEGER FOREIGN KEY REFERENCES car (id), driver INTEGER FOREIGN KEY REFERENCES driver (id), PRIMARY KEY (car, driver) ); Multi-way JOIN SELECTs that go from, say, car to residence via driver but displaying columns only from car and residence can skip the driver entity table altogether and use just linking tables in the middle. SELECT car.*, res.* FROM car JOIN cardriver ON car.id = cardriver.car JOIN driverresidence dres ON cardriver.driver = dres.driver JOIN residence res on dres.residence = res.id WHERE ... -- Lew |