From: gareth erskine-jones on 11 Jun 2010 21:10

On Sun, 06 Jun 2010 22:32:58 GMT, "Geoff Schaller"
<geoffx(a)softxwareobjectives.com.au> wrote:

Celko's manner certainly seems to annoy many people... but I've found his books to be very useful, and so find it hard to join in with the Joe-bashing...

In this case, other people have given the OP practical solutions - the actual syntax needed to remove the offending rows. It's also been pointed out that situations like this do in fact occur in the real world - duplicates in relational databases do not make sense (the set { A, A } is the same as the set { A }), but data often has to be imported from files, and those files do have duplicates.

Nonetheless, Celko does have a point. The OP said,

>What I need to do is to delete one of the rows that has itemnum = 1 and
>firstname = 'keith'. I don't care which one is deleted. It does NOT matter.
>The resultant table needs to have itemnum and firstname unique together.

So he cares about itemnum and firstname, but not about the notes column. The first solution given eliminates the duplicates randomly - that is, which row is eliminated depends on the order in which the rows happen to be returned. The second solution eliminates duplicates based on the value of the notes column: a note of "foo" will always be removed in favour of "bar".

I agree absolutely that in the "real world" this sort of thing happens. Surely, though, when importing the data and insisting on things like itemnum and firstname uniquely identifying a row, we generally want to be able to say "this is good data", as opposed to "the data in the notes column is more or less random". If the OP does not care about the contents of the notes column, then the simplest solution is to lose it:

select distinct itemnum, firstname, '' as notes from ...

This is perfectly practical; unlike the first solution suggested it's well defined, and unlike the second it doesn't favour notes beginning with 'a' over those beginning with 'b'. Another alternative, if the notes column does contain valuable information, is to concatenate the contents, preserving it all.
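To make that concrete (the OP never gave the table name, so I'm calling it dbo.items here, and this is untested - treat it as a sketch rather than production code), a deterministic de-dup and a note-concatenating version might look something like this:

;WITH numbered AS
(
    -- Number the rows within each (itemnum, firstname) group.
    -- The explicit ORDER BY is what makes the outcome deterministic;
    -- add the table's key as a final tie-break if notes itself can repeat.
    SELECT itemnum, firstname, notes,
           ROW_NUMBER() OVER (PARTITION BY itemnum, firstname
                              ORDER BY notes) AS rn
    FROM dbo.items
)
DELETE FROM numbered
WHERE rn > 1;        -- keep exactly one row per (itemnum, firstname)

-- Or keep all the notes by concatenating them per group
-- (FOR XML PATH(''); beware that it entitises characters such as & and <):
SELECT i.itemnum, i.firstname,
       STUFF((SELECT '; ' + i2.notes
              FROM dbo.items AS i2
              WHERE i2.itemnum = i.itemnum
                AND i2.firstname = i.firstname
              FOR XML PATH('')), 1, 2, '') AS notes
FROM dbo.items AS i
GROUP BY i.itemnum, i.firstname;

Either way, the result no longer depends on which order the rows happen to come back in.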
Is the OP right not to care which row survives? I really don't know - perhaps he is. Perhaps not. I recently worked on an application where the first of the solutions (basically, pick one of the rows arbitrarily) was used. I rewrote some queries to improve performance and found I had a different result set, which made things a lot harder to test, and also hard to explain to the customer. This is the point at which you say to the customer: look, we have these two rows, identical apart from the "notes" column - which should we use? Then the customer can think about whether he really means it when he says he doesn't care which one is selected. Certainly you should prefer a solution like Erland's, which is deterministic and isn't going to change with a service pack release or something.

..
>Hmmmm.... exposing your inexperience again?
..

If someone asks "how do I do X?", it is occasionally appropriate to say, "You don't want to do X, you actually want to do Y". This involves telling someone that they're wrong, which can be done very gently or otherwise. Celko generally does it "otherwise", but he is often right....

GSEJ

>
>Out here in the real world there are actually quite a lot of
>circumstances where duplicates are quite natural. One is an auctioning
>style system. Multiple entries are created against the same item but at
>some point, competing values are removed, resulting in one unique
>transaction. Another is a data matching service. For example name
>cleaning. You start with 20 'similar' names. The process discovers that
>15 are the same and renders common fields common and determines the
>common 'key'. The duplicates are removed. Haven't you heard of de-duping
>processes?
>
>Geoff
>
>
>
>"--CELKO--" <jcelko212(a)earthlink.net> wrote in message
>news:3cd59f1e-eb54-470b-a3b4-3f4745e583db(a)j8g2000yqd.googlegroups.com:
>
>> >> What I need to do is to delete one of the rows that has item_nbr = 1 and first_name = 'keith'. I don't care which one is deleted. It does NOT matter.<<
>>
>>
>> This is a reallllly bad data management policy. Please care and find
>> out why you are getting duplicates in your process. Mop the floor, but
>> always fix the leak.