From: DavidC on 7 May 2010 12:54 I have a stored procedure that is suppose to find duplicate entries in a table. I am using a subselect and GROUP BY to check for duplicates. The code below is the section of my WHERE clause that is suppose to find duplicates but a few are showing up that are not duplicates. The GROUP BY in the subselect indicates the columns that must be the same for a duplicate. Can anyone see what I am missing that returns fake duplicates? I can post the table schema if helpful but I didn't think it was needed in this case. Thank you. AND (CASE WHEN @OnlyDuplicates = 0 THEN 'T' WHEN @OnlyDuplicates = 1 THEN CASE WHEN dbo.Timesheets.WorkerLinkID IN(SELECT dbo.Timesheets.WorkerLinkID FROM dbo.Timesheets INNER JOIN dbo.PeopleLink ON dbo.Timesheets.WorkerLinkID = dbo.PeopleLink.PeopleLinkID WHERE (dbo.Timesheets.CheckID = 0) GROUP BY dbo.Timesheets.WorkerLinkID, dbo.Timesheets.ClientLinkID, dbo.Timesheets.ServiceID, dbo.Timesheets.Week2Ending, dbo.Timesheets.PayRate, dbo.Timesheets.AuthNumber, dbo.PeopleLink.Branch HAVING (COUNT(*) > 1) AND (dbo.PeopleLink.Branch = @Branch)) THEN 'T' ELSE 'F' END END = 'T') -- David
From: John Bell on 7 May 2010 13:19 On Fri, 7 May 2010 09:54:01 -0700, DavidC <dlchase(a)lifetimeinc.com> wrote: >I have a stored procedure that is suppose to find duplicate entries in a >table. I am using a subselect and GROUP BY to check for duplicates. The >code below is the section of my WHERE clause that is suppose to find >duplicates but a few are showing up that are not duplicates. The GROUP BY in >the subselect indicates the columns that must be the same for a duplicate. >Can anyone see what I am missing that returns fake duplicates? I can post >the table schema if helpful but I didn't think it was needed in this case. >Thank you. > >AND (CASE WHEN @OnlyDuplicates = 0 THEN 'T' > WHEN @OnlyDuplicates = 1 THEN > CASE WHEN dbo.Timesheets.WorkerLinkID IN(SELECT dbo.Timesheets.WorkerLinkID > FROM dbo.Timesheets INNER JOIN > dbo.PeopleLink ON dbo.Timesheets.WorkerLinkID = >dbo.PeopleLink.PeopleLinkID > WHERE (dbo.Timesheets.CheckID = 0) > GROUP BY dbo.Timesheets.WorkerLinkID, > dbo.Timesheets.ClientLinkID, > dbo.Timesheets.ServiceID, > dbo.Timesheets.Week2Ending, > dbo.Timesheets.PayRate, > dbo.Timesheets.AuthNumber, > dbo.PeopleLink.Branch > HAVING (COUNT(*) > 1) > AND (dbo.PeopleLink.Branch = @Branch)) THEN 'T' > ELSE 'F' > END > END = 'T') Hi You will certainly get false posititves for any WorkerLinkID that has any duplicates but not necessarily the combination of WorkerLinkID, ClientLinkID, ServiceID, Week2Ending, PayRate, AuthNumbe and Branch returned from the main query. Try corrolating the Timesheets and PeopleLink columns between the sub-query and the main query and checking the branch in the where clause although that should be in the main where clause anyhow!# John
From: --CELKO-- on 8 May 2010 00:15 Please post DDL, so that people do not have to guess what the keys, constraints, Declarative Referential Integrity, data types, etc. in your schema are. If you know how, follow ISO-11179 data element naming conventions and formatting rules. Temporal data should use ISO-8601 formats. Code should be in Standard SQL as much as possible and not local dialect. Sample data is also a good idea, along with clear specifications. It is very hard to debug code when you do not let us see it. If you want to learn how to ask a question on a Newsgroup, look at: http://www.catb.org/~esr/faqs/smart-questions.html I am also trying to figure out what a "link_id" is and why it has two different roles. To me, a link is part of a pointer chain from the old network databases and an "id" is an identifier. I understand "Personnel" but not what kind of entity a "PeopleLink" might be -- a chain gang? You might want to look up "flag coupling" in any book on Software Engineering, since your "@only_duplicates" looks like this kind of design problem.
From: Geoff Schaller on 8 May 2010 01:35 Still acting like a traffic cop... No-one's going to look for the tiny morsel of intelligence you tried to offer at the end. Once they see the diatribe rolling out again it is a simple invitation to stop reading altogether. But keep practising. You might get it right one day.
From: Gert-Jan Strik on 8 May 2010 14:16 In which table are you trying to find duplicates? In Timesheets or in PeopleLink? If PeopleLink.PeopleLinkID is not unique you will almost certainly get duplicates. Also, you might get fake duplicates if Timesheets.WorkerLinkID is not unique. If WorkerLinkID is not unique, then you are not using a key, which will go wrong. If the key is comprised of more than one element, then you might have to rewrite you IN() to an EXISTS(). -- Gert-Jan DavidC wrote: > > I have a stored procedure that is suppose to find duplicate entries in a > table. I am using a subselect and GROUP BY to check for duplicates. The > code below is the section of my WHERE clause that is suppose to find > duplicates but a few are showing up that are not duplicates. The GROUP BY in > the subselect indicates the columns that must be the same for a duplicate. > Can anyone see what I am missing that returns fake duplicates? I can post > the table schema if helpful but I didn't think it was needed in this case. > Thank you. > > AND (CASE WHEN @OnlyDuplicates = 0 THEN 'T' > WHEN @OnlyDuplicates = 1 THEN > CASE WHEN dbo.Timesheets.WorkerLinkID IN(SELECT dbo.Timesheets.WorkerLinkID > FROM dbo.Timesheets INNER JOIN > dbo.PeopleLink ON dbo.Timesheets.WorkerLinkID = > dbo.PeopleLink.PeopleLinkID > WHERE (dbo.Timesheets.CheckID = 0) > GROUP BY dbo.Timesheets.WorkerLinkID, > dbo.Timesheets.ClientLinkID, > dbo.Timesheets.ServiceID, > dbo.Timesheets.Week2Ending, > dbo.Timesheets.PayRate, > dbo.Timesheets.AuthNumber, > dbo.PeopleLink.Branch > HAVING (COUNT(*) > 1) > AND (dbo.PeopleLink.Branch = @Branch)) THEN 'T' > ELSE 'F' > END > END = 'T') > > -- > David
|
Pages: 1 Prev: Nedd help to run this script on all DBs Next: Todays Date in Template Parameter |