Deleting Duplicate records in SQL Server [Programming SQL]

Prev: When does a query/subquery return a NULL and when no value at all,
Next: Issues with Select statement doing 'simple' math

From: eliza on 28 Apr 2010 09:24

The query will be composed this way:-

WITH TempUsers (FirstName,LastName, duplicateRecordCount)
AS
(
SELECT FirstName,LastName,
ROW_NUMBER()OVER(PARTITIONBY FirstName, LastName ORDERBY FirstName) AS duplicateRecordCount
FROM dbo.Users
)
DELETE
FROM TempUsers
WHERE duplicateRecordCount > 1
GO

Instead of TempUsers you can give any name. Because this is used only for Temporary purpose.

http://www.mindfiresolutions.com/Deleting-Duplicate-records-in-SQL-Server-522.php

Patrick Flaherty wrote:

delete duplicate rows when they contain an identity column?
14-Mar-08

Hi,

Table with identity columns has accumulated duplicate rows (except that they're
not fully duplicated because the identity column distinguishes them).

Normally I'd do something like:

select distinct * into #elim_dupes from TableA

and then delete all in TableA and INSERT the contents of #elim_dupes back into
TableA.

But how does one eliminate dupes if the rows in TableA have an identity. Which
(as we all know) will, by definition differ for every row.

thanx - pat

Previous Posts In This Thread:

On Friday, March 14, 2008 7:50 PM
Patrick Flaherty wrote:

delete duplicate rows when they contain an identity column?
Hi,

Table with identity columns has accumulated duplicate rows (except that they're
not fully duplicated because the identity column distinguishes them).

Normally I'd do something like:

select distinct * into #elim_dupes from TableA

and then delete all in TableA and INSERT the contents of #elim_dupes back into
TableA.

But how does one eliminate dupes if the rows in TableA have an identity. Which
(as we all know) will, by definition differ for every row.

thanx - pat

On Friday, March 14, 2008 7:58 PM
David Portas wrote:

Re: delete duplicate rows when they contain an identity column?
"Patrick Flaherty" <Patrick_member(a)newsguy.com> wrote in message
news:frf30s0ktn(a)drn.newsguy.com...

"Normally"? Wow! Have you never heard of a UNIQUE constraint? :)

Try this to clean up your data - assuming you want to be left with some
arbitrary row for every instance of col1, col2, col3.

DELETE FROM tbl
WHERE EXISTS
(SELECT *
FROM tbl t
WHERE t.id < tbl.id
AND t.col1 = tbl.col1
AND t.col2 = tbl.col2
AND t.col3 = tbl.col3);

Now add some constraints to your table so that this doesn't happen again. It
amazes me that people add constraints on IDENTITY columns and then don't
bother to add them on other key columns.

--
David Portas

On Saturday, March 15, 2008 12:49 AM
Kalen Delaney wrote:

In addition to David's excellent advice, please note that following is NOT
In addition to David's excellent advice, please note that following is NOT
true:

(as we all know) will, by definition differ for every row. <<

We certainly do not all know that an identity by definition will differ for
every row. That is NOT part of the definition of identity. The only way to
guarantee uniqueness for an identity column is to have a unique index.

--
HTH
Kalen Delaney, SQL Server MVP
www.InsideSQLServer.com
http://DVD.kalendelaney.com

"David Portas" <REMOVE_BEFORE_REPLYING_dportas(a)acm.org> wrote in message
news:r7GdnWKlEqCpjEbanZ2dnUVZ8qKvnZ2d(a)giganews.com...

On Saturday, March 15, 2008 4:21 AM
Henrik Davidsen wrote:

Assuming that col1 is your identity column, and that col2, col3 and col4 are
Assuming that col1 is your identity column, and that col2, col3 and col4 are
the columns that are identical:

WITH duplicates
AS
(
SELECT
*,
ROW_NUMBER() OVER(PARTITION BY Col2, Col3, Col4 ORDER BY Col1) AS
RowNumber
FROM YourTable
)

DELETE FROM duplicates
WHERE RowNumber > 1

This CTE (I am not near a SQL Server, so the code is not tested) adds a
rownumber, that for every identical group will contain the numbers 1, 2, ...
, x where x is the numer of identical rows in the first group. The next
group of identical rows will have the rownumber values 1, 2, ..., y, where y
is the number of identical rows in the second group and so on.

Every row that have a rownumber = 1 is a "unique" row, and all rows with
rownumber > 1 is duplicates. You can then just delete from the duplicates
cte, where rownumber > 1. You can start to do a
SELECT * FROM duplicates
to see the effect of the ROW_NUMBER() function.

/Sjang

On Monday, March 17, 2008 2:33 PM
Patrick Flaherty wrote:

Thanx Dave - that worked (the query).
Thanx Dave - that worked (the query).

As for the various moral exhortations and intellectual aspersions, consider
that, _during a development phase_, this might actually be what an (experienced)
programmer wants to do. IDENTITY column but no use of UNIQUE. And Karen, thanx
for the fine print. I recognize your name and know you from SQL Server Mag.

I usually consider, when approaching someone else's problem, that there's a
reason for things. It may be good, it may be bad. But first I try to learn the
reason.

Henrik, yours was the only value-free response, but alas, I'm using SQL Server
2000, not 2005. I'd read and looked at the changes to t-sql in 2005 but hadn't
caught the WITH clause. That's very reminiscent of MDX - in fact, I suppose it
was pulled from there.

I first rolled out a cube, where I worked then and work now, in 2001.

pat

In article <r7GdnWKlEqCpjEbanZ2dnUVZ8qKvnZ2d(a)giganews.com>, David Portas says...

On Monday, April 28, 2008 12:23 AM
Shanthi Prabhu wrote:

Reg Duplicate Rows deletion
Deletion of duplicate rows using row_number or any other way. but it should be a single query

Submitted via EggHeadCafe - Software Developer Portal of Choice
Get Silverlight 4 Installed: Tips and Tricks
http://www.eggheadcafe.com/tutorials/aspnet/05910e41-3846-4db9-8e1b-f54c56a64ed9/get-silverlight-4-install.aspx

|
Pages: 1
Prev: When does a query/subquery return a NULL and when no value at all,
Next: Issues with Select statement doing 'simple' math