Friday, September 22, 2006

Delete Duplicates In One SQL Statement

Using a Microsoft T-SQL extension you are able to conduct a self-join on a DELETE statement. The hitch is you cannot use an alias for the first table in the second FROM clause. This works best if there is at least someway to uniquely idenify the table. This will not work with Updateable Views since they do not allow self-joins.

DELETE FROM tblBlah
FROM tblBlah
JOIN tblBlah a
ON tblBlah.ClientName = a.ClientName
AND tblBlah.ClientId > b.ClientId

For a more complete examination see my test code below. Just copy and paste into query anaylzer and give it a run.

SET NOCOUNT ON
-- Create Duplicate temporary table
IF Object_Id('tempdb.dbo.#dups') IS NOT NULL DROP TABLE #dups
CREATE TABLE #dups(
pk int identity(1,1),
Firstname varchar(50),
LastName varchar(50))-- Populate Duplicate table
Insert into #dups (FirstName, LastName)
Values ('Jack','Sparrow')
Insert into #dups (FirstName, LastName)
Values ('Jack','Sparrow')
Insert into #dups (FirstName, LastName)
Values ('Jack','Sparrow')
Insert into #dups (FirstName, LastName)
Values ('Will','Turner')
Insert into #dups (FirstName, LastName)
Values ('Will','Turner')
Insert into #dups (FirstName, LastName)
Values ('Elizabeth','Swann ')
Insert into #dups (FirstName, LastName)
Values ('Elizabeth','Swann ') -- The Table with all duplicates
SELECT * FROM #dups -- Delete duplicates and keep the min primary key record.
-- Reverse the comparison operator to keep the
-- max primary key record.
DELETE FROM #dups
FROM #dups
JOIN #dups b
ON #dups.FirstName = b.FirstName
AND #dups.LastName = b.LastName
AND #dups.pk > b.pk -- The table de-duped
SELECT *FROM #dups


1 comment:

Anonymous said...

Hi Elliott,
Thanks a lot for your simple, quick and easy solution.
I can only add that I used your code for a table without identity PK, the change here is pretty obvious:
1. Create identity PK
ALTER TABLE dbo.XXX
ADD pk INT IDENTITY(1,1)

2. Execute your code.
3. Drop PK
ALTER TABLE dbo.XXX
DROP COLUMN pk

Thanks.
--
Alexey