SQL Tip #1 – Find duplicate records

Sometimes we need to detect duplicate rows in a SQL server table. Today I am going to show you different ways to achieve this.

Our sample table has 100001 records, and we will find duplicate records based on the Product name.

 

SQL 1:


SELECT Main.*
FROM Product Main
CROSS APPLY
(
Select TOP 1 ProductId,ProductName
FROM Product
WHERE ProductName = Main.ProductName
AND Main.ProductId > ProductId
ORDER BY ProductId ASC

) AS Sub

 

SQL 2:

SELECT Main.* FROM Product Main
CROSS APPLY
(
Select Min(ProductId) AS ProductId
FROM Product WHERE ProductName = Main.ProductName
) AS Sub
WHERE Main.ProductId > Sub.ProductId

 

SQL 3:


SELECT * FROM
(
Select RANK()
OVER(Partition BY ProductName
ORDER BY ProductId) as Rnk, *
FROM Product
) AS Main
WHERE Main.Rnk >= 2

 

SQL 4:


SELECT
ProductId,ProductName,Qty,CategoryId
FROM (
SELECT
ROW_NUMBER()
OVER (PARTITION BY ProductName
ORDER BY ProductId) AS RowId,*
FROM Product
) AS Wrap
WHERE RowId <> 1

 

SQL 5:


SELECT DISTINCT Main.*
FROM Product Main
CROSS APPLY
(
Select ProductId,ProductName
FROM Product
WHERE ProductName = Main.ProductName
AND Main.ProductId > ProductId

) AS Sub

 

SQL 6:


SELECT
Main.*
FROM (
SELECT ProductName,
MIN(ProductId) AS FirstProductId
FROM Product
GROUP BY ProductName
HAVING COUNT(1) > 1
) AS Sub
INNER JOIN Product Main ON Main.ProductName = Sub.ProductName
WHERE Sub.FirstProductId < Main.ProductId

 

So far I have found that SQL 6 is performing well. However the results could vary in other environments based on the server resources and table indexing.

Please let me know if you have other ideas to achieve this.

Advertisements