SQL Tip #1 – Find duplicate records

Sometimes we need to detect duplicate rows in a SQL server table. Today I am going to show you different ways to achieve this.

Our sample table has 100001 records, and we will find duplicate records based on the Product name.

 

SQL 1:


SELECT Main.*
FROM Product Main
CROSS APPLY
(
Select TOP 1 ProductId,ProductName
FROM Product
WHERE ProductName = Main.ProductName
AND Main.ProductId > ProductId
ORDER BY ProductId ASC

) AS Sub

 

SQL 2:

SELECT Main.* FROM Product Main
CROSS APPLY
(
Select Min(ProductId) AS ProductId
FROM Product WHERE ProductName = Main.ProductName
) AS Sub
WHERE Main.ProductId > Sub.ProductId

 

SQL 3:


SELECT * FROM
(
Select RANK()
OVER(Partition BY ProductName
ORDER BY ProductId) as Rnk, *
FROM Product
) AS Main
WHERE Main.Rnk >= 2

 

SQL 4:


SELECT
ProductId,ProductName,Qty,CategoryId
FROM (
SELECT
ROW_NUMBER()
OVER (PARTITION BY ProductName
ORDER BY ProductId) AS RowId,*
FROM Product
) AS Wrap
WHERE RowId <> 1

 

SQL 5:


SELECT DISTINCT Main.*
FROM Product Main
CROSS APPLY
(
Select ProductId,ProductName
FROM Product
WHERE ProductName = Main.ProductName
AND Main.ProductId > ProductId

) AS Sub

 

SQL 6:


SELECT
Main.*
FROM (
SELECT ProductName,
MIN(ProductId) AS FirstProductId
FROM Product
GROUP BY ProductName
HAVING COUNT(1) > 1
) AS Sub
INNER JOIN Product Main ON Main.ProductName = Sub.ProductName
WHERE Sub.FirstProductId < Main.ProductId

 

So far I have found that SQL 6 is performing well. However the results could vary in other environments based on the server resources and table indexing.

Please let me know if you have other ideas to achieve this.