When working with large volumes of data, it happens to have duplicates at the tables level in the database.
Therefore it could lead to exceptions in the code and incorrect statistics.
In the following lines we show you how to detect duplicates and how to remove them.
How to manage duplicates problem in SQL Server
Example:
ID LastName FirstName ----------- ---------------- -------- 33 SMITH John 34 SMITH John 35 SMITH John
A) Execute duplicates search query on a given table
SELECT COUNT(*) AS NumberOfDuplicates, field1, field2, field3 FROM [TableName] GROUP BY field1, field2, field3 HAVING COUNT(*) > 1
B) Remove duplicates
We shall proceed by the difference of the ID field values.
In our example we will remove entries with ID 34 and 35.
-- Solution 1 DELETE table FROM table LEFT OUTER JOIN ( SELECT MIN(id) as id, field1, field2, field3 FROM table GROUP BY field1, field2, field3 ) as t1 ON table.id = t1.id WHERE t1.id IS NULL
Or
-- Solution 2 DELETE t1 FROM table AS t1, table AS t2 WHERE t1.id > t2.id AND t1.column1 = t2.column1 AND t1.column2 = t2.column2 AND t1.column3 = t2.column3