T-SQL/MSBI Knowledge Share Videos

Delete duplicate records

Suppose a table has duplicate records and you need to keep only unique records in a table then follow the following queries.

Consider the following table:

CREATE TABLE TestTable
(
   Column1 varchar(1),
   Column2 int
);

INSERT INTO TestTable VALUES ('A', 1);
INSERT INTO TestTable VALUES ('A', 1); -- duplicate
INSERT INTO TestTable VALUES ('A', 2);
INSERT INTO TestTable VALUES ('B', 1);
INSERT INTO TestTable VALUES ('B', 2);
INSERT INTO TestTable VALUES ('B', 2); -- duplicate
INSERT INTO TestTable VALUES ('C', 2);

SELECT * FROM TestTable 

Now the question comes how to delete the one row among two same rows. Here are the methods.

Method 1:

SELECT * , ROW_NUMBER() OVER(PARTITION BY Column1,Column2 ORDER BY Column1) AS RNO
FROM TestTable

SELECT *
FROM (SELECT * , ROW_NUMBER() OVER(PARTITION BY Column1,Column2 ORDER BY Column1) AS RNO
FROM TestTable)x
WHERE x.RNO=2

DELETE x
FROM (SELECT * , ROW_NUMBER() OVER(PARTITION BY Column1,Column2 ORDER BY Column1) AS RNO
FROM TestTable)x
WHERE x.RNO=
SELECT * FROM TestTable

Method 2:

WITH CTE
AS
(
    SELECT * , ROW_NUMBER() OVER(PARTITION BY Column1,Column2 ORDER BY Column1) AS RNO
    FROM TestTable
)
SELECT * FROM CTE WHERE RNO=2

WITH CTE
AS
(
    SELECT * , ROW_NUMBER() OVER(PARTITION BY Column1,Column2 ORDER BY Column1) AS RNO
    FROM TestTable
)
DELETE FROM CTE WHERE RNO=2

Method3:

While i was looking for some query to delete the duplicate records I came across  one interesting thing, "%%physloc%%". But as i went on investigating on this I found that it is not recommanded to use "%%physloc%%"(even am not sure about it, to use or not).

To understand at high level,%%physloc%% is a physical location in the data page. It is some thing like ROWID().  Internally SQL Server  assigns unique value to each and every record in each and every table which is not seen by the common user when the table is queried. 

For more information about %%physloc%% please look into this link :
http://www.codeproject.com/Articles/159785/Physical-location-of-a-row-in-SQL-Server


SELECT b.%%physloc%%,*
FROM   TestTable b

SELECT b.column1, b.Column2,MIN(b.%%physloc%%)
FROM   TestTable b
GROUP BY b.column1, b.Column2
 SELECT *
FROM  TestTable
WHERE TestTable.%%physloc%%
      NOT IN (SELECT MIN(b.%%physloc%%)
              FROM   TestTable b
              GROUP BY b.column1, b.Column2);


DELETE
FROM  TestTable
WHERE TestTable.%%physloc%%
      NOT IN (SELECT MIN(b.%%physloc%%)
              FROM   TestTable b
              GROUP BY b.column1, b.Column2);











No comments: