Suppose a
table has duplicate records and you need to keep only unique records in a table
then follow the following queries.
While i was looking for some query to delete the duplicate records I came across one interesting thing, "%%physloc%%". But as i went on investigating on this I found that it is not recommanded to use "%%physloc%%"(even am not sure about it, to use or not).
To understand at high level,%%physloc%% is a physical location in the data page. It is some thing like ROWID(). Internally SQL Server assigns unique value to each and every record in each and every table which is not seen by the common user when the table is queried.
For more information about %%physloc%% please look into this link :
http://www.codeproject.com/Articles/159785/Physical-location-of-a-row-in-SQL-Server
SELECT *
Consider the
following table:
CREATE TABLE TestTable
(
Column1 varchar(1),
Column2 int
);
INSERT INTO TestTable VALUES ('A', 1);
INSERT INTO TestTable VALUES ('A', 1); -- duplicate
INSERT INTO TestTable VALUES ('A', 2);
INSERT INTO TestTable VALUES ('B', 1);
INSERT INTO TestTable VALUES ('B', 2);
INSERT INTO TestTable VALUES ('B', 2); -- duplicate
INSERT INTO
TestTable VALUES ('C', 2);
SELECT * FROM TestTable
Now the
question comes how to delete the one row among two same rows. Here are the
methods.
Method 1:
SELECT * , ROW_NUMBER() OVER(PARTITION BY Column1,Column2 ORDER BY Column1) AS RNO
FROM TestTable
SELECT *
FROM (SELECT * , ROW_NUMBER() OVER(PARTITION BY Column1,Column2 ORDER BY Column1) AS RNO
FROM TestTable)x
WHERE x.RNO=2
DELETE x
FROM (SELECT * , ROW_NUMBER() OVER(PARTITION BY Column1,Column2 ORDER BY Column1) AS RNO
FROM TestTable)x
WHERE x.RNO=2
SELECT * FROM TestTable
Method 2:
WITH CTE
AS
(
SELECT *
, ROW_NUMBER() OVER(PARTITION BY Column1,Column2 ORDER BY Column1) AS RNO
FROM
TestTable
)
SELECT * FROM CTE WHERE RNO=2
WITH CTE
AS
(
SELECT *
, ROW_NUMBER() OVER(PARTITION BY Column1,Column2 ORDER BY Column1) AS RNO
FROM
TestTable
)
DELETE FROM CTE WHERE RNO=2
Method3:
While i was looking for some query to delete the duplicate records I came across one interesting thing, "%%physloc%%". But as i went on investigating on this I found that it is not recommanded to use "%%physloc%%"(even am not sure about it, to use or not).
To understand at high level,%%physloc%% is a physical location in the data page. It is some thing like ROWID(). Internally SQL Server assigns unique value to each and every record in each and every table which is not seen by the common user when the table is queried.
For more information about %%physloc%% please look into this link :
http://www.codeproject.com/Articles/159785/Physical-location-of-a-row-in-SQL-Server
SELECT b.%%physloc%%,*
FROM
TestTable b
SELECT b.column1, b.Column2,MIN(b.%%physloc%%)
FROM TestTable b
GROUP BY
b.column1, b.Column2
FROM TestTable
WHERE TestTable.%%physloc%%
NOT IN (SELECT MIN(b.%%physloc%%)
FROM TestTable b
GROUP BY b.column1, b.Column2);
DELETE
FROM TestTable
WHERE TestTable.%%physloc%%
NOT IN (SELECT MIN(b.%%physloc%%)
FROM TestTable b
GROUP BY b.column1, b.Column2);
No comments:
Post a Comment