Author: Akash Heranjal
Description of the problem:
A database table by name MyTable has 3 data fields MyID, MyName, MyAddress.
• MyID is Identity and primary key (PK) with Int data type
• MyName is a VarChar(200) data type
• MyAddress is TEXT data type
Accidently, the PK and Identity is removed and there are duplicate
records in the table. Now we need to introduce the PK back on MyID by
cleaning the database table. There are 10,000,000 records in the table
and we really do not know which record has got duplicated and how many
times. The focus is to eliminate duplicate records from the database
table and introduce the PK back on MyID field.
Solutions/Approach:
• Introduce an additional column by name RecID and make it Identity column
• Run the below query
Delete from MyTable Where RecID Not In
(Select Min(RecID) from MyTable Group By MyID)
General Syntax:
Delete from [TableName] Where [NewIdentityCol] Not In
(Select Min([NewIdentityCol]) from [TableName] Group By [ColNameHavingDuplicate])
• Remove the new column RecID
• Introduce Identity and PK back on MyID field