Delightfully Delving into Deleting Duplicates via CTE

Not too long ago, I started writing about the various things you can do with CTEs (Common Table Expressions). If you missed my past posts on CTEs or need a refresher, have no fear! Just go here:

If you haven’t already guessed yet, this week I’m writing about using CTEs to delete duplicate records. Note: This week’s post is brought to you by “A Real Life Example” dumbfounding database developers and administrators almost everywhere. 

Several weeks ago, an issue arose on one of our production systems where duplicate records needed to be removed from a table. So one of the developers wrote some T-SQL code to fix the issue. He sent it to our team for approval and was completely open to having it rewritten. He’s a nice guy and great to work with so I’m not going to do any bashing. I decided instead this was a GREAT opportunity to educate someone on a different way to code a solution considering this particular system uses SQL Server 2008 R2.

Note: This is a purchased system so we do not have any control over the design whatsoever. 

The problem:  In the Employee table, employees from Department 18 with PayCode A1 have extra records from PayCode Z9.  The extra records where PayCode is Z9 need to be removed for those employees in Department 18 who already have PayCode A1. If there aren’t a lot of records, you could just make note of the primary key and delete those records. In this case, you could actually just make note of the employee IDs that have both pay code values and then put the specific employee IDs in the where clause of the delete statement along with the Z9 PayCode. The code in this post is helpful when you have possibly hundreds of records to delete. So hard coding the employee IDs is not practical when you have that many records. 

Note: There was actually a lot more to the original code including some big update statements, but since this post is about deleting records I kept only the relevant parts. If you want to play along on a test/development system, you can download all of my example code here.
  
Disclaimer:  The syntax you are about to see is real. The data and names of the tables have been changed to protect the innocent.   No developers were harmed in the making of this code.  Warning: The following material may not be suitable for people with heart conditions or those with a strong avserion to backup tables, subqueries, and/or select *. Viewer discretion is advised.

Here is an outline of the original steps taken:

  • Create a backup table containing the unchanged Employee records from Department 18.
  • Create a backup table containing employee records for PayCode A1 with S and M employee types.
  • Create a backup table containing employee records for PayCode Z9 with S and M employee types.   
  • Update A1 records from the backup Z9 table for S type records. 
  • Update A1 records from the backup Z9 table for M type records. 
  • Backup Z9 records for both S and M types.
  • Delete Z9 both S and M type records from table.

Now here the original code. The syntax has not been changed.

--Back up the Employee table.
select * into Employee_Backup
from Employee
where DepartmentID = 18

--Compare the counts between the Employee and Employee_Backup table.
select count(*) from Employee where DepartmentID = 18
select count(*) from Employee_Backup where DepartmentID = 18

--Drop table Employee_BK_A1S if it exists. Backup Employee A1 S/M records
drop table Employee_BK_A1S
select * into Employee_BK_A1S
from Employee
where PayCode = 'A1' and DepartmentID = 18 and EmployeeType = 'S'
-- Drop table Employee_BK_A1M
drop table Employee_BK_A1M
select * into Employee_BK_A1M
from Employee
where PayCode = 'A1' and DepartmentID = 18 and EmployeeType = 'M'

--Backup Employee Z9 S/M records by joining with BK_A1 temp files.
--Not all employees have Z9 records.
drop table Employee_BK_Z9S
select * into Employee_BK_Z9S
from Employee
where Employee.PayCode = 'Z9' and Employee.DepartmentID = 18
and Employee.EmployeeID =
 (select Employee_BK_A1S.EmployeeID
  from Employee_BK_A1S
  where Employee.EmployeeID = Employee_BK_A1S.EmployeeID
  and Employee.EmployeeType = Employee_BK_A1S.EmployeeType)
drop table Employee_BK_Z9M
select * into Employee_BK_Z9M
from Employee
where Employee.PayCode = 'Z9' and Employee.DepartmentID = 18
and Employee.EmployeeID =
 (select Employee_BK_A1M.EmployeeID
  from Employee_BK_A1M
  where Employee.EmployeeID = Employee_BK_A1M.EmployeeID
  and Employee.EmployeeType = Employee_BK_A1M.EmployeeType)

--Check the number of records in both S and M tables.
select count(*) from Employee_BK_Z9S
select count(*) from Employee_BK_A1S
select count(*) from Employee_BK_Z9M
select count(*) from Employee_BK_A1M

--If the counts are different, let the user know.
select * from Employee_BK_A1M
where not exists
(select *
 from Employee_BK_Z9M
 where Employee_BK_A1M.EmployeeID = Employee_BK_Z9M.EmployeeID
 and   Employee_BK_A1M.EmployeeType = Employee_BK_Z9M.EmployeeType)
--Drop these tables if they exist.
drop table Employee_bk_S_deleted
drop table Employee_bk_M_deleted

--backup Employee Z9 records from Employee table.
select * into Employee_bk_S_deleted
from Employee
where Employee.PayCode = 'Z9'
and Employee.DepartmentID = 18
and  Employee.EmployeeType = 'S'
and Employee.EmployeeID =
(select Employee_bk_Z9S.EmployeeID
 from Employee_bk_Z9S
 where Employee.EmployeeID = Employee_bk_Z9S.EmployeeID
  and  Employee.EmployeeType = Employee_bk_Z9S.EmployeeType)

select * into Employee_bk_M_deleted
from Employee
where Employee.PayCode = 'Z9'
and Employee.DepartmentID = 18
and  Employee.EmployeeType = 'M'
and Employee.EmployeeID =
(select Employee_bk_Z9M.EmployeeID
 from Employee_bk_Z9M
 where Employee.EmployeeID = Employee_bk_Z9M.EmployeeID
  and  Employee.EmployeeType = Employee_bk_Z9M.EmployeeType)

--Delete the Z9 records for both S and M types from Employee table.
delete from Employee
where Employee.PayCode = 'Z9'
and Employee.DepartmentID = 18
and  Employee.EmployeeType = 'S'
and Employee.EmployeeID =
(select Employee_bk_Z9S.EmployeeID
 from Employee_bk_Z9S
 where Employee.EmployeeID = Employee_bk_Z9S.EmployeeID
  and  Employee.EmployeeType = Employee_bk_Z9S.EmployeeType)

delete from Employee
where Employee.PayCode = 'Z9'
and Employee.DepartmentID = 18
and  Employee.EmployeeType = 'M'
and Employee.EmployeeID =
(select Employee_bk_Z9M.EmployeeID
 from Employee_bk_Z9M
 where Employee.EmployeeID = Employee_bk_Z9M.EmployeeID
  and  Employee.EmployeeType = Employee_bk_Z9M.EmployeeType)

select * from Employee

Here is a look at the table records both before the code above is executed and the results from executing this code…

Before: 

Example #1 - Employee Records Prior to Deletion

and after…

Example #1 - Employee Records After the Deletion via Original Code

If you run the code all at once you will get some errors that you can’t drop the backup tables since they do not exist. There is no “if exists” to check for these tables first. Not a big deal, really, but it could be a tad bothersome. Personally, I’d rather check for the existance of the backup tables first.

Yes, that code is quite lengthy, and there’s quite a bit that can be changed. So here is what can be done differently with the same results:

  • The backup tables can be eliminated all together.
  • The EmployeeType for values ‘M’ and ‘S’ can be combined into the same where clause using the “or” operator.
  • The subquery can be replaced using a CTE.
  • Unless you’re being extra cautious, there’s really no need to check the count every step of the way. You can add in the BEGIN TRAN and ROLLBACK to check the results before you commit to the database.

So here is my version:

begin tran
 select EmployeeID
   ,EmployeeType
   ,DepartmentID
   ,PayCode 
   ,RegularPayAmount
 from Employee
 
 ;with EmployeePayCode_A1 (EmployeeID, EmployeeType, DepartmentID)
 as
 (
  select EmployeeID, EmployeeType, DepartmentID
  from Employee
  where PayCode = 'A1'
    and DepartmentID = 18
   and (EmployeeType = 'S' or EmployeeType = 'M')
 )

 --Delete the Z9 records for both S and M types from Employee table
 -- but only delete those who have corresponding A1 S/M records.
 delete Employee
 from Employee
 inner join EmployeePayCode_A1
 on (Employee.EmployeeID = EmployeePayCode_A1.EmployeeID
 and Employee.EmployeeType = EmployeePayCode_A1.EmployeeType
 and Employee.DepartmentID = EmployeePayCode_A1.DepartmentID)
 where Employee.PayCode = 'Z9'  

 select EmployeeID
   ,EmployeeType
   ,DepartmentID
   ,PayCode 
   ,RegularPayAmount
 from Employee
rollback

As you can see, there are quite a few differences. The CTE just retrieves employee records where the PayCode is A1, the DepartmentID is 18, and the EmployeeType is S or M. Then I joined the CTE to the Employee table on the relevant fields and where PayCode is Z9. When it’s executed, the results are the same as the original code. There’s just less code to maintain. Feel free to play with this code all you want. As with pretty much everything, there’s always more than one way to accomplish the same task. This is just one of the ways you can delete duplicates. 🙂

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s