The Old School Method for Removing Duplicates in Excel They Don't Want You to Know
In working with data from any source, you need to make sure you don’t have duplicate values. In many systems, this can be particularly cumbersome. In my favorite manual data processing tool, Excel, there are a few ways to simplify this process.
I’ve used this technique to remove duplicates from Excel for years for its simple nature and easy execution. If you have another way I’d love to hear it in the comments below. Enjoy!
Steps
- Sort data in ascending order based on the key field
- In a blank column adjacent to the key field (eg. OrderID in Col B) enter the following formula in cell A2.
- =IF(COUNTIF($B$1:B1,B2)>0,1,0)
- Make sure when you copy the formula down that the first cell reference in the CountIF() formula is locked and doesn’t change.
- You’ve now identified the duplicates in your data.
- Add a filter to your data, and look for any values of 1 in the column indicating duplicates
- Select all the rows you’ve filtered here (eg. the duplicates) and then delete them
- Revel in your accomplishment
This method works for identifying a simple ID field that is duplicated. If you don’t have a unique ID or you need to create one complete this simple step then follow the previous steps described
Create a Compound Key
- Add a column for the key (typically to the left of the entire data set)
- In that column enter a formula to create the key. In this example, I’m combining the Order Date (Column D) and Customer Name (Column L).
- =D2&L2
- Follow the above steps to add a column to identify the duplicate values and then remove them
- Revel in your accomplishment