From Fumble to Flourish: Taming Data Duplicates for Peak Productivity!
Nelson Ogudha, People Analytics Leader
Award Winning HR - HR Trendsetter of the Year | HRIS specialist | People Analytics | HRBP |
I recently realized that I had skipped one of my fundamental data-cleaning rules: deal with duplicates first. I normally start the cleaning process by identifying and resolving duplicates, because data subjects often update their details by submitting a second entry instead of editing the first.
Here's the standard procedure I usually follow:
NB: How do you deal with duplicates?
It's worth noting why I don't use the "Remove Duplicates" command under the Data tab. The #dataset I work with has multiple columns, and two entries for the same person can differ in every column except the unique data points. Applied across all columns, "Remove Duplicates" only deletes rows that match in every field, so those updated entries would stay in the dataset.
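To make the difference concrete, here is a minimal Python sketch. It is an illustration only, not my actual procedure: the records, the field names, and the choice of `employee_id` as the unique data point are all hypothetical. It compares matching on every field with matching on the unique identifier alone.

```python
from collections import defaultdict

# Hypothetical records; "employee_id" stands in for the unique data point.
records = [
    {"employee_id": "E001", "name": "Amina", "department": "Finance"},
    {"employee_id": "E002", "name": "Brian", "department": "Sales"},
    # Same person entered again with an updated department.
    {"employee_id": "E001", "name": "Amina", "department": "Operations"},
    {"employee_id": "E003", "name": "Chen", "department": "HR"},
    # Exact copy of the previous row.
    {"employee_id": "E003", "name": "Chen", "department": "HR"},
]


def full_row_duplicates(rows):
    """Return rows that repeat an earlier row in every field."""
    seen = set()
    repeats = []
    for row in rows:
        fingerprint = tuple(sorted(row.items()))
        if fingerprint in seen:
            repeats.append(row)
        else:
            seen.add(fingerprint)
    return repeats


def key_duplicates(rows, key):
    """Group rows by the key field and keep groups with more than one row."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return {value: group for value, group in groups.items() if len(group) > 1}


exact = full_row_duplicates(records)
by_key = key_duplicates(records, "employee_id")

print("Rows identical in every field:", len(exact))
print("IDs with more than one entry:", sorted(by_key))
```

In this made-up data, the all-fields check flags only the exact copy, while the key-based check also surfaces the entry that was re-submitted with a changed department, which still needs a decision about which version to keep.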
Although the work was 90% complete, I had to redo the entire process because of this oversight. I got to the correct result in the end, but I lost valuable time and #efficiency. It raises the question: do we all need work instructions to guide us, even when we already have the right knowledge?