Those Pesky Duplicate Files
Richard Porter-Roth
Consultant - Enterprise Content Management (ECM) Technologies at Porter-Roth Associates
Dear Bud (AKA ECM God),
Why are there so many Duplicate Documents in my document libraries? What can I do about this problem? Please help!
Signed: ECM Mortal
Dear ECM Mortal,
First off, I want to assure you that you are not alone. Duplicate documents (DDs) are like a hidden illness that is hard to detect unless you are assiduously looking for it. The symptoms creep up on you: search results seem overloaded, with two or three copies of the same document; disk space fills slowly even though "it shouldn't be that full"; users complain about the system being slow (as usual); users complain that they updated an older version of a document and had to redo the work; and so on.
DDs are created for a variety of reasons. Someone preparing an important presentation creates a special folder and copies the presentation files into it, but never deletes it afterward. People create "special private" folders for their own quick reference and copy existing team files into them. Teams set up a working sandbox folder, and various members copy their original files into it. Files are copied to an internal "open dropbox" so that internal or external people can access them, but they are rarely removed. And files are emailed to team members, who "save" them into their own subfolders of the team folder, creating still more duplicates.
This is actually a deeply disconcerting issue because, short of ruthless oversight (which we know will not happen), people will be people and they are not going to change. They will do their work the fastest and easiest way possible despite "training" and "reminders," and that is a fact. Given that DDs will be an ongoing issue that never goes away, let's try approaching it from a different perspective. The approach depends on your document management system, which could be a shared network drive (the O: drive), an ECM system like OpenText, a SharePoint system, or a cloud-based ECM system like Box.com.
Potential Issues - First, forget initial user involvement: most users will not have time to search for duplicates, and even given a list of dups, they will not do the work. Second, the number of duplicate files can be massive; in a single library of 10,000 documents, there could be 2,000 to 4,000 duplicates. Who has the time and contextual insight to actually look at that list and make a keep-or-delete determination? Third, deciding which copy to delete (or keep) can be difficult even when the reviewer is the document's owner. Fourth, in many companies the "owner" of the files and duplicates has left the company or changed departments, and the "new" person may be unwilling to authorize deletion of documents they don't know about and don't feel they "own." Fifth, and finally, users are not going to click on a thousand individual documents and hit delete; the process has to be automated in some manner.
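To make "automated in some manner" concrete, here is a minimal sketch of how an IT team might detect duplicates by content hash rather than by file name (two copies with different names but identical bytes get the same digest). This is an illustration, not a reference to any particular de-dup product; the function names and chunked-read approach are my own.

```python
# Hedged sketch: find byte-identical duplicate files under a folder tree
# by grouping files on their SHA-256 content hash. Illustrative only.
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Group files under `root` by content hash; return only groups of 2+."""
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                groups[file_digest(path)].append(path)
            except OSError:
                pass  # skip unreadable files rather than abort the scan
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}
```

The output is a report IT can review (or hand to a document owner) before anything is deleted, which addresses the "who decides" problem above: the tool finds candidates, but a person still approves the action.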
Because of the scope and breadth of deleting duplicates, and the potential issues listed above, I suggest that the duplicate removal project be run out of your primary IT department. IT has the correct access permissions and can, most likely, run the de-duplication tools better than the users.
Potential solutions - I hate to say this, but each company, each ECM system, and each de-dup application will differ in how it approaches the work to be done. Here are some general guidelines (and feel free to add your own):
Remember that you may be deleting thousands of documents with one keystroke. Check and recheck before hitting "Delete."
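One way to honor that warning in practice is to never delete on the first pass at all: move duplicates into a quarantine folder for a review period, so the "one keystroke" is reversible. The sketch below assumes the duplicate groups have already been identified (e.g., keyed by content hash), and the keep-the-oldest-copy policy is just one illustrative choice, not a recommendation from any specific tool.

```python
# Hedged sketch: quarantine duplicates instead of deleting them outright.
# The keep-oldest policy and quarantine naming scheme are assumptions.
import os
import shutil


def quarantine_duplicates(dup_groups, quarantine_dir):
    """For each group {digest: [paths]}, keep the oldest file and move the
    rest into `quarantine_dir`. Returns (original, quarantined) path pairs
    so IT can publish a review report before any real deletion happens."""
    os.makedirs(quarantine_dir, exist_ok=True)
    moved = []
    for digest, paths in dup_groups.items():
        keep = min(paths, key=os.path.getmtime)  # keep the oldest copy
        extras = [p for p in paths if p != keep]
        for i, path in enumerate(extras):
            # Prefix with digest and an index to avoid name collisions
            target = os.path.join(
                quarantine_dir, f"{digest[:12]}_{i}_{os.path.basename(path)}"
            )
            shutil.move(path, target)
            moved.append((path, target))
    return moved
```

After the review period passes with no complaints, the quarantine folder can be emptied; if someone does complain, the report gives you exactly where each file came from so it can be restored.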