Those Pesky Duplicate Files

Dear Bud (AKA ECM God),

Why are there so many Duplicate Documents in my document libraries? What can I do about this problem? Please help!

Signed: ECM Mortal

Dear ECM Mortal,

First off, I want to assure you that you are not alone. Duplicate documents (DDs) are like a hidden illness that is hard to detect unless you are assiduously looking for it. You begin to notice the symptoms: your search results seem overloaded, you get two or three copies of the same document in a search, your disk space is filling slowly but “it shouldn’t be that full,” users are complaining about the system being slow (as usual), users complain that they updated an older version of a document and have to redo the work, and so on.

DDs are created for a variety of reasons. Someone has an important presentation, creates a special folder, and copies the files for the presentation into it, but doesn’t delete the folder afterward. People create “special private” folders for their own quick reference and often copy existing team files into them. Teams may have a working sandbox folder into which various members copy their original files. Files are copied to an internal “open dropbox” so that internal or external people can access them (but the files are rarely removed). And files are emailed to team members who “save” them into their own subfolder of the team folder, creating yet more duplicates.

This is actually a deeply disconcerting issue because, short of ruthless oversight (which we know will not happen), people will be people and they are not going to change. They will do their work the fastest and easiest way possible despite “training” and “reminders,” and that is a fact. Given that DDs will be an ongoing issue that never goes away, let’s try approaching it from a different perspective. The approach depends on your “document management system,” which could be an ECM system like OpenText, a SharePoint system, a cloud-based ECM system like Box.com, or even your shared network file system (O: Drive).

Potential Issues - So, first, forget initial user involvement – most users will not have time to search for duplicates and, even given a list of dups, they will not do the work. Second, the duplicate problem, measured in number of files, can be massive: for one library with 10,000 documents, there could be 2,000 to 4,000 duplicates. Who has the time and contextual insight to actually look at the list and decide what to keep or delete? Third, determining which document to delete (or keep) can be difficult even if the user reviewing the dups is the owner. Fourth, for many companies the “owner” of the files and duplicates has either left the company or changed departments, and the “new” person may not be willing to authorize deletion of documents they don’t know about and may not feel they “own.” Fifth (and finally), users are not going to click on a thousand individual documents and hit delete; the process has to be automated in some manner.

Because of the scope and breadth of deleting duplicates, and the potential issues listed above, I suggest that the duplicate removal project be run out of your primary IT department. The IT department has the necessary access permissions and can, most likely, run the de-dup applications better than the users can.

Potential solutions - I hate to say this, but each company, each ECM system, and each de-dup application will be different in how it approaches the work to be done. Here are some general guidelines (and feel free to add your own):

  1. Always review the project with your Legal Department regarding Preservation Orders (POs). It is important to identify all POs in all the different systems you may have, to ensure that documents under a PO are protected, and to make sure each user knows whether any of their documents are under a PO, what a PO is, and what it does. “Original” documents under a PO cannot be deleted.
  2. Always check, if possible, whether Business Records have been declared, and ensure that these business records are identified and not deleted.
  3. Research and pick a software application that identifies duplicate documents. Again, this will depend on your system. What works for SharePoint may not work for OpenText. If you have an ECM system, it may be able to identify duplicate documents itself.
  4. Ensure the chosen application can handle shared network folders, ECM systems, and cloud storage, and can collect all documents from these diverse sites into one list. For example, people may store a “temporary” document in cloud storage that is a duplicate of a file in the on-premise system. And of course, they don’t delete the “temporary” document.
  5. After choosing an application, train your IT department or users on the application. Again, you may need admin privileges to run the program and delete documents.
  6. Ensure that you understand, and agree on, which files can be deleted en masse with the application you choose. This is typically an application-dependent choice such as:

  • Delete the oldest file and keep the newest, or vice versa
  • Delete the first (or last) file in the list
  • Custom search, for example by date or file type
  • A setting that allows you to “move” (not copy) selected files to an identified location and retain them there until reviewed and deleted
  • Other deletion choices, depending on the application
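
To make guidelines 3 and 6 a little more concrete for a plain shared network folder, here is a minimal sketch that groups files by a SHA-256 hash of their contents and then applies a “keep newest” or “keep oldest” rule. It is an illustration only, not a substitute for a de-dup product: the function names (`find_duplicates`, `choose_keeper`) are made up for this example, it only sees files reachable as ordinary paths, and it knows nothing about ECM metadata, versions, Preservation Orders, or Business Records.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

# Illustrative helper names; not part of any de-dup product.


def file_digest(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(roots):
    """Group files under the given folders by identical content.

    Returns a list of groups; each group is a list of Paths whose bytes match.
    """
    by_size = defaultdict(list)
    for root in roots:
        for path in Path(root).rglob("*"):
            if path.is_file():
                by_size[path.stat().st_size].append(path)

    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a file with a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for path in same_size:
            by_hash[file_digest(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups


def choose_keeper(group, keep="newest"):
    """Split one duplicate group into (file to keep, files to remove)."""
    ordered = sorted(group, key=lambda p: p.stat().st_mtime)
    keeper = ordered[-1] if keep == "newest" else ordered[0]
    return keeper, [p for p in ordered if p != keeper]
```

Calling `find_duplicates(["O:/Team"])` (the path is a placeholder) and passing each group to `choose_keeper` yields a review list rather than deleting anything. Note that this only catches byte-for-byte copies; a document that was re-saved with a single edit will not match, which is one reason a dedicated application may still be needed.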

Remember that you may be deleting thousands of documents with one keystroke. Check and recheck before hitting “Delete.”
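
The “move, don’t delete” option and the warning above can be combined into a dry-run-first routine. The sketch below is an illustration under stated assumptions: `quarantine_duplicates`, its `protected` argument (standing in for files under a Preservation Order or declared as Business Records), and the holding folder are all hypothetical names, and a real ECM system would need its own API rather than plain file moves.

```python
import shutil
from pathlib import Path


def quarantine_duplicates(extras, holding_dir, protected=(), dry_run=True):
    """Move duplicate files to a holding folder instead of deleting them.

    extras      -- paths already judged to be removable duplicates
    holding_dir -- hypothetical review folder; files stay here until signed off
    protected   -- paths that must never be touched (e.g. under a PO)
    dry_run     -- when True, only report what would happen
    Returns a list of (source, destination) pairs that were (or would be) moved.
    """
    holding = Path(holding_dir)
    protected_set = {Path(p).resolve() for p in protected}
    planned = []
    for index, source in enumerate(map(Path, extras)):
        if source.resolve() in protected_set:
            continue  # skip anything Legal or Records has flagged
        # Prefix with a counter so same-named files do not overwrite each other.
        destination = holding / f"{index:06d}_{source.name}"
        planned.append((source, destination))
        if not dry_run:
            holding.mkdir(parents=True, exist_ok=True)
            shutil.move(str(source), str(destination))
    return planned
```

With the default `dry_run=True` nothing on disk changes; the returned list can be handed to the file owners or to Legal for review, and the same call with `dry_run=False` performs the move only after sign-off.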
