E-Discovery Tip Sheet: Discovering the Story

by Andy Kass

            Some years ago I saw an interview with the late, magnificent Peter O’Toole, conducted by the eminent journalist Charlie Rose.  Rose asked how O’Toole worked out a character, once he learned the lines.

            O’Toole replied in horror, “The text.  The text!  The text is everything.”

            So it is with discovery in litigation.  You may have a theory of the case and affidavits from parties and experts, but to present a credible story to a court you must have documentation to back it up.

            Regardless of the changes in where we keep documents and photos and audio recordings – desks, cabinets, boxes, warehouses, computers, diskettes, tapes, thumb drives, datacenters somewhere on the planet – the imperative in litigation is the same:  to winnow down a set of requested discovery until the thread of a story is revealed.  Today the volumes – and often the stakes – are higher, but the tools have also improved.  (If you’ve ever tried to do a full-text search on a roomful of banker’s boxes, you’ll understand what I’m saying.)

            So let’s see how one goes about pulling a story from a sea of big data.

Blocking

            In football, blocking is a term for moving pesky defenders out of the way so the offense may advance the ball; it’s about preventing someone from getting somewhere you don’t want them to be.  In acting, blocking refers to setting out where the players in a scene will be, and so where the action will occur; here, it is about framing and focusing the presentation. 

            The latter definition most closely parallels how, in e-discovery, we seek to cull a mass of data to something with greater potential for relevance.  This is generally a main point of discussion for the parties’ meet and confer.  Our frame may take into account –

  • Time – the period within which the actions under dispute are alleged to have taken place, starting on a certain date, and ending (if not ongoing) on a certain date.
  • Character – Who has knowledge of the matter and custody and control of possibly relevant information?
  • Space – Where do these people keep the information for the period in question?
  • Props – What types of devices and records may contain potentially probative material?
  • Repetition – Removing and merely referencing duplicates slims the mass of data and streamlines the review.

Authors will generally tell you it is harder to write less, but with a good culling framework, e-discovery tools can help the attorney dispense with a good deal of dross.
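
            As a rough illustration of such a framework, the sketch below applies three of the frame elements above (Time, Character, Repetition) to a handful of records.  The `Doc` fields, the `cull` function and the use of a SHA-256 hash of the text as the duplicate test are all assumptions made for this example, not a description of any particular review platform.

```python
import hashlib
from dataclasses import dataclass
from datetime import date


@dataclass
class Doc:
    doc_id: str      # hypothetical control number
    custodian: str   # who held the document
    sent: date       # date used for the time frame
    text: str        # extracted text


def cull(docs, start, end, custodians):
    """Keep in-range documents from named custodians; set aside exact duplicates."""
    kept = []
    duplicates = {}   # kept doc_id -> ids of later identical copies
    seen = {}         # text hash -> doc_id of the first copy
    for doc in docs:
        if not (start <= doc.sent <= end):
            continue  # outside the agreed time period
        if doc.custodian not in custodians:
            continue  # not an agreed custodian
        digest = hashlib.sha256(doc.text.encode("utf-8")).hexdigest()
        if digest in seen:
            # Reference the duplicate rather than reviewing it again.
            duplicates.setdefault(seen[digest], []).append(doc.doc_id)
        else:
            seen[digest] = doc.doc_id
            kept.append(doc)
    return kept, duplicates


docs = [
    Doc("A1", "smith", date(2015, 3, 1), "Q1 forecast"),
    Doc("A2", "jones", date(2015, 3, 2), "Q1 forecast"),
    Doc("A3", "smith", date(2013, 1, 1), "old memo"),
    Doc("A4", "doe", date(2015, 4, 1), "lunch?"),
]
kept, duplicates = cull(docs, date(2015, 1, 1), date(2015, 12, 31), {"smith", "jones"})
print([d.doc_id for d in kept], duplicates)
```

            Here only A1 goes to review: A2 is recorded as its duplicate, A3 falls outside the period, and A4 belongs to a custodian who was not named.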

 The Text….

            For most of the PC era, text, whether recognized from scanned page images (OCR) or extracted directly from electronic documents, has been searched to identify key terms.  These terms may be combined or extended using Boolean logical operators (AND, OR, NOT, XOR), wild cards or stemming, or proximity operators (w/5 or /5 or adj5).  There remain significant limitations to this approach, including but not limited to:

  • Inaccurate or non-existent character recognition;
  • Simple typos;
  • Over-inclusive terms (think “IBM” in a certain antitrust litigation);
  • Common word or name issues (think “Smith” or “Jones” or “from”);
  • Thesaurus issues (e.g., “airplane” could be jet, plane, aircraft, Boeing, 767, etc.);
  • Code or slang-related issues (such as “SPE” or “special purpose entity” in Enron).
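
            To make the proximity operator and one of these limitations concrete, here is a minimal sketch of a “w/5”-style test.  The function names are invented for the example, and the rule used (two exact terms no more than five word positions apart, in either order) is an assumption; search engines differ on how they count distance and on whether order matters.

```python
import re


def tokens(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def within(text, term_a, term_b, distance=5):
    """True if term_a and term_b occur within `distance` words of each other."""
    words = tokens(text)
    pos_a = [i for i, w in enumerate(words) if w == term_a]
    pos_b = [i for i, w in enumerate(words) if w == term_b]
    return any(abs(i - j) <= distance for i in pos_a for j in pos_b)


clean = "The special purpose entity was funded in March"
bad_ocr = "The special purpose entlty was funded in March"

print(within(clean, "entity", "funded"))    # the terms are two words apart
print(within(bad_ocr, "entity", "funded"))  # the misread word is never matched
```

            The second call fails for exactly the reason given in the first two bullets: a single misrecognized or mistyped character hides the document from an exact-term search.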

              With the advent of improved analytics tools, the value of text has increased.  Analytics tools apply syntactical and related algorithms to words, their juxtapositions in the text, and their frequency and use with other proximate words, to generate and rank clusters of presumptively similar meaning.  Analytics can also pull together near-duplicates:  documents that share much of the same content but are not identical, and whose content may not appear in the same order.
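
              One common way to measure this kind of overlap, offered here only as a sketch and not as the method any particular analytics product uses, is to break each document into overlapping three-word “shingles” and compare the two sets.  The function names and the shingle size are assumptions for the example.

```python
import re


def shingles(text, size=3):
    """Return the set of overlapping word sequences of length `size`."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(a, b, size=3):
    """Jaccard similarity of the two shingle sets: 0.0 (disjoint) to 1.0 (same set)."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


draft = "Please review the attached draft agreement before Friday"
final = "Please review the attached final agreement before Friday"
other = "The softball game has been moved to next week"

print(similarity(draft, final))  # partial overlap: one changed word
print(similarity(draft, other))  # no shared three-word sequences
```

              Because the shingles are compared as sets, paragraphs that were moved around within a document still count toward the overlap; a reviewer would then choose a similarity threshold above which documents are grouped together.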

             Another aspect of such tools is to identify and evaluate email threads, those textual conversations that may go on for days between various parties, adding or removing addressees, attachments, or even subject lines.  While email metadata can pull together a thread, analytics threading can identify unique or most complete messages within the thread and figure out if any messages are missing; this reduces review time required to assess the value of the conversation, and whether any parts of it may have been subject to spoliation.
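
             A much-simplified sketch of the idea follows.  It assumes each message carries an identifier and the identifier of the message it replied to (the names `m1`, `m2` and so on are hypothetical), and it assumes the last message on each branch quotes everything before it.  Real analytics threading also compares the message text itself, which this example does not attempt.

```python
def analyze_thread(messages):
    """messages maps each message id to the id it replied to (None for the first).

    Returns (end_of_branch, missing):
      end_of_branch - messages nobody in the collection replied to, which are
                      candidates for the most complete copy of the conversation
      missing       - ids that were replied to but are absent from the collection
    """
    ids = set(messages)
    replied_to = {parent for parent in messages.values() if parent is not None}
    end_of_branch = sorted(ids - replied_to)
    missing = sorted(replied_to - ids)
    return end_of_branch, missing


collected = {
    "m1": None,   # start of the conversation
    "m2": "m1",
    "m3": "m2",
    "m5": "m4",   # a reply to m4, which was never collected
}
print(analyze_thread(collected))
```

             In this toy collection a reviewer could start with m3 and m5, while the absence of m4 is flagged as a gap worth asking about.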

             Beyond analytics, we enter the world of predictive coding or Technology Assisted Review (TAR), depending on whose blog you’re reading.  TAR takes textual analysis one step further:  it examines how documents with certain textual qualities were coded (mostly Relevant/Not Relevant, but also Privileged or not, potentially even issue/cluster coding) and decides on the content of the next batch to be offered to reviewers.  Generally this batch would lean toward the higher-relevance side, but it would also include documents whose meaning remains ambiguous to the analysis engine so that it can refine its decisions.  Once review of colorably relevant documents within a specified threshold is complete, samples of documents evaluated or excluded as Not Relevant are batched out for quality checking.
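
             The loop described above can be sketched in a few lines.  This is a toy, word-counting scorer written for illustration; commercial TAR engines use considerably more sophisticated models, and every name here (`train`, `score`, `next_batch`, the sample documents) is an assumption made for the example.

```python
import math
import re
from collections import Counter


def words(text):
    """The distinct lower-case words in a document."""
    return set(re.findall(r"[a-z]+", text.lower()))


def train(coded):
    """coded is a list of (text, is_relevant) pairs from human reviewers.

    Returns a weight per word: positive if the word appears mostly in
    Relevant documents, negative if mostly in Not Relevant ones.
    """
    rel, not_rel = Counter(), Counter()
    n_rel = sum(1 for _, is_relevant in coded if is_relevant)
    n_not = len(coded) - n_rel
    for text, is_relevant in coded:
        (rel if is_relevant else not_rel).update(words(text))
    vocab = set(rel) | set(not_rel)
    # Add-one smoothing keeps unseen combinations from producing log(0).
    return {
        w: math.log((rel[w] + 1) / (n_rel + 2))
        - math.log((not_rel[w] + 1) / (n_not + 2))
        for w in vocab
    }


def score(text, weights):
    """Sum of word weights; words never seen in coded documents count as 0."""
    return sum(weights.get(w, 0.0) for w in words(text))


def next_batch(uncoded, weights, n_likely=1, n_uncertain=1):
    """Pick the highest-scoring documents plus those scoring closest to zero."""
    ranked = sorted(uncoded, key=lambda d: score(uncoded[d], weights), reverse=True)
    likely = ranked[:n_likely]
    rest = [d for d in ranked if d not in likely]
    uncertain = sorted(rest, key=lambda d: abs(score(uncoded[d], weights)))[:n_uncertain]
    return likely, uncertain


coded = [
    ("transfer the funds to the offshore entity", True),
    ("offshore entity accounts approved", True),
    ("lunch menu for friday", False),
    ("friday softball game schedule", False),
]
uncoded = {
    "D1": "wire funds to offshore entity today",
    "D2": "softball lunch on friday",
    "D3": "quarterly newsletter",
}
print(next_batch(uncoded, train(coded)))
```

             D1 is offered first because it resembles the documents already coded Relevant.  D3 is offered as well because none of its words has been seen in a coded document, so it is the one the scorer knows least about; once reviewers code the batch, the weights would be recomputed and the cycle repeated.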

 The MacGuffin, and the Denouement

             Having looked at the technical paraphernalia behind the screen, let us re-consider the object of the exercise:  to tell a story.

             The means that we agree on to get there, the tactics employed and terms used, are important, but they are not the story.  If we get hooked on inclusiveness, recall and precision, we may have a mathematically sound set of documents.  But without the ability to read into and understand those documents that remain relevant, we are no better off than the characters in the Coen Brothers’ movie Hail, Caesar! who are focused entirely on a briefcase full of money.  The briefcase is important to the characters, a so-called “MacGuffin”, but the audience is more interested in the story around these characters.

             E-discovery is a tool of litigation, which itself is a process of finding and arguing a theory of the case based upon discernible facts.  E-discovery is about gathering, evaluating, and supplying documentary information as supporting facts for one’s own side, or refuting facts against the other.  What gets splashed on a Trial Director screen at the climax of the process may be the result of a year or more of work.  With the right strategy, the right people, and the right tools, the story will be told.

 -- Andy Kass / [email protected] / 917-512-7503

 
