eDiscovery AI

For those who know me professionally, it's no secret that document review has always been my nemesis. The passion began in 2008 when I had my first doc review job and was shocked at how inefficient and broken it was. We reviewed hundreds of thousands of junk documents that could have been wiped out with a single keyword search.

When I started my first full-time eDiscovery job in 2010, I quickly found myself involved with projects designed to improve doc review. I figured reviewing documents would always suck, but at least we could use tools that cut down the inefficiencies and wasted time.

Over the next decade I worked on over a thousand projects that used predictive coding to make review more efficient. Big cases, small cases, across 5 continents. Working with private parties, big law firms, and every 3-letter government agency. We did it all, and probably eliminated review on close to a billion documents.

In 2019, I left eDiscovery to build a tech company. It’s been the most fun I’ve ever had at work. We started out building web applications, but a few years ago we shifted all our attention to building AI tools for businesses.

What came next was probably obvious to anyone watching. The first thing my wife said was, "I’m surprised it took you this long to figure it out."


The eureka moment came when we were building a new AI tool for a client. We were building a tool that would analyze social media content and classify it based on its context and substance. We were basically building a document review tool.

Then it hit me... What if we built a predictive coding tool on top of an AI model?

So that's exactly what we did. After several years of meticulous crafting, we emerged with a predictive coding tool driven by an AI model explicitly designed for document review. A document review robot.

Of course it's thousands of times faster than a human reviewer at a fraction of the cost. But is it good?


Consider this: a computer can trounce a human at chess and poker. So, why should we expect to outperform it when it comes to ANALYZING MASSIVE AMOUNTS OF DATA? Spoiler alert: we can’t.


It's not just good. It is INSANE.

I'm talking upper 90s in recall and precision... ON ISSUE CODES.

Our last project had 5 issues. It got perfect 100% recall and 100% precision on 2 of them, and the lowest score was 92% recall and 98% precision... Show me a predictive coding project with over 60% recall/20% precision on issue codes and I'll buy you lunch.
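For anyone who wants to check numbers like these on their own projects, here is a minimal sketch of how recall and precision are calculated for a single issue code. The function name and the sample counts are made up for illustration (they are not data from our project); the sample is just sized so it lands near the 92% recall / 98% precision figure above.

```python
def precision_recall(predicted, actual):
    """Return (precision, recall) for one issue code.

    predicted: list of bools, True where the tool applied the issue code
    actual:    list of bools, True where the reference review applied it
    """
    pairs = list(zip(predicted, actual))
    true_pos = sum(1 for p, a in pairs if p and a)
    false_pos = sum(1 for p, a in pairs if p and not a)
    false_neg = sum(1 for p, a in pairs if not p and a)

    # Precision: of everything the tool tagged, how much was right?
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    # Recall: of everything that should have been tagged, how much was found?
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    return precision, recall


# Hypothetical sample: 50 documents truly on-issue, 50 not.
actual = [True] * 50 + [False] * 50
# The tool finds 46 of the 50 and wrongly tags 1 of the other 50.
predicted = [True] * 46 + [False] * 4 + [True] * 1 + [False] * 49

precision, recall = precision_recall(predicted, actual)
print(f"precision={precision:.2%} recall={recall:.2%}")
```

Recall tells you how much of the relevant material you found; precision tells you how much junk came along with it. You need both to judge a review.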

And you know that thing where predictive coding can't find a needle in a haystack because you need a significant number of training examples? That's not a thing anymore. It will identify relevant documents whether it's 1 doc or 1 million.

Did I mention you don't have to train it? That seems relevant.

But then we decided to have some more fun...

What if I told you it could generate an accurate summary for every document in your database? In fairness, it’s probably not necessary to get a summary for every doc, but what if it could identify and generate a summary of every KEY doc in your database? Load your documents into a database, and a few hours later have a nice little document that summarizes every document that is important to your case.

Did I mention it can code key docs with near perfect accuracy as well? It can.

Not sure why it categorized a document a certain way? Don't worry, it will explain why it made that decision. Sick of scrolling through a large doc trying to figure out why it got marked as key? No need. It will tell you exactly why the doc is key, and it will quote the exact line that made it key if that helps.

You know how I said we are seeing high 90s for recall and precision? I went back and took a look at some of the false positives and false negatives on a couple of projects, and after reading the argument for why it made the decision, I agreed the computer was correct and the "expert" was wrong. ON. EVERY. ONE.

By now, you probably aren't going to be surprised when I tell you it can also do priv review. And generate your entire priv log while it's doing it.

Someone told you that AI tools will hallucinate and make up information if they don't know how to classify a doc? Not this one.

If it’s not confident, it will classify the document as Needs Further Review and will explain exactly why it is unsure.
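To make the idea concrete, here is a rough sketch of what that kind of routing could look like. This is not our actual implementation: the numeric confidence score, the 0.8 threshold, and the `Decision` and `route` names are all assumptions made up for illustration.

```python
from dataclasses import dataclass

NEEDS_FURTHER_REVIEW = "Needs Further Review"


@dataclass
class Decision:
    label: str          # e.g. "Responsive" or "Not Responsive"
    confidence: float   # hypothetical score between 0.0 and 1.0
    explanation: str    # the tool's stated reason for the call


def route(decision: Decision, threshold: float = 0.8) -> Decision:
    """Send low-confidence calls to a human instead of guessing.

    The threshold is an assumed, tunable value, not a real product setting.
    """
    if decision.confidence < threshold:
        return Decision(
            label=NEEDS_FURTHER_REVIEW,
            confidence=decision.confidence,
            explanation="Unsure: " + decision.explanation,
        )
    return decision


unsure = route(Decision("Responsive", 0.55, "the email refers to an unnamed 'project'"))
sure = route(Decision("Responsive", 0.95, "the email discusses the contract at issue"))
print(unsure.label, "|", unsure.explanation)
print(sure.label)
```

The point of the design is simple: a document the tool can't call confidently goes to a person, along with the reason, rather than getting a made-up answer.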

What else can it possibly do?

Do you want to know how many times Jeff from accounting and Suzy from HR talked about their fantasy football teams? It can do that.

Do you want to get summaries of important docs for depo prep? It can do that.

Do you need to identify PII and PHI? No problem!

Do you need to redact? ...Ok, so that's one thing it can't do. It can tell you what text needs to be redacted, but it won't auto-redact it for you. I'm adding that to the development roadmap :)


We all knew this day would come, but I figured it was 25 years down the road.

This Predictive Coding tool powered by AI will end document review. It doesn’t eat. It doesn’t sleep. It just perfectly codes documents 24 hours a day.

And I'm going to call it Jim.

Ralph Losey

Attorney, AI Whisperer, Open to work as independent Board member of for-profit corps. Business, Emp. & Lit. experience, all industries. Losey.ai - CEO ** e-DiscoveryTeam.com

1 year ago

Let's talk again soon. I especially liked the line "I agreed the computer was correct and the 'expert' was wrong. ON. EVERY. ONE." Reminds me of our NIST test days together. 2015, 2016 seems so long ago now, but we both attained perfect recall on a few trials in 2016, without the help of ChatGPT-4. Still, much easier now, I'm sure. You have a great talent for search. And of course, as I've already said by phone a few weeks ago, count me in too. One suggestion, adjust the tone setting down a bit, so to speak. A modest whisper can be very credible in a room of shouting people.

Leon Major

Director Advisory service & Training | Nebula Senior Brand Ambassador

1 year ago

Really really interesting, Jim! and PS I love the way you write. :)

Anthony Corrado

I build AR, for real people, with real challenges | Everyday AR

1 year ago

Kudos! I know your behind the scenes effort. Great work. And this is why, we octopi ??

Patrick Vientos

AI/Data Scientist | AI Agent Architect | eDiscovery Expert

1 year ago

Interesting. I would challenge the metric scoring performances for such tasks though. However, atomic level understanding will be a major force in the industry as well as semantic AI that can really utilize deep learning inference for many edisco tasks.

Mark Mongiat

Delivering a solution-oriented approach to E-Discovery and case management.

1 year ago

Way to go!
