Innovations in E-discovery & Digital Forensics - Active Learning in Visual Analytics
Dr Mithileysh Sathiyanarayanan
Young Scientist | CEO | Innovator | Entrepreneur | Angel Investor
Digital communication has changed human life since the invention of the internet. The growth of e-mail, social websites and other interpersonal communication systems has in turn driven rapid development in data analytics, a key technological area. Advanced analytics supports the examination of data and better informs investigative sense-making and decision-making of all kinds. Electronic discovery (E-discovery) is the legal process used to investigate events in the digital communication world in order to produce or obtain evidence (such as the emails used as evidence in the Enron fraud case). Manually investigating digital communications collected over a period of time is strenuous, time-consuming, expensive and not very effective.
More recently, E-discovery has seen the development of analytics known in the legal community as “Technology assisted review” (TAR). TAR is a technology-driven assistant that identifies relevant documents and data, which saves time and improves the efficiency of an investigation. At the same time, the efficacy of the visualisation tools available in the market is increasing; such tools depend on a combination of simple keyword searches and more complex representations (e.g. network graphs).
Also in E-discovery, early case assessment is the process of estimating the risk (cost and time) of prosecuting or defending a legal case, based on an early review of potentially relevant electronically stored information (ESI). Legal firms largely determine the duration of the E-discovery process and charge companies based on the volume of information collected and reviewed after an automated search, after which the ESI may be intensively reviewed by hand to determine relevance and privilege. This results in significant costs for the company or, in a number of cases, in a settlement because a party cannot afford to continue with the lawsuit due to E-discovery costs.
Over the last few years, E-discovery & Digital Forensics have seen interesting innovations, such as machine learning and artificial intelligence. The ongoing shift online, with communications and other materials generated on laptops, phones and other systems, has placed enormous and fast-growing stress on the E-discovery & Digital Forensics infrastructure. Our goal is to develop solutions that actually help investigators and others along the value chain - from service providers to attorneys, in both the private and public sectors.
Importance of Visualisation in E-discovery & Digital Forensics
With the exponential growth of data, it is no longer feasible to manually review every single document. Merely showing the metadata and basic keyword search results is not going to help attorneys (end users) grasp the meaning and see patterns in the data. Proper visualisation of data is vital for organisations to build effective strategies when managing each case. With the sheer amount of data, it is extremely difficult to sort through the data sets visually and make sense of the trends or patterns. So, better techniques are needed to improve the analysis.
With ECA (Early Case Assessment), end users can get visibility into the potential risk and potential cost of moving forward with the litigation. A useful visualisation tool will allow clients to make informed decisions as early as possible and throughout the case. With such information, users can make better decisions on whether to continue or settle the case. Understanding the data means being able to interpret it more effectively. The visualisation can allow users to identify which reviewers (or other resources) are needed, when, and at what cost. Visualisation tools must be able to scan the data and provide reports showing the type and volume of data, so that users can decide how to move forward based on these initial assessments.
Visualisation tools must provide reports at every stage, from the moment scanning of the data is complete to the point where the data is culled and ready for review. With these reports, users can decide to move forward or drop the case at any of those points. The information gathered is critical so that planning and investments can be thoughtful and well-informed. Whether or not the data turns out to be worth processing, clients will at least have this clarity early in the discovery process.
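As a rough illustration of the kind of report described above, the sketch below summarises a scanned collection by data type and volume and derives a very simple review estimate. The function name `eca_report`, the record fields (`type`, `size_bytes`) and the parameters `cost_per_document` and `docs_per_reviewer_hour` are assumptions made for this example only; real ECA tools use far richer cost models.

```python
from collections import Counter, defaultdict


def eca_report(documents, cost_per_document=1.0, docs_per_reviewer_hour=50):
    """Summarise a scanned collection by type and volume, with a rough review estimate.

    All rates are hypothetical placeholders, not industry figures.
    """
    count_by_type = Counter()
    bytes_by_type = defaultdict(int)
    for doc in documents:
        count_by_type[doc["type"]] += 1
        bytes_by_type[doc["type"]] += doc["size_bytes"]

    total_docs = sum(count_by_type.values())
    return {
        "total_documents": total_docs,
        "total_bytes": sum(bytes_by_type.values()),
        "count_by_type": dict(count_by_type),
        "bytes_by_type": dict(bytes_by_type),
        # Linear estimates: every document is assumed to need one manual review.
        "estimated_review_cost": total_docs * cost_per_document,
        "estimated_review_hours": total_docs / docs_per_reviewer_hour,
    }


# Toy collection standing in for the output of a data scan.
scanned = [
    {"type": "email", "size_bytes": 20_000},
    {"type": "email", "size_bytes": 35_000},
    {"type": "pdf", "size_bytes": 400_000},
    {"type": "spreadsheet", "size_bytes": 120_000},
]

report = eca_report(scanned, cost_per_document=2.5, docs_per_reviewer_hour=2)
for key, value in report.items():
    print(f"{key}: {value}")
```

Running the same report again after culling gives a second snapshot to compare against, which is the decision point the text describes: continue, settle, or drop.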
How can Active Learning play a key role in E-discovery & Digital Forensics?
Continuous Active Learning (CAL) in Visual Analytics (VA) is a special type of incremental supervised machine learning (ML) in which users are involved in the learning process (human-in-the-loop) to guide the training and analysis. In CAL, the user continually queries and annotates/labels data, which improves the quality of the learning model. Active Learning is useful where large portions of the data to be analysed are unlabelled, where manual labelling is expensive, and in live monitoring of streaming data (email or social media data), where new unlabelled data needs to be processed continuously.
CAL with visual analytics is something we have been working on. Conventional assisted review requires users to train the system, which demands more technical resources and know-how, as well as the ability to judge whether the results are good or not. With CAL, the system learns from users what they are looking for and then presents the documents the user may want to see.
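The loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the method of any particular product: the smoothed log-odds term weights stand in for whatever classifier a real tool uses, the reviewer is simulated by a function, and all names (`train`, `score`, `continuous_active_learning`, `ask_reviewer`) are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def train(labelled):
    """Fit smoothed log-odds term weights from (text, is_relevant) pairs."""
    rel, non = Counter(), Counter()
    for text, is_relevant in labelled:
        (rel if is_relevant else non).update(set(tokenize(text)))
    n_rel = sum(1 for _, is_relevant in labelled if is_relevant)
    n_non = len(labelled) - n_rel
    return {
        term: math.log((rel[term] + 1) / (n_rel + 2))
        - math.log((non[term] + 1) / (n_non + 2))
        for term in set(rel) | set(non)
    }


def score(weights, text):
    """Higher score means the document looks more like the relevant examples."""
    return sum(weights.get(term, 0.0) for term in set(tokenize(text)))


def continuous_active_learning(pool, seed, ask_reviewer, batch_size=2, rounds=3):
    """Retrain, show the top-ranked documents, collect labels, repeat."""
    labelled = list(seed)
    unlabelled = list(pool)
    review_order = []
    for _ in range(rounds):
        if not unlabelled:
            break
        weights = train(labelled)  # retrain on every label collected so far
        unlabelled.sort(key=lambda doc: score(weights, doc), reverse=True)
        batch, unlabelled = unlabelled[:batch_size], unlabelled[batch_size:]
        for doc in batch:
            label = ask_reviewer(doc)  # human-in-the-loop step
            labelled.append((doc, label))
            review_order.append((doc, label))
    return review_order


# Toy data: two seed judgements and a small unreviewed pool.
seed = [("please shred the audit files", True), ("lunch menu for friday", False)]
pool = [
    "team lunch on friday",
    "shred the files before the audit",
    "friday parking notice",
    "delete the audit records tonight",
]


def simulated_reviewer(doc):
    # Stand-in for a human reviewer: anything mentioning the audit is relevant.
    return "audit" in doc


review_order = continuous_active_learning(pool, seed, simulated_reviewer)
for doc, label in review_order:
    print(label, "-", doc)
```

In this sketch each batch is chosen by predicted relevance, so the reviewer's time goes first to the documents most likely to matter, and every new label feeds the next round of ranking.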
We believe that it is critical to start with the end users in mind and then work back to the technology that is needed. The more manual processes we can convert to automated solutions, the better. Discovery inherently has a lot of stops and starts, and it is great to see CAL and so many other developments come into play to reduce downtime and costs and to create value for organisations.
You can read my article on “Challenges and Opportunities in using Analytics Combined with Visualisation Techniques for Finding Anomalies in Digital Communications”.
More info on our solutions to follow soon.