登录查看更多内容

Understanding the Differences Between Structured and Unstructured Data in Disputes

Phil Beckett

European and Middle East Practice Leader, Disputes & Investigations at Alvarez & Marsal

发布日期: 2019年9月30日

When we think about what type of data would be important in a dispute, we are immediately drawn to the usual suspects: emails, user generated office documents and paper. They are all examples of unstructured data, which makes up between 40 percent and 80 percent of all data in an organisation and on average is growing up to 23 percent annually. Although dispute cases mainly focus on unstructured data, structured data can be of equal importance to help understand an organisation’s corporate environment.

Structured v. unstructured data - what's the difference?

Structured data refers to information stored in fixed fields that is comprised of defined data types such as dates, numbers and text. Characteristics of structured data include: easy to store, query and analyse. Some common examples of structured data include data contained in relational databases and spreadsheets, such as account payable data and payroll.

The content of unstructured data, on the other hand, has no identifiable structure (although some data types do have an associated structure like emails with to, from, cc fields, etc.) and can't be so easily analysed, as it is not organised in a way that allows the straightforward identification of meaningful relationships between data elements.

Why is data important to an organisation?

Data is a metric of every aspect of an organisation and forms the backbone of its business processes. When we start planning for a dispute, we shouldn't limit ourselves to either unstructured or structured data, but rather focus on how different data sources can be used to complement each other.

Unstructured data is easily accessible, easier to understand and it gives us the information we need to build a picture of how an event or series of events occurred. We use it every day - almost all day. Take any given active project on any given day and there will be a plethora of emails, Microsoft Office documents and PDF scans containing project-related information, existing in two main places: emails and a network location. By comparison, structured data requires an understanding of multiple business systems and query language to enable relevant information to be identified and extracted. Consider now the structured data that will be tracked on a typical project and ask your self - what billable hours have been recorded against that project? What interactions have been recorded in the customer relationship management system? What expenses have been entered into the expense and travel system?

When it comes to the running of the business, it's the structured data sets that contain the detailed day to day information we are interested in, which can provide the supporting information required in a dispute. The true value in working with structured data lies in the power of being able to perform complex data analysis over hundreds of millions of transactions in a short period of time. Structured data allows us to build a complete and accurate picture of an organisation from the bottom up, which can then be combined with the top down view that unstructured data gives to provide a holistic view of an organisation. High level figures in reports or financial statements can provide a one-sided view, and by losing details caused by aggregation or changes over time, it might not show the full story. By delving into the underlying transactional level information and performing independent calculations, the exceptions, errors and inconsistencies become clear.

What is the value of structured data in disputes?

There are many types of disputes where structured data can be used to support and test hypotheses. Because structured data feeds into management information systems that allow key stakeholders to make business decisions, as well as enable finance teams to leverage finance systems to calculate year end results, and HR systems to monitor staff turnover and job applications, it can provide valuable insights and answer key questions critical to a dispute.

How can this data be obtained and used?

Now that we know it's important, how do we get it? The approach to working with structured data shares many similarities with the approach to working with unstructured data but there are some key differences.

There are typically five stages when working with structured data:

Scope
Extract
Load
Clean and Transform
Insights and Reporting

Scope

The approach to working with structured data starts off the same as unstructured data. It is about identifying which systems contain the data that may be relevant to the dispute and determining the scope of the requirements. Data mapping helps build a picture of how an organisation’s business systems talk to each other and which individuals have access to what data. Prior to the extraction process, we need to preserve the data. An understanding of the way in which the business' systems store data and the backup processes that are in place is needed in order to successfully preserve data. Sometimes the live data won’t include the time period relevant to the dispute and access to backups will become important, as will ensuring they are not overwritten.

Extract

Just like with the collection of unstructured data, it’s imperative that structured data is collected using a defensible method, as efficiently and effectively as possible. Forensic tools and procedures are utilised to achieve this. As the data is usually large in volume, constantly changing and critical to the business, techniques such as mirroring are implemented to enable efficient and parallel processing of the data.

Load

The extracted data is loaded into an analysis engine, typically a database. As we might be bringing in data from multiple systems in several different formats, it is important to standardise the data and convert it to the same format to make it meaningful. There are several tools that can be used, depending on the type of data, but the key thing to consider is that whatever tool is used, it should ensure that the process is auditable, repeatable and robust.

Clean and transform

The data quality of the extracted data is then assessed, and data cleansing and transformation are performed to ensure consistency and accuracy of the datasets. White spaces are removed, dates are converted to a standard format, relationships between datasets are established and a reconciliation process is performed to ensure the completeness of the data. Once clean and transformed, the data from different sources is consolidated to allow quick and easy access and interrogation of data.

Insights and reporting

Depending on the nature of the dispute and the supporting data, several analytical tools can be utilised to allow users to perform exceptions testing, forecast modelling, valuations, reconciliations, and financial transactions testing, to mention a few. Reports can be automated for repeatability (e.g. weekly/quarterly reports). Various visualisation and reporting tools, like Tableau or Microsoft Excel can allow users to further drill down into output tables, and simple graphs and bespoke dashboards offer a greater capability for investigation with user-friendly dynamic interaction.

Common problems

As one would expect, understanding an organisation’s data environment can be tricky. Common issues you might encounter when working with structured data could include:

Poor quality data

It is important to cleanse and transform ‘dirty data’ to ensure consistency and accuracy of the datasets. This could involve standardisation of key categorical fields, testing for relationships across datasets, exclusion of certain records and reconciliation of detailed record totals against reported values.

Multiple systems in place with different formatted data

Structured data is often located across multiple business systems within an organisation that don’t necessarily talk to each other and don’t allow data to be exported in a consistent manner. To facilitate the analysis and review of this data, it must be exported and then combined into a data warehouse, giving users the ability to connect and query information from different parts of an organisation and gain actionable insights, or the ability to export data in a format that can be loaded into a review platform.

Change to a new system, resulting in differences in data extracts over time

When organisations migrate to a new business system, legacy systems are rarely fully decommissioned and are maintained for reporting to enable access to the legacy data. The issues with dealing with data that resides in both legacy systems and live systems are twofold; firstly, the employees who understand and administer the legacy systems may no longer work at the company, making identification and extraction more difficult, and secondly, the unique identifiers in the databases may have changed between the legacy and new systems. Key fields may no longer be used or renamed, new information may now be available and some data may exist in both systems and be duplicative. It is necessary to perform a deduplication and clean up exercise to link information and validate the data before performing any analysis.

Volume and complexity

It may seem small when compared with unstructured data but in relation to structured data, information is big in volume and complexity. As mentioned above, we can't just extract it and use it - it's not always user-friendly. We need to normalise, cleanse and consolidate it into one system first before we can gain insight into the data. No one wants to look at millions of accounting transactions in a spreadsheet, so it is necessary to provide the reporting and insight to identify anything that is relevant. There is also the issue of redundant data, as with the review of structured data, it is important to remove any duplicative data or anything unnecessary to the scope of the dispute.

Transparency of costs

It's become more straightforward to provide costs for collecting unstructured data. Comparatively, it's more difficult to predict the costs for collecting data from ERP systems where data is spread across tens or hundreds of database tables and we need to understand how the database tables relate to each other and how the system retrieves or provides information from / to other systems. Even after the collection, it will be necessary to recreate the database for the purpose of extracting, normalising, cleaning and presenting the information in a format that can be analysed or reviewed efficiently.

Conclusions

If we were not thinking about structured data before, we should be now. It flows through all business-critical processes from something as complex as an inventory management system or through a series of Microsoft Excel spreadsheets recording the same information. When it comes to disputes, valuable insights can be gained from the data that sits in these business applications. In this context, we refer to insights not driven purely by qualitative information or the analysis of small samples, but, for example, by testing complete populations of transactions and quantification of associated damages, providing concrete data-driven fact-based views of a case. By reviewing 100 percent of all records rather than only considering one aspect of a case, any analysis is more accurate.

We started off this piece by comparing structured data against unstructured data but when we start planning for a dispute, we shouldn't be thinking about them in isolation. They both have their merits in a dispute and they complement each other. It's easy to disregard structured data due to its perceived complexity and the expense to preserve and collect it, especially in an age where better transparency of costs is demanded, (one of the drivers for Practice Direction 51u was perceived excessive cost in disclosure) but ignoring it for these reasons can leave you without the necessary evidence. If you have the right people with the right experience, you should be able to confidently implement an efficient workflow and predict the associated costs.

要查看或添加评论，请登录

Phil Beckett的更多文章

A Year in Review in the world of fraud, investigations and eDiscovery.

2022年1月5日

A Year in Review in the world of fraud, investigations and eDiscovery.

Every year I enjoy taking some time out to reflect and look back on the year that’s just gone and making some…

2 条评论
Disclosure Pilot Scheme: The Inside View (part 2)

2021年6月24日

Disclosure Pilot Scheme: The Inside View (part 2)

You may have already read part 1 of our Disclosure Pilot Scheme (DPS) paper – if you haven’t though just take a look…
Disclosure Pilot Scheme: The Inside View (Part 1)

2021年6月17日

Disclosure Pilot Scheme: The Inside View (Part 1)

We recently published our Disclosure Pilot Scheme (DPS) paper. As part of this work, we surveyed 250 senior lawyers to…

2 条评论
May newsletter

2021年5月10日

May newsletter

It’s hard to believe it’s the second Monday in May already, and that means it’s time for another roundup. Why not take…
A spring newsletter

2021年2月25日

A spring newsletter

Spring has officially sprung. And with those first few glints of sunshine, we also seem to have good news here in the…
A year in review and predictions for 2021; part 2

2020年12月22日

A year in review and predictions for 2021; part 2

Last year I crowdsourced some industry expert’s opinion about their predictions for the new year, and this year I…

1 条评论
A year in review and predictions for 2021; part 1

2020年12月22日

A year in review and predictions for 2021; part 1

Every year in December I enjoy spending some time reflecting on the year that has just been and looking ahead at the…
Tackling Privacy and Data Compliance in Electronic Discovery

2020年11月16日

Tackling Privacy and Data Compliance in Electronic Discovery

We recently read a piece by Afzalah Sarwar, from Morgan Lewis & Bockius LLP all about privacy issues with e-discovery…
A simple guide to e-Discovery in the UK

2020年7月9日

A simple guide to e-Discovery in the UK

e-Discovery, or e-Disclosure as it is also widely known, is an essential phase of not just litigation, but also…

3 条评论
GDPR during a pandemic

2020年6月23日

GDPR during a pandemic

Covid-19 has certainly taken up a large chunk of the news agenda recently, and rightly so. But if you have been…

1 条评论

See all articles

Understanding the Differences Between Structured and Unstructured Data in Disputes

Phil Beckett

European and Middle East Practice Leader, Disputes & Investigations at Alvarez & Marsal

Structured v. unstructured data - what's the difference?

Why is data important to an organisation?

What is the value of structured data in disputes?

How can this data be obtained and used?

Scope

Extract

Load

Clean and transform

Insights and reporting

Common problems

Poor quality data

Multiple systems in place with different formatted data

Change to a new system, resulting in differences in data extracts over time

Volume and complexity

Transparency of costs

Conclusions

Phil Beckett的更多文章

社区洞察

其他会员也浏览了

How to maintain data quality, challenges, and leverage opportunities?

Effective Strategies for Removing Duplicates from a Fact Table in Production

Bad Data is everywhere – and it's costing organisations time, money, and risk exposure.

The 7 Potential Benefits of Having a Data Glossary or Data Catalogue

Logistics Business - Manage your Data NOW or Dig your Graves #4

On to Data Quality

Data Quality Is Non-Negotiable Yet Companies Cut Corners. Why Is This Still Happening?

Top Ways to Avoid Data Conversion Errors

What Data Quality Metrics Should Your Organisation Measure?

How to Build a Golden Record and achieve a Single Source of truth with MAIORA.

Structured v. unstructured data - what's the difference?

Why is data important to an organisation?

What is the value of structured data in disputes?

How can this data be obtained and used?

Scope

Extract

Load

Clean and transform

Insights and reporting

Common problems

Poor quality data

Multiple systems in place with different formatted data

Change to a new system, resulting in differences in data extracts over time

Volume and complexity

Transparency of costs

Conclusions

Phil Beckett的更多文章

A Year in Review in the world of fraud, investigations and eDiscovery.

Disclosure Pilot Scheme: The Inside View (part 2)

Disclosure Pilot Scheme: The Inside View (Part 1)

May newsletter

A spring newsletter

A year in review and predictions for 2021; part 2

A year in review and predictions for 2021; part 1

Tackling Privacy and Data Compliance in Electronic Discovery

A simple guide to e-Discovery in the UK

GDPR during a pandemic

社区洞察

其他会员也浏览了

How to maintain data quality, challenges, and leverage opportunities?

Effective Strategies for Removing Duplicates from a Fact Table in Production

Bad Data is everywhere – and it's costing organisations time, money, and risk exposure.

The 7 Potential Benefits of Having a Data Glossary or Data Catalogue

Logistics Business - Manage your Data NOW or Dig your Graves #4

On to Data Quality

Data Quality Is Non-Negotiable Yet Companies Cut Corners. Why Is This Still Happening?

Top Ways to Avoid Data Conversion Errors

What Data Quality Metrics Should Your Organisation Measure?

How to Build a Golden Record and achieve a Single Source of truth with MAIORA.