The importance of accurate data when building a Digital Twin
Juan Pedro Bretti Mandarano
Data science engineer and energy advisor
These thoughts were shared at the Future Downstream conference.
One of the problems with big data is that the sheer size of the data tends to convince us that findings based on such large-scale data must be accurate.
However, if data quality is not considered, this assumption might not be true.
Data quantity is only one criterion for the accuracy of measuring or predicting something. To assess statistical accuracy, and to determine how well the data represent the real world, quantity needs to be evaluated alongside data quality.[1]
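To make this concrete, here is a minimal sketch (with hypothetical numbers, not real plant data) of why quantity alone does not buy accuracy: a simulated sensor with a systematic bias converges, no matter how many readings we collect, to the wrong value.

```python
# A biased temperature sensor: more samples shrink the random noise,
# but the estimate converges to the *biased* value, not the true one.
import numpy as np

rng = np.random.default_rng(seed=0)
true_value = 100.0   # real process temperature (illustrative)
bias = 2.0           # systematic sensor error: a data *quality* problem

for n in (100, 10_000, 1_000_000):
    readings = true_value + bias + rng.normal(0.0, 5.0, size=n)
    estimate = readings.mean()
    print(f"n={n:>9,}  estimate={estimate:7.3f}  error={estimate - true_value:+.3f}")

# The error settles near +2.0 no matter how large n gets: quantity
# reduced the variance; only fixing quality would remove the bias.
```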
The quality of data depends strongly on the purpose of its use. Some guidelines should be considered.
I would like to highlight three main ideas:
--1-- Design with data standardization in mind
Today, when designing a data model, you need to ensure data traceability and data observability, and consider the use of data warehouses.
All the metadata included in your datasets is necessary: it helps with Security, Quality, and Transparency.
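As an illustration, here is a minimal sketch of the kind of dataset metadata that supports traceability and observability. The field names and values are assumptions for the example, not a standard.

```python
# Illustrative dataset metadata supporting traceability (where the data
# came from) and observability (freshness, volume, quality signals).
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class DatasetMetadata:
    name: str                    # e.g. "compressor_vibration_raw"
    source_system: str           # traceability: originating system
    owner: str                   # security/accountability contact
    schema_version: str          # detect breaking changes downstream
    ingested_at: datetime        # observability: data freshness
    row_count: int               # observability: volume checks
    quality_checks: dict = field(default_factory=dict)

meta = DatasetMetadata(
    name="compressor_vibration_raw",
    source_system="plant-historian",
    owner="rotating-equipment-team",
    schema_version="1.2.0",
    ingested_at=datetime(2022, 6, 1, 8, 30),
    row_count=1_450_000,
    quality_checks={"nulls_pct": 0.4, "duplicates": 0},
)
```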
This standard design should consider from the beginning how to integrate legacy systems. Notice I am saying “integrate” and not “migrate”. Migration can be an expensive process (in terms of money and time) and should not be attempted in the very first phase of your project.
--2-- Business value comes first
Do not lose track of the business value. Ensure your data programs are not focused on accuracy alone; their objective function should point to business value and to solving business problems.
One way to guarantee value is to break down the silos between disciplines, so that everyone contributes their best source of data and, most importantly, knowledge is shared across the different implementations.
And last, an important point about “modelling and analytics”…
--3-- Analytics is not the new physics
We have been testing our fundamentals for more than 100 years.
We know what works and what does not; that is why we have experts in our company (and outside it).
Modelling and analytics are not here to replace those fundamentals; they are here to expand on them and leverage them. What we are doing is a combination of data-driven models and numerical models based on physical equations.
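As a minimal sketch of that combination (the physical function, data, and numbers are all illustrative assumptions): a first-principles model supplies the baseline prediction, and a data-driven model learns only the residual that the physical equations do not capture.

```python
# Hybrid modelling sketch: physics baseline + data-driven residual correction.
import numpy as np
from sklearn.linear_model import LinearRegression

def physics_model(x: np.ndarray) -> np.ndarray:
    """Stand-in for a numerical model based on physical equations."""
    return 2.0 * x[:, 0] + 0.5 * x[:, 1] ** 2

rng = np.random.default_rng(seed=1)
X = rng.uniform(0, 10, size=(500, 2))
# "Plant" data: physics plus an unmodelled effect the equations miss.
y = physics_model(X) + 0.3 * X[:, 0] * X[:, 1] + rng.normal(0, 0.5, 500)

# Train the data-driven part only on what physics gets wrong.
residual_model = LinearRegression().fit(X, y - physics_model(X))

def hybrid_predict(x: np.ndarray) -> np.ndarray:
    # Physics baseline, corrected by the learned residual term.
    return physics_model(x) + residual_model.predict(x)

x_new = np.array([[5.0, 3.0]])
print("physics only:", physics_model(x_new))
print("hybrid      :", hybrid_predict(x_new))
```

The design point is that the data never has to re-learn the physics we already trust; it only models the gap between the equations and the plant.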
Think of “Advanced Analytics” as a tool to challenge your experts to rethink their solutions.
Analytics uses big data to support hypotheses, describe behaviors, and confront the specialists with new, never-imagined problems.
---- Just one more thing…
Have an Agile mentality with fast MVP implementations and subsequent rollouts.
Aim to demonstrate value to the business quickly. Do not charge the first implementations to the business until a case has been demonstrated successful.
[3] https://towardsdatascience.com/5-types-of-bias-how-to-eliminate-them-in-your-machine-learning-project-75959af9d3a0
[4] https://www.techtarget.com/searchenterpriseai/feature/6-ways-to-reduce-different-types-of-bias-in-machine-learning