Data Interactions, FAIR Data and Digital Twins

Or - "The if statement that changed the world"

In plays and films, books and music, there is often a key “moment” where everything in the story comes together. Think of the “I am your father” moment from Star Wars or the major-to-minor, Odette/Odile moment in Tchaikovsky’s Swan Lake. Software and data engineers know these moments, too: the moment when, after days of work, you get everything together in your code and data so you can, *finally*, write and run “The if statement that changed the world”.

A version of the “if” statement might be

if river.level > X and rainfall.forecast > Y

I’m sure you can write the “then” part of this “if” yourself, and it’s likely to involve millions of pounds of damage, weeks of transport disruption and possible loss of life.

This “if” statement is our first kind of data interaction. A computer algorithm (important emphasis) has brought two pieces of data together so they can be compared and some insight gleaned. The story of what those pieces of data are and how they get to the “if” statement is more complex than you might think.
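To make that interaction concrete, here is a minimal sketch of how the comparison might look once the two readings have been fetched. The thresholds, names and values are all invented for illustration; they stand in for data arriving from real sources.

```python
# A minimal sketch of "the if statement that changed the world".
# The thresholds and readings below are hypothetical placeholders.

FLOOD_LEVEL_M = 2.5    # X: river level threshold, in metres
HEAVY_RAIN_MM = 40.0   # Y: forecast rainfall threshold, in millimetres

def assess_flood_risk(river_level_m: float, rainfall_forecast_mm: float) -> bool:
    """Return True when both readings point towards a likely flood."""
    return river_level_m > FLOOD_LEVEL_M and rainfall_forecast_mm > HEAVY_RAIN_MM

if assess_flood_risk(river_level_m=2.8, rainfall_forecast_mm=55.0):
    # The "then" part: raise the alarm, warn homeowners, divert traffic...
    print("Flood likely - issue warning")
```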

The first chapter of the story of how the river levels and the rainfall get together is about finding data. There’s no search engine for data: not publicly and rarely within enterprises. There are attempts at searchability such as data.gov.uk but they are not intended for algorithms, rather they are for people. It’s said that data scientists spend at least 50% of their time looking for data: not looking at data, looking for it. This epic waste of time is because data is hidden away, deliberately or unintentionally, in silos, in datasets, behind APIs or in program-unfriendly formats, such as PDF. This is definitely not findable by machines, but what if it was?
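To imagine that “what if” a little more concretely: if a catalogue described its datasets to programs rather than just to people, the hunt might shrink to something like the sketch below. The catalogue URL, its parameters and the shape of its response are assumptions made for the example, not a real service.

```python
import requests  # third-party HTTP client

# Hypothetical machine-readable catalogue; URL and response shape are invented.
CATALOGUE_URL = "https://catalogue.example.org/search"

def find_datasets(query: str, spatial_area: str) -> list[dict]:
    """Ask a (hypothetical) data catalogue for datasets matching a query."""
    response = requests.get(
        CATALOGUE_URL,
        params={"q": query, "area": spatial_area, "format": "json"},
        timeout=10,
    )
    response.raise_for_status()
    return response.json().get("results", [])

river_datasets = find_datasets("river level", spatial_area="river-severn")
rain_datasets = find_datasets("rainfall forecast", spatial_area="river-severn")
```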

The next chapter of the story is about access and interoperability. The two are linked. Our imaginary computer might be able to find some data but it might not be able to understand it. It would definitely be nice for interoperability purposes if it could have some metadata to indicate that the river level was measured in metres and the rainfall in millimetres, but what if it did? We now have Find, Access and Interoperate and the data interaction in the “if” statement is Re-using that data for our new purpose. There’s more to the story of FAIR data, but that’s in another post.
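One way to picture that metadata is to have every reading carry its unit of measure, so the consuming program can normalise before it compares anything. A minimal sketch, with invented structures and a conversion table that only covers the units it needs:

```python
from dataclasses import dataclass

@dataclass
class Reading:
    """A value that travels with its unit, so a consumer does not have to guess."""
    value: float
    unit: str  # e.g. "m", "cm", "mm"

# Hypothetical conversion table for length units.
TO_METRES = {"m": 1.0, "cm": 0.01, "mm": 0.001}

def in_metres(reading: Reading) -> float:
    """Normalise a length reading to metres using its declared unit."""
    return reading.value * TO_METRES[reading.unit]

river = Reading(value=280, unit="cm")      # published in centimetres
threshold = Reading(value=2.5, unit="m")   # our threshold in metres

if in_metres(river) > in_metres(threshold):
    print("River above threshold")
```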

If we dare to imagine the “What if”: a world where data is FAIR, and our computer can find this data and understand it, we still can’t program it with that “if” statement when the data is in large datasets. What our algorithm also needs is the river level at this location and the rainfall forecast at a different location, probably well upstream from the place where the flood is likely to occur. So, even if our algorithm can find the right dataset, it still needs to know how to run a query against the dataset to find the data it wants. There’s an element of granularity of the data that is important - and that’s where the digital twins come in.
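A small sketch of that granularity problem, assuming the dataset has already been found and fetched: the algorithm still has to ask it for one gauge, not for everything. The record layout and the query helper are hypothetical.

```python
# Hypothetical dataset, already found and downloaded; we only want one gauge.
river_dataset = {
    "records": [
        {"station_id": "2077", "parameter": "level", "value_m": 2.8},
        {"station_id": "3011", "parameter": "level", "value_m": 0.9},
    ]
}

def query(dataset: dict, **filters) -> list[dict]:
    """Filter a dataset's records by simple equality on the given fields."""
    return [
        record for record in dataset["records"]
        if all(record.get(key) == value for key, value in filters.items())
    ]

our_gauge = query(river_dataset, station_id="2077", parameter="level")
print(our_gauge)  # [{'station_id': '2077', 'parameter': 'level', 'value_m': 2.8}]
```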

Digital twins are a virtualisation of an asset’s data. The asset is a useful level of granularity here. Our algorithm needs to be able to choose these rainfall forecasts, but not all of them and those river levels but not some others. Metadata about the assets beyond their location might also be useful. Knowing who operated them would help our algorithm to assign weight to the readings if some operators’ data proved more reliable and accurate. Having some provable provenance of the data as actually coming from that twin and the twin really being the one operated by the Environment Agency, for example, would build trust in the output of our imagined algorithm. The exchange of metadata between twins to establish trust and access is our second data interaction.
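That exchange might look something like the sketch below: before any readings flow, the consuming side inspects a twin’s metadata (operator, location, some verifiable provenance) and decides how much weight to give its data. All of the field names, operators and the toy verification step are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class TwinMetadata:
    twin_id: str
    operator: str          # e.g. "Environment Agency"
    location: str          # e.g. a gauge reference or coordinates
    provenance_token: str  # stands in for a verifiable credential or signature

# Hypothetical trust policy: weight readings by how much we trust the operator.
OPERATOR_WEIGHTS = {"Environment Agency": 1.0, "Community gauge network": 0.6}

def trust_weight(metadata: TwinMetadata, verify) -> float:
    """Return a weighting for a twin's data, or 0.0 if provenance cannot be verified."""
    if not verify(metadata.provenance_token):
        return 0.0
    return OPERATOR_WEIGHTS.get(metadata.operator, 0.3)

gauge_twin = TwinMetadata("gauge-2077", "Environment Agency", "SO 744 843", "signed:abc123")
weight = trust_weight(gauge_twin, verify=lambda token: token.startswith("signed:"))
```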

The final chapter of the data’s story to get to the “if” statement is about timeliness. Homeowners won’t appreciate being told on Wednesday that a flood would occur on Tuesday when their houses are already knee-deep in muddy water. The data needs to flow between the twins and the algorithm as close to real time as possible so that the predictions are available in a timely way. This is not just important in our imaginary flooding scenario; it’s important in business, where latency between something happening and the business reacting to it can cost millions.
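In code terms, timeliness pushes us from one-off queries towards a continuous flow: the algorithm keeps taking fresh readings from the twins and re-evaluates the “if” as each one arrives. The polling loop below is a deliberately crude stand-in for a proper streaming or publish-subscribe mechanism, and the reading function only simulates values.

```python
import random
import time

def latest_reading(twin_id: str) -> float:
    """Stand-in for a subscription to a twin; here it simply simulates a value."""
    return random.uniform(0.0, 5.0)

def watch(river_twin: str, rain_twin: str, level_threshold: float, rain_threshold: float):
    """Re-evaluate the flood 'if' every time fresh data arrives."""
    while True:
        level = latest_reading(river_twin)
        forecast = latest_reading(rain_twin)
        if level > level_threshold and forecast > rain_threshold:
            print("Flood likely - warn now, not on Wednesday")
        time.sleep(60)  # near real time; a real system would push, not poll

# watch("gauge-2077", "forecast-upper-severn", 2.5, 40.0)
```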

A couple of questions to answer before we conclude. We imagined an algorithm running, exchanging data with digital twins. First question: What does the algorithm do in the “then” part of the “if”? Change a dashboard? Update a database? Send an email? What if it could share the data back with other digital twins, or create new twins of the likely flood locations and have them share into a growing ecosystem of cooperative twins? That would be more FAIR. Second question: In what context does the algorithm run? I don’t think the answer will surprise you... In its own digital twin. Doing it this way simplifies the model (everything is a twin) and creates a nice symmetry in the problem. The twin of the algorithm interacts with the twins of the data sources. Data interactions? Data interactions are twin interactions. Twin interactions are the exchange of data and metadata between twins.
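If we take “everything is a twin” literally, the algorithm’s twin and the data twins end up speaking the same small vocabulary: describe yourself, share your data. The toy classes below exist only to show that symmetry; none of the names are real.

```python
class Twin:
    """A deliberately tiny model of a twin: it can describe itself and share data."""

    def __init__(self, twin_id: str, metadata: dict):
        self.twin_id = twin_id
        self.metadata = metadata
        self.data: dict = {}

    def describe(self) -> dict:
        return self.metadata

    def share(self, key: str):
        return self.data.get(key)

# The flood-prediction algorithm is itself a twin, interacting with the others.
gauge = Twin("gauge-2077", {"operator": "Environment Agency"})
forecast = Twin("forecast-upper-severn", {"operator": "Forecast provider"})
predictor = Twin("flood-predictor", {"operator": "us"})

gauge.data["level_m"] = 2.8
forecast.data["rain_mm"] = 55.0
predictor.data["flood_likely"] = (
    gauge.share("level_m") > 2.5 and forecast.share("rain_mm") > 40.0
)
```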

There are so many key words in this post. I’m going to concentrate on two: “interact” and “if”. Interact implies “between” - that there are at least two parties involved. A consumer interacts with a producer; a supplier with a customer. Both have agency in the interaction. A customer doesn’t have to accept the supplier’s goods or service. The producer can cut off a consumer if they don’t like their behaviour. The second word is “if”. We started with a programmatic sense of the word as “if - then”. We moved on to an imaginary world of “what - if” and finally I’ll leave you with an “if - only”. “If only” data and twins could cooperate. What transformation could we achieve then?

Andrew Padilla

Helping organizations successfully navigate their information technology initiatives

3y

I find that one of the contextual elements missing from datasets and the data constructs that compose them is the behavior (i.e. algorithms) that operates on them. We treat these as distinctly separate when we publish datasets, and so 'interacting' with the data and its relationships is largely devoid of any behavior unless you provide it from scratch yourself. When you do, it's not easy for anyone else to leverage it in the future. Making data constructs, their relationships, and available behavior FAIR would be ideal.

Andrew Padilla

Helping organizations successfully navigate their information technology initiatives

3y

Really good article. Finding data is challenging, especially if you need it to be FAIR. FYI, Google's Dataset Search is a search engine for datasets. Europe also has an initiative called National Access Points covering mobility data, which might be of use.

