What is a Virtual Knowledge Graph (OBDA), and what are its applications for business?
What is a virtual knowledge graph?
Imagine that you have an old workshop full of all kinds of tools, and you need to do a particular job: for simplicity, hanging a picture on the wall.
To hang the picture, you need to know which of the available tools can do the job. If you do not know, you end up searching through boxes for something that looks suitable.
But imagine if you had a screen in the workshop that could give you all the information needed.
There you can make the following query:
I want something for attaching a picture to a wooden wall temporarily; I want to be able to remove it without leaving marks; it should be able to hold at least 100 g of weight.
The system then answers with possible solutions for you to choose from. It tells you which kinds of tools and components are suitable for the job, such as tape, hammers and nails, or glue pads.
Of course, this is idealistic and oversimplified, but it is similar to how virtual knowledge graphs work: you ask questions about your problem, and the system finds a way to answer them using data coming from multiple sources.
A Virtual Knowledge Graph (VKG), previously known as ontology-based data access (OBDA), is a Knowledge Graph, but it is virtual because it does not exist inside a graph database. Instead, it is as if it were generated every time a query is performed, and only the part of the Knowledge Graph that is relevant to the query is generated.
The data stays in its sources, such as a data lake or relational databases. It does not need to be moved to a graph database.
This is advantageous in cases where:
These are exactly the reasons why a Virtual Knowledge Graph is a powerful approach for exploring data scattered across many heterogeneous data sources.
And it is particularly efficient in situations where you need to integrate huge amounts of data, such as sensor data, or for feeding a Digital Twin with data.
It lets you get answers to questions like: “What is the average price range for all hotels near warm lakes in my vicinity?”
or
“Give me time series data about water pressure from sensors installed on turbines of the generation 2003 from a certain manufacturer.”
Typical use cases of Virtual Knowledge Graphs include the following:
Industry 4.0: A virtual knowledge graph can be used to integrate data from sensors and other sources to create a digital twin of physical entities such as machines. This enables companies to simulate and optimize their processes based on real-time data, reducing the time and cost of production.
Healthcare: A virtual knowledge graph can be used to integrate patient information from different systems, such as electronic health records, medical imaging systems, and clinical decision support systems. This enables doctors to have a more comprehensive view of patient history and make more informed decisions about treatment.
Financial services: A virtual knowledge graph can be used to integrate financial data from various sources, such as transaction data, market data, and customer data. This enables financial analysts to perform complex queries to identify trends and correlations that may not be apparent from individual data sources.
Higher education and research: A virtual knowledge graph can be used to aid the governance of large universities. It makes it possible to integrate data from different parts of the institution, such as research offices, laboratories, projects, and finance, and to combine it with open data. With the virtual approach, the institution's board always sees up-to-date data in its dashboards, which makes decision-making more agile.
How does it work?
Now let’s jump to the more technical side.
Remember the workshop example from earlier?
In the real world, users access the information they need through predefined SQL queries to the data sources, and the users get the answers back.
So why do I need VKGs, you might ask?
If users have new information needs that were not foreseen, they need new SQL queries, which usually take time (weeks or months) to be ready.
The problem is that the data engineer has to deal with all the diversity of the data source models and know them and their contexts very well.
Returning to the workshop example, they have to know that a nail leaves only a small trace in wood. This information is not provided on the nail box but is a form of implicit knowledge. They may, however, find in its description data about the maximum weight it is suitable for.
This data has to be reshaped, and the implicit knowledge has to be added, before it can answer the kind of questions we are interested in.
The virtual knowledge graph represents data in the language of the business. It makes data easier to query.
The user writes queries in an understandable language based on a common vocabulary (the ontology), and the system translates these questions into possibly complex SQL queries that are sent to the data sources. The user then gets the answers back in an intelligible form.
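The translation step can be illustrated with a toy sketch in Python. It is only an illustration under stated assumptions, not how a real VKG engine works internally: the table `inv_items`, its columns, the business terms, and the `TERM_TO_SQL` lookup are all hypothetical, and an in-memory SQLite database stands in for the data source.

```python
import sqlite3

# Hypothetical source table, standing in for a real data source.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inv_items (item_nm TEXT, max_load_g INTEGER, residue_flag INTEGER)"
)
conn.executemany(
    "INSERT INTO inv_items VALUES (?, ?, ?)",
    [("nail", 5000, 1), ("glue pad", 200, 0), ("tape", 50, 0)],
)

# Hypothetical lookup from business vocabulary to source tables and columns.
# In a real VKG this role is played by the ontology and the mappings.
TERM_TO_SQL = {
    "Fastener": "inv_items",
    "name": "item_nm",
    "maxLoadGrams": "max_load_g",
    "leavesMarks": "residue_flag",
}


def translate(cls, select, filters):
    """Rewrite a query phrased in business terms into SQL text plus parameters."""
    columns = ", ".join(TERM_TO_SQL[p] for p in select)
    where = " AND ".join(f"{TERM_TO_SQL[p]} {op} ?" for p, op, _ in filters)
    params = [value for _, _, value in filters]
    return f"SELECT {columns} FROM {TERM_TO_SQL[cls]} WHERE {where}", params


# "Fasteners that hold at least 100 g and leave no marks", in business terms.
sql, params = translate(
    "Fastener",
    select=["name"],
    filters=[("maxLoadGrams", ">=", 100), ("leavesMarks", "=", 0)],
)
rows = conn.execute(sql, params).fetchall()
```

The user never sees `inv_items` or `residue_flag`; they only use the shared vocabulary, and the cryptic source names stay hidden behind the lookup.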
What is the architecture of a VKG?
The core of a VKG consists of three components: the ontology, the mappings, and the data sources.
As mentioned above, ontologies are important because they provide the definitions of classes and properties. An ontology is basically how you define things in your company. It tells you exactly what you can query in a virtual knowledge graph.
Mappings, on the other hand, are fundamental for the creation of Virtual Knowledge Graphs. They describe the relationship between the data sources and the ontology. They define, for example, whether a certain company belongs to a certain class in the ontology, such as a local business.
In fact, without the mappings, the knowledge graph cannot retrieve data from the sources, as it does not know where to find it. (You can read more on mappings at the following link.)
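As a rough illustration, a mapping can be thought of as a rule that turns rows from a source into statements phrased with ontology terms. The sketch below is plain Python, not R2RML or any real mapping language; the `http://example.org/` namespace, the company rows, and the `LocalBusiness` class are made up for the example.

```python
# Hypothetical rows, as they might come back from a relational table "companies".
rows = [
    {"id": 7, "name": "Corner Bakery", "kind": "local"},
    {"id": 8, "name": "Globex", "kind": "multinational"},
]

EX = "http://example.org/"  # made-up namespace for this sketch
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Each mapping has a condition on the row and a template for the triple it yields.
mappings = [
    {
        # Companies flagged as "local" belong to the class LocalBusiness.
        "when": lambda r: r["kind"] == "local",
        "triple": lambda r: (f"{EX}company/{r['id']}", RDF_TYPE, f"{EX}LocalBusiness"),
    },
    {
        # Every company gets a name statement.
        "when": lambda r: True,
        "triple": lambda r: (f"{EX}company/{r['id']}", f"{EX}name", r["name"]),
    },
]


def apply_mappings(rows, mappings):
    """Return the triples that the mappings produce for the given rows."""
    return [m["triple"](r) for r in rows for m in mappings if m["when"](r)]


triples = apply_mappings(rows, mappings)
```

Running the rules over every row, as done here, corresponds to materializing the graph. In the virtual setting the same rules are used to rewrite queries instead, so that only the relevant rows are ever fetched.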
What tools are there for creating virtual knowledge graphs?
The VKG approach is gaining more and more traction in the industry.
Ontop is one of the most recognized Virtual Knowledge Graph engines. It is an open-source project backed by more than 10 years of research, and Ontopic is an official supporter of it.
Ontop translates your queries, written in the SPARQL query language, into SQL and retrieves the answers from the underlying data sources. The virtual knowledge graph is generated at the Ontop level at the moment of querying.
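From the client side, querying a VKG looks like querying any SPARQL endpoint. The sketch below builds a request following the standard SPARQL 1.1 Protocol (query via direct POST) using only the Python standard library. The endpoint URL is an assumption and depends on how the engine is deployed, and the `ex:Hotel` and `ex:price` terms are hypothetical; a real ontology defines its own vocabulary.

```python
import urllib.request

# Assumed endpoint URL; adjust to wherever your SPARQL endpoint is exposed.
ENDPOINT = "http://localhost:8080/sparql"

# Hypothetical vocabulary (ex:Hotel, ex:price) used only for illustration.
QUERY = """
PREFIX ex: <http://example.org/>
SELECT (AVG(?price) AS ?avgPrice)
WHERE { ?hotel a ex:Hotel ; ex:price ?price . }
"""


def build_request(endpoint, query):
    """Build a SPARQL 1.1 Protocol request that sends the query as the POST body."""
    return urllib.request.Request(
        endpoint,
        data=query.encode("utf-8"),
        headers={
            "Content-Type": "application/sparql-query",
            "Accept": "application/sparql-results+json",
        },
        method="POST",
    )


request = build_request(ENDPOINT, QUERY)

# To run it against a live endpoint (requires `import json`):
# with urllib.request.urlopen(request) as response:
#     results = json.load(response)
```

The client sends SPARQL only; the SQL is generated on the engine side, so the application does not need to know the source schemas.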
It will also allow you to materialize your Knowledge Graph into a graph database such as GraphDB.
Ontopic Studio is an environment for designing virtual knowledge graphs without writing code. It lets you map your data visually and generates the R2RML mapping, which is interoperable with all vendors.
Would you like to test virtualization in your company? Get in touch with us now for a free consultation.
Knowledge Graph, Semantic Search, & MLAI
1 year ago: You folks should submit a video for The Knowledge Graph Conference tools track! Here is the link to submit, and the deadline is now extended, so please consider showing this off to the KGC community: https://forms.gle/bpyxA8nXSZfvaNPh6
Author of 'Enterprise Architecture Fundamentals', Founder & Owner of Caminao
1 year ago: Right, which points to ontologies with built-in epistemic modalities. https://caminao.blog/ea-in-bowls/a-knowledge-engineering-framework/