What is a Virtual Knowledge Graph (OBDA), and what are its Applications for Business?
Table of contents:
- What is a virtual knowledge graph?
- Typical use cases of virtual knowledge graphs
- How does it work?
- The architecture of a virtual knowledge graph
- Tools for creating virtual knowledge graphs
What is a Virtual Knowledge Graph?
Imagine that you have an old workshop full of all kinds of tools, and you need to complete a simple task: hanging a picture on the wall.
To do so, you need to know which of the available tools can solve this problem. If you do not know, you have to search through the boxes for something that might work.
Now, imagine if you had a screen in the workshop that provided all the necessary information.
You could make the following query:
"I need something to attach a wooden picture frame to the wall temporarily; I want to be able to remove it without leaving marks; it should be able to hold at least 100 grams."
The system would then provide you with the possible solutions for you to choose from. It would tell you the kinds of tools and components that are suitable for the job. In our example, these would include tapes, hammers and nails, as well as glue pads.
Of course, this is an idealized and oversimplified example, but it is similar to how virtual knowledge graphs allow you to ask questions and find solutions using data from multiple sources.
A virtual knowledge graph (VKG), previously known as ontology-based data access (OBDA), is a knowledge graph that is not stored inside a graph database. Rather, the relevant part of the knowledge graph is generated each time a query is performed.
Data remains in its sources, such as a data lake or relational databases. Therefore, it does not need to be moved to a graph database.
This is advantageous in cases where:
- You want to avoid copying the data, i.e., migrating it to a graph database
- You want results based on updated data, since the data is retrieved directly from the sources
This is precisely why a virtual knowledge graph is a powerful approach for exploring data distributed across many heterogeneous data sources.
It is also particularly efficient when integrating huge amounts of data, such as sensor data, or when feeding a digital twin with data.
It lets you find the answers to questions like: “What is the average price range for all hotels near warm lakes in my vicinity?”
Or:
“Give me time series data about water pressure from sensors installed on generation 2003 turbines from a certain manufacturer.”
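To make the "generated at query time" idea more concrete, here is a minimal sketch in Python. It is not how a real VKG engine is implemented. The `hotel` table, the `ex:` terms, and the `virtual_triples` helper are all hypothetical and only illustrate that the graph is computed from the source on demand, so a change in the source shows up in the next query without any copy being refreshed.

```python
import sqlite3

# Hypothetical source table; in practice this would be an existing database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hotel (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
conn.execute("INSERT INTO hotel VALUES (1, 'Lakeview', 120.0)")


def virtual_triples(connection):
    """Yield (subject, predicate, object) triples computed from the current rows."""
    rows = connection.execute("SELECT id, name, price FROM hotel ORDER BY id")
    for hotel_id, name, price in rows:
        subject = f"http://example.org/hotel/{hotel_id}"
        yield (subject, "rdf:type", "ex:Hotel")
        yield (subject, "ex:name", name)
        yield (subject, "ex:price", price)


# First "query": triples are produced from the rows as they are right now.
before = list(virtual_triples(conn))

# The source changes. Nothing was copied, so there is nothing to re-import.
conn.execute("UPDATE hotel SET price = 95.0 WHERE id = 1")

# Second "query": the same function now reflects the updated price.
after = list(virtual_triples(conn))

print(before[2])
print(after[2])
```

A real engine does not enumerate all triples like this. It rewrites the incoming query so that only the relevant rows are fetched from the source.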
Typical Use Cases of Virtual Knowledge Graphs
Industry 4.0: A virtual knowledge graph can be used to integrate data from sensors and other sources to create digital twins of physical entities such as machines. This allows companies to simulate and optimize their processes using real-time data, thereby reducing production time and cost.
Healthcare: Virtual knowledge graphs can be used to integrate patient information from different systems, such as electronic health records, medical imaging systems, and clinical decision support systems. This enables doctors to have a more comprehensive view of patient history and make more informed decisions about treatment.
Financial services: A virtual knowledge graph can be used to integrate financial data from various sources, such as transaction data, market data, and customer data. This enables financial analysts to perform complex queries and identify trends and correlations that may not be apparent from individual data sources.
Higher education and research: A virtual knowledge graph can aid in the governance of large universities. It makes it possible to integrate data from different departments of the institution, such as research offices, laboratories, projects, and finance, and combine it with open data. With the virtual approach, the institution's board always sees up-to-date data in its dashboards, which makes decision-making more agile.
How does it work?
Now let’s jump to the technical side of things.
Do you remember the workshop example from earlier?
In the real world, users access the information they need by using predefined SQL queries on the data sources.
So why do I need VKGs, you might ask?
If users have new information needs that were not anticipated, they will need new SQL queries. For complex data scenarios, skilled engineers must prepare these queries, which usually takes time, sometimes weeks or months.
The problem is that the data engineer has to face all the diversity of the data source models and know them and their contexts very well.
Returning to the workshop example, they must know that a nail leaves only a small mark on the wood. This information isn't provided on the box, but it's a form of implicit knowledge. However, they may find data about the maximum weight it can hold in its description.
This data must be reshaped, and implicit knowledge must be added, in order to answer the questions we are interested in.
The virtual knowledge graph represents data in the language of the business. Therefore, it makes querying the data easier.
Users generate queries in a language they understand based on a common vocabulary (ontology). The system then translates these questions into complex SQL queries and sends them to the data source. The user then receives the answers in an intelligible form.
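The translation step described above can be sketched as a toy example. This is a deliberately simplified illustration, not the algorithm of any real VKG engine: the `MAPPING` dictionary, the `ex:` terms, the `wh_items` table, and the `translate` function are all made up for this sketch. The user asks in business terms, and the cryptic column names of the source stay hidden.

```python
import sqlite3

# Hypothetical link between ontology terms and one source table (assumption).
MAPPING = {
    "ex:Fastener": {
        "table": "wh_items",
        "properties": {
            "ex:label": "item_desc",
            "ex:maxLoadGrams": "max_load_g",
            "ex:removable": "is_removable",
        },
    }
}

ALLOWED_OPERATORS = {"=", "<", "<=", ">", ">="}


def translate(cls, select, filters):
    """Turn a query over ontology terms into SQL text plus bound parameters."""
    entry = MAPPING[cls]
    columns = entry["properties"]
    sql = f"SELECT {', '.join(columns[p] for p in select)} FROM {entry['table']}"
    conditions, params = [], []
    for prop, operator, value in filters:
        if operator not in ALLOWED_OPERATORS:
            raise ValueError(f"unsupported operator: {operator}")
        conditions.append(f"{columns[prop]} {operator} ?")
        params.append(value)
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql, params


# Hypothetical source data with technical column names.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wh_items (item_desc TEXT, max_load_g INTEGER, is_removable INTEGER)")
conn.executemany(
    "INSERT INTO wh_items VALUES (?, ?, ?)",
    [("adhesive tape", 200, 1), ("glue pad", 500, 1), ("construction adhesive", 20000, 0)],
)

# "Something removable that holds at least 100 grams", in ontology terms.
sql, params = translate(
    "ex:Fastener",
    select=["ex:label"],
    filters=[("ex:maxLoadGrams", ">=", 100), ("ex:removable", "=", 1)],
)
rows = conn.execute(sql, params).fetchall()
print(sql)
print(sorted(rows))
```

In a real system the query language is SPARQL and the generated SQL can span many tables and sources, but the principle is the same: the user never needs to know that "maximum load" lives in a column called `max_load_g`.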
What is the Architecture of a VKG?
The core of a VKG consists of three components:
- Data sources - These can be databases, data lakes, data warehouses, etc.
- Ontology - This is a universal vocabulary for your organization consisting of classes and properties.
- Mappings - These are rules that describe the relationships between the terms in the ontology and the corresponding data source fields. They are typically saved in the R2RML language.
As mentioned above, ontologies are important because they provide the vocabulary with which to describe data. They should resemble your company's business and technical terms. Ontologies allow you to query the virtual knowledge graph using familiar terms.
Mappings, on the other hand, are fundamental for creating a virtual knowledge graph. They describe the relationship between data sources and ontologies. For example, they define whether a given company belongs to a specific class in the ontology, such as a local business.
Without mappings, the virtual knowledge graph system cannot retrieve data from sources because it does not know where to find it. You can read more about mappings at the following link.
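To give a feel for what a mapping expresses, here is a small Python sketch. It is a simplified stand-in and not R2RML syntax. The `TriplesMap` class, the `companies` table, its columns, and the `schema:` terms are assumptions chosen for illustration. It loosely mirrors the shape of a mapping rule: a source table, a template for building the subject IRI, a class, and a set of predicates tied to columns.

```python
from dataclasses import dataclass, field


@dataclass
class TriplesMap:
    """Simplified stand-in for a mapping rule (not real R2RML syntax)."""

    logical_table: str      # name of the source table the rule reads from
    subject_template: str   # IRI template with {column} placeholders
    rdf_class: str          # ontology class assigned to every subject
    predicate_columns: dict = field(default_factory=dict)  # predicate -> column

    def apply(self, row):
        """Return the triples this rule produces for one source row."""
        subject = self.subject_template.format(**row)
        triples = [(subject, "rdf:type", self.rdf_class)]
        for predicate, column in self.predicate_columns.items():
            value = row.get(column)
            if value is not None:  # a NULL value produces no triple
                triples.append((subject, predicate, value))
        return triples


# Hypothetical rule: every row of "companies" is a local business.
company_map = TriplesMap(
    logical_table="companies",
    subject_template="http://example.org/company/{company_id}",
    rdf_class="schema:LocalBusiness",
    predicate_columns={"schema:name": "company_name", "schema:telephone": "phone"},
)

row = {"company_id": 42, "company_name": "Bakery Rossi", "phone": None}
triples = company_map.apply(row)
for triple in triples:
    print(triple)
```

The rule itself contains no data. It only states where the data for each ontology term can be found, which is why the same rule can serve both a virtual graph and a materialized one.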
What Environments are there for Creating Virtual Knowledge Graphs?
The VKG approach is gaining traction in the industry.
Ontop is one of the most recognized virtual knowledge graph engines. It is an open-source project that has been researched for over 10 years, and Ontopic is an official supporter of it.
Ontop translates SPARQL queries into SQL and retrieves your answers from the underlying data sources. Because the VKG is generated when a query is made, it is considered "always fresh".
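From the client's point of view, a virtual knowledge graph looks like any other SPARQL endpoint. The sketch below builds a standard SPARQL Protocol request using only the Python standard library. The endpoint URL and the `http://example.org/Hotel` class are assumptions for illustration, so replace them with the values of your own deployment. The request is only constructed here; `fetch_bindings` shows how it could be sent once an endpoint is actually running.

```python
import json
import urllib.parse
import urllib.request

# Assumed local endpoint URL; adjust to your own deployment.
ENDPOINT = "http://localhost:8080/sparql"

# Hypothetical query over a made-up ontology class.
QUERY = "SELECT ?hotel WHERE { ?hotel a <http://example.org/Hotel> } LIMIT 5"


def build_sparql_request(endpoint, query):
    """Build a SPARQL Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    headers = {"Accept": "application/sparql-results+json"}
    return urllib.request.Request(url, headers=headers)


def fetch_bindings(request):
    """Send the request and return the result rows (needs a running endpoint)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)["results"]["bindings"]


request = build_sparql_request(ENDPOINT, QUERY)
print(request.full_url)
```

The client never sees the SQL that the engine generates behind the endpoint. It sends SPARQL and receives standard SPARQL results.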
You can insert this knowledge graph into a graph database such as GraphDB. This is called "materialization". The ability to choose between a graph database, a virtual knowledge graph, or a hybrid knowledge graph, even at a later time, is a significant strategic advantage of this approach. Start with a virtual graph, then decide which parts to move to a graph database based on your needs.
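Conceptually, materialization means writing the generated triples out once so that a graph database can load them. The following sketch serializes a few triples into N-Triples, a simple line-based RDF format. The `Iri` marker class, the `to_ntriples` helper, and the example data are hypothetical, and the literal escaping is kept minimal. Real tools handle datatypes, language tags, and large volumes.

```python
class Iri(str):
    """Marks a string as an IRI rather than a plain literal (illustrative helper)."""


def to_ntriples(triples):
    """Serialize (subject, predicate, object) triples as N-Triples lines."""
    lines = []
    for subject, predicate, obj in triples:
        if isinstance(obj, Iri):
            obj_text = f"<{obj}>"
        else:
            # Minimal escaping for plain string literals.
            escaped = (
                str(obj)
                .replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
            )
            obj_text = f'"{escaped}"'
        lines.append(f"<{subject}> <{predicate}> {obj_text} .")
    return "\n".join(lines)


RDF_TYPE = Iri("http://www.w3.org/1999/02/22-rdf-syntax-ns#type")
hotel = Iri("http://example.org/hotel/1")

triples = [
    (hotel, RDF_TYPE, Iri("http://example.org/Hotel")),
    (hotel, Iri("http://example.org/name"), 'Hotel "Lakeview"'),
]

snapshot = to_ntriples(triples)
print(snapshot)
```

Unlike the virtual graph, such a snapshot is a copy: it stays as it was at export time until it is generated again.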
Ontopic Suite is a compelling environment for:
- designing knowledge graphs with no code, which can be deployed as a virtual graph or from a graph database
- generating R2RML mapping files, which are interoperable with all vendors
- materializing data to be ingested into regular graph databases
- running virtual knowledge graphs that can be queried with SPARQL
Most surprisingly, Ontopic Suite can also be used as a semantically enriched virtual SQL database that can be accessed from many of your preferred tools: Microsoft PowerBI, Pandas, Tableau and many more.
Would you like to test virtualization in your company? Get in touch with us now for a free consultation.
Get demo access to Ontopic Suite
Ready to create and run virtual knowledge graphs with a no-code approach? Let us help you. Get demo access.