What is a virtual knowledge graph (OBDA), and what are its applications for business?
Table of contents:
- What is a virtual knowledge graph?
- Typical use cases of virtual knowledge graphs
- How does it work?
- The architecture of a virtual knowledge graph
- Tools for creating virtual knowledge graphs
What is a virtual knowledge graph?
Imagine that you have an old workshop full of all kinds of tools, and you need to get a particular job done; for simplicity, let's say hanging a picture on the wall.
To hang the picture, you need to know which of the available tools can do the job. And if you do not know, you are left searching through boxes for something that makes sense.
But imagine if you had a screen in the workshop that gave you all the information you needed.
There you can make the following query:
"I want something for attaching a picture on the wall made out of wood that stays there temporarily; I want to be able to remove it without leaving marks; it should be able to hold at least 100 g of weight."
And the system will provide you with the possible solutions for you to choose from. It will tell you the kinds of tools and components that are suitable for the job, which, in our example, include tapes, hammers and nails, as well as glue pads.
Of course, this is idealistic and oversimplified, but it is similar to how a virtual knowledge graph lets you ask questions to solve your problems and finds a way to answer them using data coming from multiple sources.
A virtual knowledge graph (VKG), an approach previously known as ontology-based data access (OBDA), is a knowledge graph that does not live inside a graph database. Instead, it is generated every time a query is evaluated, and only the part of the graph relevant to that query is generated.
The data remains in its sources, such as a data lake or relational databases, and does not need to be moved to a graph database.
This is advantageous in cases where:
- You want to avoid copying the data, i.e., migrating it to a graph database
- You want results based on updated data (since the data is retrieved directly from the sources)
This is precisely why a virtual knowledge graph is a powerful approach for exploring data distributed across many heterogeneous data sources.
And it is particularly efficient when you need to integrate huge amounts of data, such as sensor data, or feed a Digital Twin with data.
It lets you get answers to questions like: "What is the average price range for all hotels near warm lakes in my vicinity?"
or
"Give me time series data about water pressure from sensors installed on turbines of generation 2003 from a certain manufacturer."
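To give a feeling for what such a question looks like once it is expressed over an ontology instead of over tables and columns, here is a rough sketch in SPARQL. All prefixes, classes, and properties below (such as :Turbine, :WaterPressureSensor, or :generation) are invented for illustration and are not terms of any real ontology.

```sparql
# Hedged sketch: every class and property name here is a made-up example.
PREFIX : <http://example.org/plant#>

SELECT ?sensor ?timestamp ?pressure
WHERE {
  ?turbine a :Turbine ;
           :generation   "2003" ;
           :manufacturer :AcmeTurbines .
  ?sensor  a :WaterPressureSensor ;
           :installedOn  ?turbine .
  ?reading :measuredBy ?sensor ;
           :timestamp   ?timestamp ;
           :value       ?pressure .
}
ORDER BY ?timestamp
```

Note that the query talks about turbines and sensors, not about the tables or files in which the measurements are actually stored.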
Typical use cases of virtual knowledge graphs
Industry 4.0: A virtual knowledge graph can be used to integrate data from sensors and other sources to create a digital twin of physical entities such as machines. This enables companies to simulate and optimize their processes based on real-time data, reducing the time and cost of production.
Healthcare: A virtual knowledge graph can be used to integrate patient information from different systems, such as electronic health records, medical imaging systems, and clinical decision support systems. This enables doctors to have a more comprehensive view of patient history and make more informed decisions about treatment.
Financial services: A virtual knowledge graph can be used to integrate financial data from various sources, such as transaction data, market data, and customer data. This enables financial analysts to perform complex queries to identify trends and correlations that may not be apparent from individual data sources.
Higher education and research: A virtual knowledge graph can be used to aid the governance of large universities. It makes it possible to integrate data from different departments of the institution, such as research offices, laboratories, projects, and finance, and to combine it with open data. The virtual approach means the institution's board always has up-to-date data in its dashboards, which makes decision-making more agile.
How does it work?
Now let’s jump to the more technical side.
Remember the workshop example before?
In the real world, users access the information they need through predefined SQL queries to the data sources.
So why do I need VKGs, you might ask?
If users have new information needs that have not been foreseen, they will need new SQL queries. For complex data scenarios, these queries have to be prepared by skilled engineers, and that usually takes time, sometimes weeks or months, before they are ready.
The problem is that the data engineer has to deal with the full diversity of the data source models and know both the models and their contexts very well.
Returning to the workshop example, they have to know that a nail leaves only a small mark in the wood. This information is not provided on the nail box; it is a form of implicit knowledge. They may, however, find in its description the maximum weight it is suitable for.
This data has to be reshaped, and implicit knowledge must be added, before it is ready to answer the questions we are interested in.
The virtual knowledge graph represents data in the language of the business, which therefore makes the data easier to query.
The user formulates queries in an understandable language based on a common vocabulary (the ontology), and the system translates these questions into possibly complex SQL queries that are sent to the data sources. The user then gets the answers back in an intelligible form.
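As a hedged illustration of this translation step, with invented names throughout: assume the ontology contains a class :Hotel with :name and :price properties, mapped onto a relational table hotel(id, name, price). A business-level SPARQL query and one possible SQL translation might look like this:

```sparql
# Hypothetical query over the ontology; :Hotel, :name and :price are invented terms.
PREFIX : <http://example.org/travel#>

SELECT ?name ?price
WHERE {
  ?hotel a :Hotel ;
         :name  ?name ;
         :price ?price .
}
```

```sql
-- One possible SQL translation, assuming the hypothetical table hotel(id, name, price).
SELECT hotel.name, hotel.price
FROM hotel;
```

The user never sees the SQL; they only work with the ontology terms and get the answers back.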
What is the architecture of a VKG?
The core of the VKG consists of three components:
- Data sources - These can be databases, data lakes, data warehouses and more.
- Ontology - This is a universal vocabulary of your organization consisting of classes and properties.
- Mappings - Mappings are rules that describe the relationships between the terms in the ontology and the corresponding data source fields. Typically, they are written in the R2RML language.
As mentioned above, ontologies are important; they are the vocabulary used to describe data and should be close to your company's business and technical terms. Ontologies allow you to query the virtual knowledge graph with terms you are familiar with.
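Purely for illustration, and continuing the invented hotel example from above, a minimal ontology fragment could be written in Turtle as follows; all terms are made-up examples, not part of any standard vocabulary.

```turtle
# Hedged sketch of a tiny ontology: :Hotel, :name and :price are invented example terms.
@prefix :     <http://example.org/travel#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

:Hotel a owl:Class ;
    rdfs:label "Hotel" .

:name a owl:DatatypeProperty ;
    rdfs:domain :Hotel .

:price a owl:DatatypeProperty ;
    rdfs:domain :Hotel ;
    rdfs:label "price per night" .
```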
Mappings, on the other hand, are fundamental for creating a virtual knowledge graph. They describe the relationship between the data sources and the ontology. They define, for example, whether a given company belongs to a specific class in the ontology, such as a local business.
In fact, without mappings, the virtual knowledge graph system cannot retrieve data from the sources, as it does not know where to find it. (You can read more on mappings at the following link.)
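To make this more tangible, here is a hedged sketch of an R2RML mapping; the table name, its columns, and the ontology terms are all invented examples. It states that every row of the hypothetical table hotel becomes an instance of the class :Hotel, and that the values of its name and price columns become objects of the :name and :price properties.

```turtle
# Hedged R2RML sketch: the table, its columns, and the ontology terms are invented examples.
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix :   <http://example.org/travel#> .

<#HotelMapping> a rr:TriplesMap ;
    rr:logicalTable [ rr:tableName "hotel" ] ;
    rr:subjectMap [
        rr:template "http://example.org/travel/hotel/{id}" ;
        rr:class :Hotel
    ] ;
    rr:predicateObjectMap [
        rr:predicate :name ;
        rr:objectMap [ rr:column "name" ]
    ] ;
    rr:predicateObjectMap [
        rr:predicate :price ;
        rr:objectMap [ rr:column "price" ]
    ] .
```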
What environments are there for creating virtual knowledge graphs?
The VKG approach is gaining more and more traction in the industry.
Ontop is one of the most recognized virtual knowledge graph engines. It is an open-source project backed by more than 10 years of research, and Ontopic is an official supporter of it.
Ontop translates your SPARQL queries into SQL and retrieves your answers from the underlying data sources. Since the virtual knowledge graph is generated at the moment of querying, it is said to be "always fresh".
It is also possible to insert this knowledge graph into a graph database such as GraphDB; we call this "materialization". The flexibility to decide between a graph database and a virtual knowledge graph, or even a hybrid knowledge graph, and to revisit that decision later, is an important strategic advantage of this approach. Start light with a virtual graph and decide later, based on your needs, which parts of the graph to move to a graph database.
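For intuition, materializing the hypothetical hotel mapping sketched above would produce plain RDF triples such as the following (with invented data), which could then be loaded into a graph database:

```turtle
# Invented example of materialized triples produced from the hypothetical hotel mapping above.
@prefix : <http://example.org/travel#> .

<http://example.org/travel/hotel/42> a :Hotel ;
    :name  "Lakeside Inn" ;
    :price 120 .
```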
Ontopic Studio is a compelling environment for designing knowledge graphs, both virtual and for graph databases. It enables you to map your data with no code and to generate the R2RML mappings, which are interoperable across vendors.
Would you like to test virtualization in your company? Get in touch with us now for a free consultation.
Get a demo access of Ontopic Studio
Ready to do mapping with a no-code approach? Let us help you. Get a demo access: