登录查看更多内容

Question Answering based on Knowledge Graphs

Andreas Blumauer

SVP Growth and Marketing at Graphwise | Managing Director at Semantic Web Company | Solutions based on Graph & LLM | Responsible AI | Diploma in ESG

发布日期: 2021年4月26日

Why Question-Answering Engines?

The search only for documents is outdated. Users who have already adopted a question-answering (QA) approach with their personal devices, e.g., those powered by Alexa, Google Assistant, Siri, etc., are also appreciating the advantages of using a "search engine" with the same approach in a business context. Doing so allows them to not only search for documents, but also obtain precise answers to specific questions. QA systems respond to questions that someone can ask in natural language. This technology is already widely adopted and now rapidly gaining importance in the business environment, where the most obvious added value of a conversational AI platform is improving the customer experience.

Another key tangible benefit is the increased operational efficiency gained by reducing call center costs and increasing sales transactions. More recently we have seen a strong developing interest in in-house use cases, e.g., for IT service desk and HR functions. What if you didn't have to painstakingly sift through your spreadsheets and documents to extract the relevant facts, but instead could just enter your questions into your trusty search field?

This is optimal from the user's point of view, but transforming business data into knowledge is not trivial. It is a matter of linking and making all the relevant data available in such a way that all employees—not just experts—can quickly find the answers they urgently need within whichever business processes they find themselves.

With the power of knowledge graphs at one’s disposal, enterprise data can be efficiently prepared in such a way that it can be mapped to natural language questions. That might sound like magic, but it's not. It is actually a well-established method to successfully roll out AI applications like QA systems in numerous industries.

Where do Current Question-Answering Methods Fall Short?

The use of semantic knowledge graphs supports a game-changing methodology to construct working QA engines, especially when domain-specific systems are to be built. Current QA technologies are based on intent detection, i.e., the incoming question must be mapped to some predefined intents. A common example of this is an FAQ scenario, where the incoming question is mapped to one of the frequently asked questions. This works well in some cases, but is not well suited to access large, structured datasets. That is because when accessing structured data, it is necessary to recognize domain-specific named entities and relations.

In these situations, intent detection technology requires a lot of training data and struggles to provide satisfactory results. We are exploiting a different technology based on semantic parsing, i.e., the question is broken down into its fundamental components, e.g., entities, relations, classes, etc., to infer a complete interpretation of the question. This interpretation is then used to retrieve the answer from the knowledge graph. What are the advantages?

You do not need special configuration files for your QA engine—everything is encoded within the data itself, i.e., in the knowledge graph. By doing so you automatically increase the quality of your data, with benefits for your organization and for applications using this data.
Contemporary QA engines frequently struggle with multilingual environments because they are typically optimized for a single language. With knowledge graphs in place, the expansion to additional languages can be established with relatively little effort, since concepts and things are processed in their core instead of simple terms and strings.
This technology scales, so it will not make a difference if you have 100 entities or millions of entities in your knowledge graph.
Lastly, you do not need to create a large training data corpus before setting up your engine. The data itself suffices and you can fine-tune the system as you go with little additional training data!

Building QA engines on knowledge graphs: an example from HR

What follows is a step-by-step outline of a methodology using a typical human resources (HR) use case as a running example.

Step 1: Gather your datasets

In this step, business users define the requirements and identify the data sources for the enterprise’s knowledge. After collecting structured, semi-structured and unstructured data in different formats, you will be able to produce a data catalog that will serve as the basis for your enterprise knowledge graph (EKG).

Step 2: Create a semantic model of your data

Here your subject matter experts and business analysts will define the semantic objects and design the semantic schemes of the EKG, which will result in a set of ontologies, taxonomies, and vocabularies that precisely describe your domain.

Step 3: Semantify your data

Create pipelines to automatically extract and semantify your data, i.e., annotate and extract knowledge from your data sources based on the semantic model that describes your domain. This is performed by data engineers who automate the ingestion and normalization of data from structured sources, as well as automate the analysis of unstructured content using NLP tools in order to populate the EKG using the semantic model provided. The resulting enriched EKG continuously improves as new data is added. The result of this step is the initial version of your EKG.

Step 4: Harmonize and interlink your data

After the previous step, your data is represented as things rather than strings. Each object gets a unique URI for links between entities and datasets to be established. This is facilitated by the use of ontologies and vocabularies, which, in addition to mapping rules, allow interlinking to external sources. During this stage, data engineers establish new relations in the EKG using logical inference, graph analysis or link discovery—altogether enriching and further extending the EKG. The result of this process is an extension of your EKG that is eventually stored in a graph database which provides interfaces for accessing and querying the data.

Step 5: Feed the QA system with data

Allowing to ask questions on top of a EKG requires that (a) the data is indexed and (b) ML models are available to understand the questions. Both steps are fully automated in QAnswer. The EKG data is automatically indexed, and pretrained ML models are already provided so that you can start asking questions on top of your data right away.

Step 6: Provide feedback to the QA system

Improving the quality of the answers is done in the following two steps (6 and 7). The business user and a knowledge engineer are responsible for tuning the system together. The business user expresses common user requests and the knowledge engineer checks if the system returns the expected answers. Depending on the outcome, either the EKG is adapted (following Step 2-4) or the system is retrained to learn the corresponding type(s) of questions.

The user can provide feedback to the provided answer either by stating whether it is correct or not or by selecting the right query from a list of suggested SPARQL queries:

Step 7: Train the QA system

New ML models are generated automatically based on the training data provided in step 6. The system adapts to the type of data that has been put into the EKG and the type of questions that are important for your business. The provided feedback improves the ML model in order to increase the accuracy of the QA system and the confidence of the provided answers:

Step 8: Gain immediate insight into your knowledge

With the HR dataset now at your fingertips, you can ask questions like the following: Who are my employees? What languages do my staff speak? Who knows Javascript? Who has experience as Project Leader? Who can program in Java and knows MySQL? Who speaks English and Chinese? Who knows both Java and SPARQL? What is the salary range of my employees? How many people can code in Java and Javascript? What is the average salary of a C++ programmer? Who is the top paid employee?

Looking to the future

In order to have a conversation with your Excel files and the rest of the disparate data that has accumulated over the years, you will need to begin by breaking up the data silos in your organization. While the EKG will help you dismantle the data silos, the Semantic Data Fabric solution allows you to prepare the organization’s data for question answering. This approach combines the advantages of Data Warehouses and Data Lakes and complements them with new components and methodologies based on Semantic Graph Technologies.

A lot of doors will open for your company by combining EKGs and QA technologies, and several domain-specific applications that allow organizations to quickly and intuitively access internal information can also be built on top of our solution.

One of the challenges we address is the difficulty of accessing internal information fast, intuitively and with confidence. People can find and gather useful information as they normally would when asking a human—in natural language. The capabilities of the technology we have presented in this article go well beyond what can be achieved with today’s mainstream voice assistants. This new direction offers organizations a significant opportunity to simplify human-machine interaction and profit from the improved access to the organizations' knowledge while also offering new, innovative and useful services to their customers.

The future of question-answering systems is in leveraging knowledge graphs to make them smarter.

Download our white paper on question-answering based on graphs
Try our live demo!

Abid Ali Fareedi

3 年

Exellent article and no doubt

Mani Vannan

Practice Lead @ AnalyticsWise Inc | Sense Making 360 | Action Learning | Change Making

3 年

Every enterprise can benefit immensely from a Google Assistant that understands it’s industry and custom company Taxanomy and Ontology.

查看更多评论

要查看或添加评论，请登录

查看全部

Question Answering based on Knowledge Graphs

Andreas Blumauer

SVP Growth and Marketing at Graphwise | Managing Director at Semantic Web Company | Solutions based on Graph & LLM | Responsible AI | Diploma in ESG

更多精彩文章

社区洞察

其他会员也浏览了

LLM, GenAI and Database

LLM, GenAI and Database

How to use large models to obtain user data and improve the effect of digital marketing

AI&YOU #40: Retrieval-Augmented Generation (RAG) in Enterprise AI

Building a GEN AI Enterprise Platform via Multiple Data Sources

Are you paying for AI tools? Check out BestFreeAiWebsites.com for a comprehensive list of free alternatives.

Part 3: Implementing RAG – Retrieval-Augmented Generation for Powerful AI Applications

How Large Language Models (LLMs) are going to reshape Businesses.

June 19, 2024

Re-think your Metadata Strategy

2024年7月16日

The Seven Cases for Knowledge Graph Integration in a RAG architecture

2024年6月17日

LLMs have revolutionized AI. Do we still need knowledge models and taxonomies, and why?

2024年2月13日

Why would anyone want a Recommender System if he or she doesn't buy anything at all?

2023年9月25日

What is an Active Metadata Hub?

2022年9月7日

Graph Databases the “Secret Sauce,” and Other Data Management Trends for 2022

2022年1月20日

Why are semantic technologies essential for your content strategy?

2021年9月2日

Why I wrote the Knowledge Graph Cookbook

2021年6月22日

Active Metadata: Be FAIR

2021年2月15日

How User Experience benefits from Knowledge Graphs

2020年10月7日

社区洞察

其他会员也浏览了

LLM, GenAI and Database

LLM, GenAI and Database

How to use large models to obtain user data and improve the effect of digital marketing

AI&YOU #40: Retrieval-Augmented Generation (RAG) in Enterprise AI

Building a GEN AI Enterprise Platform via Multiple Data Sources

Are you paying for AI tools? Check out BestFreeAiWebsites.com for a comprehensive list of free alternatives.

Part 3: Implementing RAG – Retrieval-Augmented Generation for Powerful AI Applications

How Large Language Models (LLMs) are going to reshape Businesses.

June 19, 2024