A Semantic Retrieval System for Web Platforms
Imagine that you manage a platform of services on the Web. As an ambitious manager, you naturally want to grow your business and attract more customers. How, then, can you be sure that customers find the services relevant to their needs when they visit your website? And since they may know little about your field of expertise, how can you be sure that you understand each other well enough to sell them the service they are looking for?
The problem is that customers do not have sufficient knowledge of what you are able to achieve. After all, “you are the expert”. On the other hand, a good manager needs to increase the platform’s impact on the market while keeping investments in balance. It is important to understand potential users’ needs in order to present your services better and make the sale. The difficulty lies in the fact that a semantic gap may open up in your communication, particularly if your advertising relies on a static web page.
One of our research projects aims to reduce the semantic gap between customers and service providers on a web market platform. The study set out to create an algorithm that understands users’ queries and responds to them automatically with an expert answer. This was done mainly using Natural Language Processing (NLP) and ontologies. While the former covers ways of programming computers to process and analyze large amounts of natural language data, the latter structure knowledge within well-defined graph structures.
To put things in context, in a previous paper we proposed a knowledge base that defines what a processing chain, or service, is and what its constituent parts are. This knowledge management was illustrated in the scope of a remote sensing services platform. Within this topic, we explored how to set up an algorithm that gathers the terms expressed by users and then structures them in another kind of knowledge graph so that they can be reused. Such a knowledge graph, also called a thesaurus, structures natural language and formalizes the relations between terms (e.g. synonyms, broader and narrower terms). Finally, this knowledge can be mined to support communication and provide a relevant answer in a variety of ways.
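To make the idea of a thesaurus more concrete, here is a minimal Python sketch of such a structure. It is an illustration only, not the implementation from the paper: the class names (Concept, Thesaurus), the methods and the example terms ("flood", "inundation", "natural hazard") are all assumptions chosen for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Concept:
    """One thesaurus entry: a preferred label plus its relations."""
    label: str
    synonyms: set[str] = field(default_factory=set)
    broader: set[str] = field(default_factory=set)
    narrower: set[str] = field(default_factory=set)


class Thesaurus:
    """Hypothetical in-memory thesaurus (names are illustrative)."""

    def __init__(self) -> None:
        self.concepts: dict[str, Concept] = {}
        # Maps any known term (lower-cased) to its preferred label.
        self._index: dict[str, str] = {}

    def add(self, label: str, synonyms=(), broader=()) -> Concept:
        concept = self.concepts.setdefault(label, Concept(label))
        concept.synonyms.update(synonyms)
        self._index[label.lower()] = label
        for synonym in synonyms:
            self._index[synonym.lower()] = label
        for parent in broader:
            # Record the relation in both directions.
            concept.broader.add(parent)
            parent_concept = self.concepts.setdefault(parent, Concept(parent))
            parent_concept.narrower.add(label)
            self._index.setdefault(parent.lower(), parent)
        return concept

    def resolve(self, term: str) -> str | None:
        """Return the preferred label for a user term, if it is known."""
        return self._index.get(term.lower())

    def expand(self, term: str) -> set[str]:
        """Return the label, its synonyms and its narrower terms."""
        label = self.resolve(term)
        if label is None:
            return set()
        concept = self.concepts[label]
        return {label} | concept.synonyms | concept.narrower


# Example terms are made up for illustration.
thesaurus = Thesaurus()
thesaurus.add("flood", synonyms={"inundation"}, broader={"natural hazard"})
print(thesaurus.resolve("Inundation"))
print(sorted(thesaurus.expand("natural hazard")))
```

With a structure of this kind, a term typed by a user can be mapped to the vocabulary of the experts, and a broad query can be widened to the narrower terms it covers.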
The following picture shows the workflow from the user’s query in the upper left corner through to the classification of the services that best answer the initial query in the bottom left corner. Within this process, the relevant part of the query is extracted and improves the structure of the databases as it goes. Therefore, the more the tool is used, the more useful it becomes. Machine learning, you know.
Without going deeper into details, queries are first processed by a Part-of-Speech Tagging module: every term is given a tag specifying its role in the sentence (verb, adjective, noun, and so on). Secondly, based on these tags, exceptions are filtered out, as many terms can introduce ambiguity (e.g. the term “state”, which may refer either to a “country” or to a “condition”).
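The two steps just described can be sketched in a few lines of Python. This is a toy version under stated assumptions: a small hand-written lexicon (TOY_LEXICON) stands in for a real Part-of-Speech tagger, the tag names follow the common universal tag set, and the list of ambiguous terms and the decision to simply drop them are our own simplifications for the example, not the rules used in the paper.

```python
import re

# Hypothetical mini-lexicon standing in for a trained POS tagger.
TOY_LEXICON = {
    "map": "VERB", "detect": "VERB", "monitor": "VERB",
    "flooded": "ADJ", "urban": "ADJ",
    "areas": "NOUN", "state": "NOUN", "forest": "NOUN",
    "in": "ADP", "of": "ADP", "the": "DET", "a": "DET",
    "france": "PROPN", "belgium": "PROPN",
}

# Terms treated as exceptions because their meaning is ambiguous
# (assumed list, for illustration only).
AMBIGUOUS_TERMS = {"state"}

# Tags considered meaningful for retrieval (assumption).
KEPT_TAGS = {"NOUN", "PROPN", "ADJ", "VERB"}


def tag(query: str) -> list[tuple[str, str]]:
    """Split the query into words and attach a tag to each one.

    Words missing from the toy lexicon receive the tag "X" (unknown).
    """
    tokens = re.findall(r"[A-Za-z]+", query)
    return [(token, TOY_LEXICON.get(token.lower(), "X")) for token in tokens]


def filter_terms(tagged: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Keep content words and set aside the ambiguous exceptions."""
    return [
        (token, pos)
        for token, pos in tagged
        if pos in KEPT_TAGS and token.lower() not in AMBIGUOUS_TERMS
    ]


tagged_query = tag("Map the flooded areas in the state of France")
kept_terms = filter_terms(tagged_query)
print(kept_terms)
```

In a real system the lexicon lookup would be replaced by a statistical or neural tagger, but the filtering logic that follows it can stay this simple.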
After that, again thanks to these tags, filters are applied to separate the terms: some are considered spatial information, others are not. The spatial terms are matched against the GeoNames database (the GeoNames geographical database covers all countries and contains over eleven million placenames that are available for download free of charge). Thanks to this spatial contextualization, the areas of interest of the query and of the services are compared, and the result influences the final classification of services. The terms that do not carry spatial information serve a slightly different purpose: they enhance and structure the knowledge of the web platform.
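As an illustration of this spatial step, the sketch below separates spatial terms from the others and ranks services by how much of the query area they cover. Everything in it is an assumption made for the example: the tiny TOY_GAZETTEER dictionary stands in for a GeoNames lookup, the bounding box is a rough approximation rather than GeoNames data, the service names and footprints are invented, and the overlap measure (computed naively in degrees) is one possible scoring rule, not necessarily the one used in the paper.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    """Bounding box in degrees: west, south, east, north."""
    west: float
    south: float
    east: float
    north: float


# Stand-in for a GeoNames lookup; the box is a rough approximation.
TOY_GAZETTEER = {"france": BBox(-5.0, 41.0, 10.0, 51.5)}


def split_terms(tagged):
    """Separate terms found in the gazetteer from the thematic ones."""
    spatial, thematic = [], []
    for token, pos in tagged:
        if pos == "PROPN" and token.lower() in TOY_GAZETTEER:
            spatial.append(token)
        else:
            thematic.append(token)
    return spatial, thematic


def overlap_ratio(query_box: BBox, service_box: BBox) -> float:
    """Share of the query box covered by the service footprint (0 to 1)."""
    width = min(query_box.east, service_box.east) - max(query_box.west, service_box.west)
    height = min(query_box.north, service_box.north) - max(query_box.south, service_box.south)
    if width <= 0 or height <= 0:
        return 0.0  # the two boxes do not intersect
    query_area = (query_box.east - query_box.west) * (query_box.north - query_box.south)
    return (width * height) / query_area


def rank_services(spatial_terms, services):
    """Sort services by their best overlap with any queried place."""
    boxes = [TOY_GAZETTEER[term.lower()] for term in spatial_terms]
    scores = {
        name: max((overlap_ratio(box, footprint) for box in boxes), default=0.0)
        for name, footprint in services.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Invented services with invented footprints.
services = {
    "forest-monitoring-amazon": BBox(-75.0, -15.0, -45.0, 5.0),
    "flood-mapping-eu": BBox(-10.0, 35.0, 30.0, 60.0),
}
tagged = [("Map", "VERB"), ("flooded", "ADJ"), ("areas", "NOUN"), ("France", "PROPN")]
spatial, thematic = split_terms(tagged)
ranking = rank_services(spatial, services)
print(spatial, thematic)
print(ranking)
```

The thematic terms left over by split_terms are the ones that would feed the platform's knowledge base, while the spatial score is only one ingredient of the final classification.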
As explained above, users may lack knowledge of a particular domain but still need to explore it. As not every piece of knowledge is structured in a way that is useful for this domain or application, a dedicated knowledge base had to be created. To structure this knowledge, a wider reference ontology was used as a basis: the UNESCO Thesaurus (a controlled and structured list of terms used in subject analysis and retrieval of documents and publications in the fields of education, culture, natural sciences, social and human sciences, communication and information). Data mining and knowledge building techniques were then applied. Many more details can be found in the scientific paper with the following DOI: 10.5194/isprs-archives-XLII-2-W13-1593-2019