Natural Language to Structured Queries
Gopi Krishna Suvanam
Entrepreneur | Author | AI & Decentralization proponent | Alumnus of IIT-M & IIM-A
"What was total profit in California last March? and which products were selling the most?"
Fairly innocuous questions, right? But converting them into a structured query that a machine can understand and answer is not straightforward. Generating structured queries from natural language has been researched for a long time. Modern packages like NLTK in Python make it easier, but they don’t give you a plug-and-play solution. The most promising open source framework I’ve seen is Quepy, a Python framework to transform natural language questions into queries. But that requires a lot of further development.
We (at G-Square Solutions) have built an in-house tool to convert natural language into SQL-like structured queries. We use a combination of several techniques to achieve this. Some of them are:
- Using NLP techniques to identify named entities
- Using NLP to tag words and phrases with their parts of speech
- Learning from the data which fields questions can be asked about
- Using machine learning on user-generated questions to predict associations between words and sequences of words in a query
- Some bit of hard-coding :)
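To make a few of these ideas concrete, here is a minimal Python sketch. It is a toy illustration, not the in-house tool described above: the sample rows, the `sales` table name, the `AGGREGATES` keyword map and the function names are all assumptions made up for this example. It spots entities by looking them up against values found in the data, learns which columns are numeric, and leans on a small hard-coded keyword map for aggregates.

```python
import re

# Hypothetical sample rows; a real system would read these from the database.
ROWS = [
    {"state": "California", "product": "Widget", "profit": 120.0},
    {"state": "New York", "product": "Gadget", "profit": 80.0},
]

# Hypothetical keyword-to-aggregate map (the "hard-coding" part).
AGGREGATES = {"total": "SUM", "average": "AVG", "highest": "MAX", "lowest": "MIN"}


def has_word(phrase, text):
    """Return True if phrase occurs in text on word boundaries."""
    return re.search(r"\b" + re.escape(phrase) + r"\b", text) is not None


def learn_fields(rows):
    """Learn from the data which text values and numeric columns exist."""
    values, measures = {}, set()
    for row in rows:
        for column, value in row.items():
            if isinstance(value, str):
                values[value.lower()] = (column, value)
            elif isinstance(value, (int, float)):
                measures.add(column)
    return values, measures


def to_query(question, rows, table="sales"):
    """Turn a simple question into a SQL-like string (toy rules only)."""
    text = question.lower()
    values, measures = learn_fields(rows)

    # Entity spotting by lookup against values seen in the data.
    filters = sorted(pair for key, pair in values.items() if has_word(key, text))
    # First aggregate keyword and first numeric column mentioned, if any.
    agg = next((fn for word, fn in AGGREGATES.items() if has_word(word, text)), None)
    measure = next((c for c in sorted(measures) if has_word(c, text)), None)

    select = f"{agg}({measure})" if agg and measure else (measure or "*")
    query = f"SELECT {select} FROM {table}"
    if filters:
        query += " WHERE " + " AND ".join(f"{col} = '{val}'" for col, val in filters)
    return query


print(to_query("What was total profit in California?", ROWS))
# prints: SELECT SUM(profit) FROM sales WHERE state = 'California'
```

The sketch ignores dates such as "last March", ranking questions such as "which products were selling the most", and anything learned from past user questions. Those are the parts where real NLP and machine learning are needed.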
Some amount of application-specific customization is also required in most cases. For example, queries of the type “who is the president of Zambia” need one approach, whereas queries of the type “What is the total sales for G-Square in India in 2010” need a different one. The first case is simple information retrieval; the second also requires some aggregation/analytics. Another type of query could be a simple question like "How is business doing in New York?". A lot of contextualization needs to be built in to understand and respond to such questions.
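One way to picture this customization is a routing step that decides which kind of question has come in before any query is built. The Python sketch below is only an assumption about how such a step might look: the category names, the `AGGREGATION_WORDS` list and the lookup pattern are hypothetical, and a real system would more likely learn this from user questions than rely on keywords.

```python
import re

# Hypothetical cue words that suggest aggregation/analytics is needed.
AGGREGATION_WORDS = {"total", "average", "sum", "count", "most", "highest", "lowest"}

# Hypothetical pattern for simple fact lookups such as "who is the X of Y".
LOOKUP_PATTERN = re.compile(r"^(who|what|where|when) (is|was|are|were) the \w+ of\b")


def classify_question(question):
    """Route a question to one of three rough categories."""
    text = question.lower().strip()
    words = set(re.findall(r"[a-z]+", text))
    if words & AGGREGATION_WORDS:
        return "aggregation"
    if LOOKUP_PATTERN.match(text):
        return "lookup"
    # Anything else needs more context before it can be answered.
    return "contextual"


for q in [
    "who is the president of Zambia",
    "What is the total sales for G-Square in India in 2010",
    "How is business doing in New York?",
]:
    print(classify_question(q), "<-", q)
```

With these rules the three example questions come out as lookup, aggregation and contextual respectively. The contextual bucket is the hard one, since "how is business doing" has to be mapped to specific metrics and a time period before it becomes a query.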