Microsoft Research Wants You to Use Natural Language to Access Web APIs
Jesus Rodriguez
CEO of IntoTheBlock, Co-Founder, President at Faktory, Co-Founder, President NeuralFabric, Founder of The Sequence AI Newsletter, Guest Lecturer at Columbia, Guest Lecturer at Wharton Business School, Investor, Author.
Is natural language the ultimate protocol for application programming interfaces (APIs)? The idea is intriguing. Web APIs have become an omnipresent part of modern software architectures. Conceptually, each API defines a protocol and a semantic model to access and interpret data, respectively. Over the years, universal protocols have been developed to abstract common capabilities of APIs. For instance, GraphQL provides a generic protocol to access data via a Web API. However, those semi-generic protocols still rely on individual syntax and semantics to encode and process data. With the rise in popularity of conversational interfaces, the idea of using natural language to interact with Web APIs has been gaining traction. Recently, a group of artificial intelligence (AI) researchers from Microsoft published a research paper that proposes a clever architecture for developing natural language interfaces for Web APIs.
The idea of creating a conversational protocol for Web APIs is certainly interesting, but it is not without challenges. The protocol of a Web API establishes a constrained structure for interacting with resources, yet the same API call can be expressed in infinitely many ways in natural language. Add to that the fact that APIs typically use parameters to customize a specific action, and the challenge gets even worse, since the combinatorial variations of parameters have different representations in natural language. For instance, consider a scenario in which we are using a CRM API to retrieve information about specific accounts. Natural language expressions such as “Who is the contact for account A?”, “Could you find me the contact for account A?”, and “Who represents account A?” are all representations of the same API call.
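The many-to-one nature of this mapping can be made concrete with a toy sketch. This is not the paper's system: a real NL2API model learns the mapping with neural networks, while here trivial keyword rules stand in for it, and the endpoint name is hypothetical.

```python
# Toy illustration: several paraphrases of the same intent all normalize to a
# single canonical API call. The endpoint and the keyword rules are made up.

CANONICAL_CALL = "GET /accounts/A/contact"

def to_api_call(utterance: str) -> str:
    # A learned model would go here; we fake it with keyword matching.
    text = utterance.lower()
    if "account a" in text and ("contact" in text or "represents" in text):
        return CANONICAL_CALL
    raise ValueError(f"unrecognized utterance: {utterance!r}")

paraphrases = [
    "Who is the contact for account A?",
    "Could you find me the contact for account A?",
    "Who represents account A?",
]

# All three surface forms collapse to one canonical call.
calls = {to_api_call(u) for u in paraphrases}
print(calls)
```

The point of the sketch is only the shape of the problem: the API side has one fixed form, while the natural language side is open-ended.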
The second challenge with conversational interfaces for Web APIs is related to the supervised nature of natural language processing (NLP) models. To train a natural language interface for Web APIs, a system needs access to high-quality labeled data about those APIs, which is not easy to find.
NL2API: A Framework for Natural Language Interfaces for Web APIs
In their research, the Microsoft team introduced a framework called NL2API that uses deep neural networks to infer API calls from natural language sentences. The core architecture of the NL2API framework is based on encoder-decoder models, with a small twist: the decoder is decomposed into multiple interpretable components called modules. Each module specializes in predicting a predefined kind of output, for example, instantiating a specific parameter by reading the input utterance.
Architecturally, NL2API’s modules are specialized neural networks, each trained to perform a specific task based on a specific set of parameters. In our CRM example, suppose we are processing an utterance like “Give me the accounts with revenues over $1M in the last year and group them by city”. In that example, commands such as GET(Accounts), FILTER(Revenue > $1M), or GROUPBY(City) can each be handled by an individual module. In simple terms, instead of using a single decoder to process the whole sentence, as traditional encoder-decoder architectures do, the NL2API model uses different decoders to predict specific parameters, which helps improve the semantic richness of the model.
Another important component of the NL2API framework is the controller, which is responsible for determining which modules will be triggered for a specific utterance. The controller is implemented as an attentive decoder: using the encoding of the utterance as input, it generates a sequence of modules, called the layout. The modules then generate their respective parameters, and finally the parameters are composed to form the final API call.
Training NL2API
At the beginning of this article, I mentioned that one of the biggest challenges in developing natural language interfaces for Web APIs is the lack of high-quality labeled data. The modular approach proposed by Microsoft’s NL2API also helps with this challenge. Given a specific Web API, NL2API first generates a series of sample calls and decomposes them into canonical modules using a simple grammar. After that, the system uses a crowdsourcing model to paraphrase the specific commands.
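The generation step can be sketched as enumerating call/utterance pairs from a tiny grammar; crowd workers would then paraphrase the machine-generated canonical utterances. The resources, filters, and templates below are invented for illustration, not taken from the paper.

```python
# Sketch of training-data generation: enumerate sample API calls from a small
# grammar and pair each with a canonical, machine-generated utterance.
from itertools import product

resources = ["Accounts", "Contacts"]
filters   = [None, "Revenue > $1M"]

def canonical_utterance(resource, flt):
    # Stilted but unambiguous phrasing; crowd workers paraphrase it later.
    text = f"get all {resource.lower()}"
    if flt:
        text += f" where {flt.lower()}"
    return text

samples = []
for resource, flt in product(resources, filters):
    call = f"GET({resource})" + (f".FILTER({flt})" if flt else "")
    samples.append((call, canonical_utterance(resource, flt)))

for call, utt in samples:
    print(call, "->", utt)
```

Even this two-resource, one-filter grammar yields four calls; real APIs multiply parameters into a space far too large to paraphrase exhaustively, which motivates the selective annotation discussed next.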
The crowdsourcing approach to training is clever but certainly not economical, as the combinatorial explosion of parameters in any API makes it almost impossible to annotate exhaustively. To address this challenge, NL2API uses a hierarchical probabilistic model for the crowdsourcing process, which provides information to later decide which API calls to annotate. NL2API calls this approach the semantic mesh, as it is computationally represented as a mesh connecting the possible API call/parameter combinations. The semantic mesh gives a holistic view of the whole API call space as well as the interplay of utterances and API calls, based on which trainers can selectively annotate only a subset of high-value API calls. In initial testing, the semantic mesh approach outperformed more traditional training models such as Seq2Seq.
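The selective-annotation idea reduces, in its simplest form, to ranking candidate calls by an estimated value and spending the annotation budget on the top of that ranking. The scores below are invented placeholders for what the semantic mesh's probabilistic model would estimate.

```python
# Toy illustration of selective annotation: with a budget too small to label
# every call, rank candidate API calls by an estimated value and crowdsource
# paraphrases only for the highest-value ones. Scores are made up.

estimated_value = {
    "GET(Accounts)":                                     0.9,
    "GET(Accounts).FILTER(Revenue > $1M)":               0.6,
    "GET(Accounts).GROUPBY(City)":                       0.5,
    "GET(Accounts).FILTER(Revenue > $1M).GROUPBY(City)": 0.2,
}

budget = 2  # how many calls we can afford to annotate
to_annotate = sorted(estimated_value, key=estimated_value.get, reverse=True)[:budget]
print(to_annotate)
```

The paper's actual selection criterion is richer than a static top-k (the hierarchical model updates as annotations come in), but the budget-versus-value trade-off is the same.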
The Microsoft Research team tested NL2API by generating natural language interfaces for the popular Microsoft Graph API suite. The results validated that the NL2API model could be a viable approach to enabling the first generation of natural language interfaces for Web APIs.
Thinknowlogy is the world's only naturally intelligent knowledge technology, based on Laws of Intelligence that are naturally found in the human language. Open source software.
6y
NLP has fundamental problems: during the NLP process, rich and meaningful sentences are degraded to "bags of keywords", by which the natural structure of sentences is discarded, like a two-dimensional movie has lost the three-dimensional spatial information. Moreover, scientists are even ignorant of the logical structures of sentences that are provided by nature. Three examples to illustrate the loss of information:

1) Scientists are unable to describe the following childishly simple conversion (or vice versa) through an algorithm, because the intelligent function in language of possessive verb "has" is not described in any scientific paper:
- Given: "Paul is a son of John."
- Generated conclusion: "John has a son, called Paul."
Both sentences mentioned above have the same meaning. So, it is possible to convert one sentence to the other (and back) through an algorithm. But apparently, scientists are unable to define such an algorithm. My (simplified) algorithm:
- Swap both proper nouns;
- Replace basic verb "is" by possessive verb "has" (or vice versa);
- Replace preposition "of" by adjective "called" (or vice versa).

2) Algebra describes the Exclusive OR (XOR) function. But scientists are unable to relate this function to its linguistic equivalent: conjunction "or". So, there is no technique available to generate the following question through an algorithm:
- Given: "Every person is a man or a woman."
- Given: "Addison is a person."
- Generated question: "Is Addison a man or a woman?"
My algorithm:
- Conjunction "or" has the logical function (Exclusive OR) in language to separate knowledge;
- Given "Every person is a man or a woman" and "Addison is a person";
- Substitution of both sentences: "Addison is a man or a woman";
- Conversion to a question: "Is Addison a man or a woman?".

3) Algebra is not defined for the past tense. So, the following conclusions based on past tense are not described in any scientific paper:
- Given: "James was the father of Peter."
- Generated conclusions: "Peter has no father anymore." and "Peter had a father, called James."

I am the only one in the world who has defined intelligence as a set of natural laws. Implemented in software, my CNL reasoner has results that scientists can't deliver. I am using fundamental science / basic research (logic and laws of nature) instead of cognitive science (simulation of behavior), because:
- Autonomous reasoning requires both intelligence and language;
- Intelligence and language are natural phenomena;
- Natural phenomena obey laws of nature;
- Laws of nature (and logic) are investigated using fundamental science.
Using fundamental science, I gained knowledge and experience that no one else has:
- I have defined intelligence in a natural way, as a set of natural laws;
- Using this definition, I have discovered a logical relationship between natural intelligence and natural language, which I am implementing in software;
- And I defy anyone to beat the simplest results of my extended Controlled Natural Language (CNL) reasoner in a generic way: from natural language, through algorithms, back to natural language. See: https://mafait.org/challenge/ It is free and open source software. Feel free to join.
Independent Developer and Researcher
6y
This article mentions one challenge in this effort: the many ways that natural language can express one thought. It fails to mention the very large inverse challenge: the many thoughts that one expression can represent, that is, the very significant ambiguity in natural languages. One can see a bias in the examples that somewhat minimizes those challenges, namely the use of a SQL concept, GROUP BY. Well, you might say that SQL, invented decades ago, went halfway to the goal. But those generating the examples had a bias toward technical language. I'll guess that this project might go reasonably well for search queries. Google already does well at that. My last work in maps and navigation in 2013 was to develop an NLU interface for map queries, a very narrow domain. That worked, but it was only for queries. It did not even require training data. But now try to create an NLU interface to manage or modify data. How can the ambiguity of natural language be tolerated in this project?