Autonomous Learning solves difficult problems found in Natural Language Processing
Prepared with Sing Koo, CTO, SiteFocus, 21 May 2018
Introduction
Natural Language Processing is an enabling technology that relieves humans of the exhaustive task of digesting textual documents. To this end, we have created a cloud platform called Communication in Focus (CIF) that analyzes textual data on demand using Autonomous Learning. Autonomous Learning is not traditional Machine Learning (ML) or Deep Learning (DL). While ML and DL rely on past data, Autonomous Learning does not. Instead, it is based on observation of the symbols that form the context of a scenario. Autonomous Learning may sound abstract, but it is derived from the simple logic of symbols: its implementation is based on symbolic logic, a science that has been around for more than a thousand years. Instead of using huge numbers of data patterns to induce the logic between things and events, Autonomous Learning uses syllogisms of “symbols in context” to draw relationships between context and symbols. As opposed to ML or DL, it does not rely on patterns or probabilities; instead, it binds context to symbols based on the rules of syllogism. Autonomous Learning earns its name by independently discovering entity relations between symbols and context without the human guidance found in supervised learning. It also differs from unsupervised learning in that it does not require manual classification of entities in order to learn.
Solving the Challenge of Natural Language
The difficulty in implementing Autonomous Learning lies in bridging the simplicity of symbolic logic with the complexity of deciphering semantics from higher-order compositions of natural language. Our implementation of Autonomous Learning begins by retracing the modern development of symbolic logic by George Boole in the 19th century. A simple grasp of symbolic logic can be obtained by observing the following statements:
Where A, B, C are symbols:
- If A implies B, and if B implies C,
- then, A implies C.
or it can be expressed as:
“If, if the first then the second and if the second then the third, then, if the first then the third”
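The transitive rule stated above can be sketched in a few lines of code. This is an illustrative sketch only, not part of CIF: the `deduce` function and its placeholder facts are hypothetical, showing how known implications between symbols can be chained to derive new ones.

```python
# Illustrative sketch of the syllogism: implication is transitive, so
# "A implies B" and "B implies C" together yield "A implies C".
# The facts below are placeholders, not drawn from any real system.

def deduce(implications):
    """Repeatedly chain implications until no new pair can be derived."""
    known = set(implications)
    changed = True
    while changed:
        changed = False
        for a, b in list(known):
            for b2, c in list(known):
                if b == b2 and (a, c) not in known:
                    known.add((a, c))  # new implication deduced by syllogism
                    changed = True
    return known

facts = {("first", "second"), ("second", "third")}
print(("first", "third") in deduce(facts))  # True: "the first" implies "the third"
```

When letters stand in for the terms, as here, the proposition is abstract; substituting concrete words for the tuples would make it concrete, as discussed next.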
Expressions similar to the above are translated into first-order propositions, where additional operations are carried out with operators to represent complex structures. When the terms of a proposition are represented by letters, the proposition is abstract; when we substitute words for the letters, the proposition is concrete. While it is beyond the scope of this writing to explore the syllogisms of symbolic logic, the reference is drawn to depict the foundation on which Autonomous Learning is implemented. Autonomous Learning deduces the rationale between symbols in textual data on that foundation of symbolic logic.

By applying Autonomous Learning to natural language, our implementation treats each word or sequence of words as a symbol. It examines paragraphs and sentences for context. By applying a discriminant to context, a minimum set of symbols is derived to represent the underlying textual data. By following the natural order of symbols used in the textual data, a symbol-chain is created to represent the excerpt. By associating semantic context and sentiment context with symbol-chains, a semantic neighborhood is created. This technique is applied recursively until every related semantic neighborhood is exhausted. A simple semantic neighborhood may be represented by a single symbol-chain. In a complex semantic neighborhood, Autonomous Learning may discover multiple symbol-chains that articulate the same context and share any number of symbols, or none at all.
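As a hedged illustration of the symbol-chain step (CIF's actual discriminant is not published), the sketch below uses a simple corpus-rarity filter as a stand-in discriminant: it keeps the most salient words of a sentence and preserves their natural order. The stopword list, corpus counts, and sentence are all invented for the example.

```python
# Sketch only: a rarity-based filter stands in for CIF's proprietary
# discriminant, purely to illustrate reducing text to a symbol-chain.
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "to", "and", "is", "in", "that"}

def symbol_chain(sentence, corpus_counts, top_n=4):
    """Keep the top_n most salient symbols, preserving their natural order."""
    words = [w.lower().strip(".,") for w in sentence.split()]
    candidates = [w for w in words if w not in STOPWORDS]
    # Stand-in discriminant: prefer symbols that are rare in the corpus.
    ranked = sorted(set(candidates), key=lambda w: corpus_counts.get(w, 0))[:top_n]
    return [w for w in candidates if w in ranked]  # restore natural order

# Hypothetical corpus frequencies, chosen only for this illustration.
corpus = Counter({"company": 50, "said": 90, "zte": 2, "trump": 3,
                  "business": 10, "chinese": 5, "back": 40})
chain = symbol_chain("Trump said the Chinese company ZTE is back in business", corpus)
print(" -> ".join(chain).upper())  # TRUMP -> CHINESE -> ZTE -> BUSINESS
```

In CIF, the chain would additionally carry the semantic and sentiment context of its surrounding paragraph, which is what groups chains into semantic neighborhoods.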
Autonomous Learning of natural language text by machine is analogous to human understanding of the same. A human does not read by drilling into the meaning of one word at a time; human comprehension does not digest the meaning of each word in isolation. It consists of reading a sentence, a paragraph, or an article for its context, and then associating words with that context so as to gain an understanding of the underlying text. Autonomous Learning behaves in a similar manner.
The need for ML/DL alternatives
The need for alternatives to ML or DL in natural language is paramount, because ML and DL require large amounts of past data to create models on narrow problem domains. When a problem domain is broad, new, or previously unknown, ML and DL will not work. In natural language understanding, ML or DL can only work when subjects and entities are well defined ahead of time. Examples of narrow-domain applications that work well with ML or DL can be found in Open Calais, chatbots, or popular voice interfaces that respond to pre-defined inquiries such as the weather, driving directions, product information, or a restaurant reservation. In other areas, where natural language is used to articulate risk, complex events, new ideas and concepts, or breaking news, ML and DL are rendered useless because their inputs contain entities not defined in their models.
The Goal of Autonomous Learning
The incentives for developing Autonomous Learning are:
- To enable machines to learn new knowledge from textual data without human supervision, so that humans can be spared from reading voluminous amounts of textual data
- To overcome bounded rationality with an enabling technology that unifies aggregate knowledge
In doing so, humans can overcome the limitations of ML and DL. Instead of having humans read textual data sentence by sentence for its content, we can delegate the reading to a machine. Humans can then draw upon the knowledge learned by the machine. For example, Autonomous Learning can be used to read enterprise email as part of an effort to prevent exfiltration of trade secrets, detect insider trading, or uncover conflicts of interest by bad actors. As the world transitions from offline to online, the textual data used by enterprise business processes will increase substantially. Failing to read textual data in a timely manner can lead to irreparable and undesirable consequences. The following is a partial list of enterprise workflows that stand to benefit greatly from Autonomous Learning:
- Detection of trade secret exfiltration via email or other forms of textual communications
- Transcripts and documents such as regulations, policies, and contracts
- Litigation discovery
- Pharmaceuticals: patient responses in drug trials
- Knowledge management
- Risk assessment
- Competitive analysis and business intelligence
- Investor relations
- Voice of the Customer, product reviews
- Enterprise resource management
- Human resource management
- Intellectual property: patents, copyrights
- Social network data
- Technical bulletins
Benchmark and Validation with the Enron Email Archive
Autonomous Learning makes it possible to process streams of textual data at scale for effective problem solving. Humans can tap the results of the aggregated knowledge using natural language queries. Queries are automatically translated into targets; a target abstracts a query into a symbol-chain. The utility may appear similar to a search engine (finding relevant information), but it is a far more effective approach: where a search engine is limited to a word or phrase, targets use symbol-chains abstracted from the query and inferred from the aggregate knowledge.
For example, if one would enter the following query into an Autonomous Learning machine:
“I want to find out if any employee is conducting insider trading?”
The machine would interpret the query into symbol-chains, and then match them by inference against similar symbol-chains in the knowledge base over a given time period. If it finds anything that semantically resembles the communication of an insider trade, it will flag the communication and raise an alert. Having a system that is capable of monitoring communication through understanding and inference of natural language provides a more comprehensive means for enterprises to de-risk a given target, such as “insider trading”.
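The matching step can be sketched as follows. This is an assumption-laden illustration, not CIF's published mechanism: chain similarity is approximated here by symbol overlap (Jaccard), and the knowledge base entries and message IDs are invented.

```python
# Sketch only: CIF's inference is not published; Jaccard overlap of
# symbols stands in for semantic resemblance between symbol-chains.

def jaccard(a, b):
    """Fraction of symbols shared between two chains."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

# Hypothetical knowledge base of (message id, learned symbol-chain) pairs.
knowledge_base = [
    ("msg-017", ["employee", "shares", "sell", "announcement"]),
    ("msg-042", ["employee", "insider", "trading", "shares"]),
    ("msg-103", ["lunch", "meeting", "friday"]),
]

def match_target(target_chain, kb, threshold=0.4):
    """Flag stored chains that semantically resemble the target chain."""
    return [doc for doc, chain in kb if jaccard(target_chain, chain) >= threshold]

# Target abstracted from the query "is any employee conducting insider trading?"
target = ["employee", "insider", "trading"]
print(match_target(target, knowledge_base))  # ['msg-042']
```

A real system would infer additional chains beyond the literal query symbols, so that a message phrased without the words "insider trading" could still be flagged.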
For the purpose of running live tests with our Autonomous Learning engine, we subjected it to the Enron Email Archive, a data source made available to the public for research purposes after the collapse of Enron. We submitted the above query, without prejudice, to our Autonomous Learning engine, which had no prior knowledge about Enron. The engine was able to find the few initial “wrongdoing” emails that eventually led to the debacle of Enron. The benchmark results can be found here: https://www.sitefocus.com/enron.html
Benchmark and Validation with Streaming News – learning 120,000 rules in 3 weeks
We have also deployed this Autonomous Learning engine, without any preset training, dictionary, or ontology, to watch news streams. It learned more than 120,000 entity relationships in less than three weeks by analyzing each news article autonomously. We then fired off a simple query to test whether the engine could answer a random query such as “China ZTE”.
The following is a query/response exchange between a human and our Autonomous Learning engine. The response came a few seconds after the query was received:
Human Inquiry: “China ZTE”
CIF Response:
- Symbol Chain: TRUMP -> CHINESE -> ZTE -> BUSINESS
- Excerpt: That leads us to Sunday, when Trump tweeted , President Xi of China, and I, are working together to give massive Chinese phone company, ZTE, a way to get back into business, fast
- POV Flag: -
- Target entity-relationships: (2)
(business*chinese*trump*zte)
(business*chinese*trump*zte*zte_corp) [inferred]
Closing Thoughts
It is worth mentioning that throughout the entire process there was no human-assisted training, tagging, or setup to help CIF define or understand any of the entities shown in the above response. The answer comes strictly and entirely from CIF’s Autonomous Learning.