July 16th Part 3 - Benchmark Tests for Large Language Models | Relationship between LLMs, KGs, Ontology
Sanjay Basu PhD
MIT Alumnus|Fellow IETE |AI/Quantum|Executive Leader|Author|5x Patents|Life Member-ACM,AAAI,Futurist
Continuing the benchmarking topic from last week's newsletter - https://www.dhirubhai.net/pulse/july-newsletter-part-2-abcs-benchmarking-comparing-large-basu-phd - this week's first topic demystifies additional benchmarks that LLM practitioners run as a battery of standard tests to measure the comparative effectiveness of large language models.
The second topic is a monologue on how ontology development, large language models, and knowledge graphs, used together, give an AI system more accuracy and efficiency than any of the individual baselines.
-------------- The Battery of Standard Tests
MMLU (Massive Multitask Language Understanding) is a benchmark for evaluating the breadth of a language model's knowledge and problem-solving ability. It contains multiple-choice questions spanning 57 subjects, from elementary mathematics and US history to law, medicine, and computer science. For example, it could be used to check whether a model picks the correct answer to a high-school physics question from four options.
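To make the format concrete, here is a minimal sketch of loading an MMLU subject with the Hugging Face datasets library and turning one item into a four-option prompt. The dataset id "cais/mmlu" and the subject name are assumptions about one commonly used hosted copy; adjust them to whatever mirror you actually use.

```python
# Minimal sketch: format one MMLU item as a multiple-choice prompt.
# Assumes the Hugging Face "datasets" library and the "cais/mmlu" hosted copy
# (fields: question, choices, answer); adjust to the copy you actually use.
from datasets import load_dataset

items = load_dataset("cais/mmlu", "high_school_physics", split="test")
item = items[0]

letters = ["A", "B", "C", "D"]
prompt = item["question"] + "\n" + "\n".join(
    f"{letter}. {choice}" for letter, choice in zip(letters, item["choices"])
) + "\nAnswer:"

print(prompt)
print("Gold answer:", letters[item["answer"]])
```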
TriviaQA is a reading comprehension benchmark containing trivia questions paired with evidence documents that can be used to answer them. For instance, it may ask "What is the capital of Australia?" and provide relevant Wikipedia pages from which the model can deduce that the answer is Canberra.
Natural Questions is a benchmark for reading comprehension requiring models to answer real user questions based on Wikipedia articles. An example question could be "When did the first airplane fly?" where the model must locate the answer in provided text.
GSM8K (Grade School Math 8K) evaluates multi-step mathematical reasoning. It contains roughly 8,500 grade-school math word problems whose solutions require a short chain of elementary arithmetic steps, with the final numeric answer recorded for scoring.
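Scoring GSM8K usually comes down to comparing the final number a model produces against the reference answer, which in the released dataset ends with a line like "#### 72". Here is a minimal sketch of that comparison; the regular expression is one common convention, not an official scoring script.

```python
# Minimal sketch: score a GSM8K response by exact match on the final number.
# The "#### 72" convention comes from the released reference solutions; the
# regex below is illustrative rather than an official scorer.
import re

def final_number(text):
    """Return the last number mentioned in the text, with commas stripped."""
    matches = re.findall(r"-?\d[\d,]*\.?\d*", text.replace("$", ""))
    return matches[-1].replace(",", "") if matches else None

reference = "Natalia sold 48 clips in April and 24 in May, 48 + 24 = 72\n#### 72"
model_output = "She sold 48 clips in April and 24 in May, so 72 clips in total."

print(final_number(model_output) == final_number(reference))  # True
```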
HumanEval tests code generation rather than verbal question answering. It consists of 164 hand-written Python programming problems, each with a function signature, a docstring, and unit tests; a model's completion counts as correct only if it passes all of the tests, and results are usually reported as pass@k.
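The pass@k metric estimates the probability that at least one of k sampled completions passes the unit tests. The estimator below follows the unbiased formulation described in the HumanEval/Codex paper, where n is the number of samples generated per problem and c the number that passed.

```python
# Unbiased pass@k estimator used with HumanEval-style evaluation:
# n = samples generated per problem, c = samples passing all unit tests,
# k = the sampling budget being scored.
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    if n - c < k:
        return 1.0
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# e.g. 200 samples per problem, 37 passed the tests -> estimated pass@1
print(round(pass_at_k(n=200, c=37, k=1), 3))  # 0.185
```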
AGIEval focuses on advanced reasoning by drawing questions from human standardized exams, such as college entrance tests, law school admission tests, math competitions, and civil-service qualification exams. Models must solve the same problems human candidates face, expressed in natural language.
BoolQ evaluates reading comprehension through naturally occurring yes/no questions. It provides a passage of text and a question that often cannot be answered solely from an explicit factual statement in the passage; for example, it may require inferring an implied meaning before deciding between yes and no.
HellaSwag tests common sense reasoning and generalization. It provides a short context and possible completions, where the model must choose the most plausible ending. For instance, given a partial story, the model must complete it in a sensible way.
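A common way to score benchmarks like HellaSwag is to compare the model's log-likelihood of each candidate ending given the context and pick the highest. Here is a minimal sketch of that idea using a small GPT-2 checkpoint purely for illustration; real evaluation harnesses batch this, normalize by length, and handle tokenization edge cases more carefully.

```python
# Minimal sketch: choose the most plausible ending by log-likelihood under a
# causal LM. gpt2 is used only as a small illustrative checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def ending_logprob(context, ending):
    ctx_len = tok(context, return_tensors="pt").input_ids.shape[1]
    full_ids = tok(context + ending, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    # log-probs over positions 1..T-1, each predicted from its prefix
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    # sum the log-probs of the ending tokens only
    return sum(
        log_probs[i, full_ids[0, i + 1]].item()
        for i in range(ctx_len - 1, full_ids.shape[1] - 1)
    )

context = "She cracked the eggs into the bowl and"
endings = [" whisked them with a fork.", " parked the car in the garage."]
print(max(endings, key=lambda e: ending_logprob(context, e)))
```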
OpenBookQA measures question answering against a small "open book" of elementary science facts. Questions require combining a fact from the book with broader common knowledge. For example, answering "How does sunlight contribute to plant growth?" may involve combining the book fact that plants use sunlight to make food with general knowledge about how organisms grow.
QuAC (Question Answering in Context) evaluates conversational question answering. It contains dialogues where a model must answer followup questions based on the dialogue history and a provided passage. The model must integrate conversational context.
Winograd Schema Challenge tests common sense reasoning through pronoun disambiguation. It consists of sentences with an ambiguous pronoun, and the model must identify the referent. For example, in "The trophy wouldn't fit in the brown suitcase because it was too big," the model must recognize that "it" refers to the trophy rather than the suitcase.
These benchmarks aim to measure distinct reasoning skills relevant to intelligent systems. Testing language models on diverse benchmarks pushes them toward more human-like language understanding.
----------Don't forget to subscribe to my free Linkedin Newsletter and my free Medium subscription.
Connecting Ontology, Large Language Models, and Knowledge Graphs
In recent years, there has been rapid progress in three key areas of artificial intelligence: ontology development, large language models, and knowledge graphs. Though seemingly distinct, these three technologies are deeply interrelated, and understanding their connections can provide insight into AI's current capabilities and future directions. This section will explore the relationships between ontology, large language models, and knowledge graphs.
Ontology and Large Language Models
An ontology formally represents knowledge within a domain, typically consisting of concepts, properties, and relations. Ontologies are a critical component for natural language processing systems to "understand" the meaning and context of language.
Large language models like GPT-3/4, BLOOM, Cohere's models, LLaMA, and many more have shown impressive capabilities in text generation, question answering, and other natural language tasks. However, these models lack any formal ontology or knowledge representation. Their knowledge is implicit, encoded in the parameters of a neural network trained on massive text corpora. This allows flexibility in handling diverse topics and genres but limits their reasoning abilities.
Combining ontology with large language models provides complementary strengths. The ontology gives structure and formal semantics to ground the model's language capabilities. The neural network provides robust language understanding and generation to make the ontology useful in real-world applications. A growing body of work on knowledge-enhanced language models aims to connect ontologies with foundation models such as GPT-3 and GPT-4, so that generated text is guided by ontological constraints, improving consistency, correctness, and reasoning ability; a minimal sketch of one such guardrail follows.
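As a concrete illustration, here is a minimal sketch of using an ontology as a post-hoc guardrail on LLM output with the rdflib library. The ontology file "domain.ttl" and the idea of validating an LLM-proposed category against the ontology's class labels are illustrative assumptions, not a reference to any specific project.

```python
# Minimal sketch: accept an LLM-proposed category only if the ontology defines
# it. "domain.ttl" is a hypothetical ontology file; swap in your own.
from rdflib import Graph, RDF, RDFS, OWL

onto = Graph().parse("domain.ttl", format="turtle")

# Collect the labels of every class the ontology actually defines.
valid_labels = {
    str(label).lower()
    for cls in onto.subjects(RDF.type, OWL.Class)
    for label in onto.objects(cls, RDFS.label)
}

def check_classification(llm_label: str) -> bool:
    """Return True only when the LLM's label matches an ontology class."""
    return llm_label.lower() in valid_labels
```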
Knowledge Graphs
Knowledge graphs (KGs) represent entities and relations in a graph structure. Popular knowledge graphs include DBpedia, Wikidata, YAGO, and the Google Knowledge Graph. Knowledge graphs capture facts about the world (people, places, things) and the connections between them. Knowledge graphs complement both ontologies and language models. Ontologies provide a schema for classification, but KGs add real-world instantiation of entities and relations. Language models supply text comprehension and generation capabilities but lack grounding in factual knowledge. Connecting language models to knowledge graphs like Wikidata improves their reasoning and accuracy by leveraging external, curated knowledge.
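As a small illustration of that grounding, here is a sketch that pulls a fact from Wikidata's public SPARQL endpoint and prepends it to a prompt. The query, the prompt template, and the User-Agent string are illustrative; a production system would add caching, retries, and error handling.

```python
# Minimal sketch: ground a prompt with a fact retrieved from Wikidata.
# Q408 is Australia and P36 is the "capital" property in Wikidata.
import requests

SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"
query = """
SELECT ?capitalLabel WHERE {
  wd:Q408 wdt:P36 ?capital .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""

resp = requests.get(
    SPARQL_ENDPOINT,
    params={"query": query, "format": "json"},
    headers={"User-Agent": "kg-grounding-sketch/0.1"},  # Wikidata asks for one
)
capital = resp.json()["results"]["bindings"][0]["capitalLabel"]["value"]

prompt = (
    f"Known fact from Wikidata: the capital of Australia is {capital}.\n"
    "Question: What is the capital of Australia?\nAnswer:"
)
print(prompt)
```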
Projects like REALM from Google Research bridge language models and external knowledge by learning to retrieve relevant passages from a knowledge corpus and conditioning generation on them, rather than relying only on facts memorized in the model's parameters. This grounds the model's answers in curated knowledge and enhances its factuality, and REALM shows substantial accuracy gains on open-domain QA compared to purely parametric baselines.
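The underlying retrieve-then-read pattern is easy to sketch: embed the question, pull the most similar passages from an external knowledge store, and let the language model answer conditioned on them. In the sketch below, embed() and generate() are hypothetical placeholders for whatever encoder and LLM you actually use.

```python
# Minimal sketch of the retrieve-then-read pattern behind systems like REALM.
# embed() and generate() are placeholders, not real library calls.
import numpy as np

def retrieve(question_vec, passage_vecs, passages, k=2):
    """Return the k passages whose embeddings are closest to the question."""
    sims = passage_vecs @ question_vec / (
        np.linalg.norm(passage_vecs, axis=1) * np.linalg.norm(question_vec)
    )
    top = np.argsort(-sims)[:k]
    return [passages[i] for i in top]

def answer(question, passages, passage_vecs, embed, generate):
    context = "\n".join(retrieve(embed(question), passage_vecs, passages))
    return generate(f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")
```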
The Future of Connected, Full-Context AI
While it may still be in the early stages, the combination of ontology, language models, and knowledge graphs shows potential for creating more capable and grounded AI systems. Ontologies offer a structured approach, language models provide flexibility, and knowledge graphs offer real-world facts. When used together, they complement each other's strengths and weaknesses, resulting in a more effective overall system.
One of the major technical hurdles for AI systems is creating ontology standards that work well with various NLP models. Significant challenges include efficiently encoding large-scale knowledge graphs for neural networks and managing multimodal information that merges text, images, and data. As solutions to these issues are developed, interconnected AI systems will be able to produce more advanced and accurate text, respond to queries, and make logical deductions about the world.
The relationships between ontology, language models, and knowledge graphs underscore the interconnected nature of progress in AI. Bringing these technologies together and building on their synergies will enable the next generation of intelligent systems that understand, reason, and communicate at an unprecedented level.
The relationships between ontology, large language models (LLMs), and knowledge graphs (KGs) can be pictured as a simple line diagram connecting the three:
The key relationships are:
Ontology informs the schema and structure for knowledge graphs and large language models
Knowledge graphs provide real-world facts and relationships to ground the models
Large language models connect ontologies and knowledge graphs, enhancing reasoning and textual understanding
The ontology provides the formal representation to define concepts and relations. This gives structure and semantics that KGs and LLMs build upon.
KGs instantiate real-world entities and facts as nodes and edges in a graph. This grounds the models in factual knowledge.
LLMs utilize their robust language capabilities to make ontologies and KGs useful in applications. The models connect them together and enhance each other's capabilities.
The three technologies are interconnected and complementary, and together they make for more capable AI systems. The picture to keep in mind is of ontology, KGs, and LLMs leveraging each other's strengths for continued progress in AI knowledge representation and reasoning.
-----------Don't forget to subscribe to my free Linkedin Newsletter and my free Medium subscription.