July 16th Part 3 - Benchmark Tests for Large Language Models | Relationship between LLMs, KGs, Ontology
Copyright: Sanjay Basu

Continuing the benchmarking topic from last week's newsletter - https://www.dhirubhai.net/pulse/july-newsletter-part-2-abcs-benchmarking-comparing-large-basu-phd - this week's first topic demystifies additional benchmarks that LLM practitioners run as a battery of standard tests to compare the effectiveness of large language models.

The second topic is a monologue on how ontology development, large language models, and knowledge graphs together enable AI systems to achieve greater accuracy and efficiency than any of these components does on its own.

-------------- The Battery of Standard Tests

MMLU (Massive Multitask Language Understanding) is a benchmark for evaluating language models' breadth of knowledge and problem-solving ability. It consists of multiple-choice questions spanning 57 subjects, from elementary mathematics and US history to law, medicine, and computer science. For example, it could be used to evaluate how well a model answers a college-level microeconomics question.
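
To make the evaluation concrete, here is a minimal sketch of how an MMLU-style multiple-choice item can be formatted and scored for accuracy. The `ask_model` function and the sample item are hypothetical stand-ins, not part of any official harness.

```python
# Minimal sketch of MMLU-style multiple-choice scoring.
# `ask_model` is a placeholder for whatever LLM API you use; the item below
# is illustrative, not a real MMLU row.

def format_prompt(question: str, choices: list[str]) -> str:
    letters = "ABCD"
    lines = [question] + [f"{letters[i]}. {c}" for i, c in enumerate(choices)]
    lines.append("Answer:")
    return "\n".join(lines)

def ask_model(prompt: str) -> str:
    # Placeholder: a real harness would call an LLM and parse the letter it returns.
    return "A"

def mmlu_accuracy(items: list[dict]) -> float:
    correct = 0
    for item in items:
        prediction = ask_model(format_prompt(item["question"], item["choices"]))
        if prediction.strip().upper().startswith(item["answer"]):
            correct += 1
    return correct / len(items)

items = [
    {"question": "Which gas do plants absorb during photosynthesis?",
     "choices": ["Carbon dioxide", "Oxygen", "Nitrogen", "Helium"],
     "answer": "A"},
]
print(f"accuracy: {mmlu_accuracy(items):.2f}")
```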

TriviaQA is a reading comprehension benchmark containing trivia questions and evidence documents to answer them. For instance, it may ask "What is the capital of Australia?" and provide relevant Wikipedia pages from which the answer, Canberra, can be deduced.

Natural Questions is a benchmark for reading comprehension requiring models to answer real user questions based on Wikipedia articles. An example question could be "When did the first airplane fly?" where the model must locate the answer in provided text.

GSM8K evaluates a model's ability to solve grade-school math word problems. Each problem requires several steps of arithmetic reasoning, and the model must arrive at the correct final numeric answer. For example, it may ask how many eggs are in four cartons of a dozen each, expecting the answer 48.
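
Scoring GSM8K-style output typically comes down to extracting the final number from the model's worked solution and comparing it with the reference answer. Below is a minimal sketch of that convention; the regex heuristic (take the last number) is a common simplification, not the official evaluation code.

```python
# Sketch of GSM8K-style answer checking: take the last number in the model's
# reasoning and compare it with the gold answer.
import re

def extract_final_number(text: str):
    numbers = re.findall(r"-?\d[\d,]*\.?\d*", text)
    return numbers[-1].replace(",", "") if numbers else None

model_output = "Each carton holds 12 eggs, so 4 cartons hold 4 * 12 = 48 eggs."
reference = "48"
print(extract_final_number(model_output) == reference)  # True
```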

HumanEval tests code generation. It provides Python function signatures with docstrings, and the model must write function bodies that pass a set of hidden unit tests. Results are commonly reported as pass@k, the probability that at least one of k generated samples for a problem passes the tests.
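
HumanEval results are usually computed with the unbiased pass@k estimator introduced alongside the benchmark: given n generated samples per problem, of which c pass the unit tests, it estimates the chance that at least one of k randomly chosen samples passes. A small sketch:

```python
# Unbiased pass@k estimator for HumanEval-style code benchmarks:
# pass@k = 1 - C(n - c, k) / C(n, k), where n samples were drawn and c passed.
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    if n - c < k:
        return 1.0  # every possible k-subset contains at least one passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)

print(pass_at_k(n=20, c=3, k=1))  # 0.15: expected pass rate of a single sample
```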

AGIEval focuses on advanced reasoning abilities by drawing on questions from human-centric standardized exams, such as college and law school admission tests and math competitions. Models must solve challenges like logic puzzles and quantitative problems posed in natural language.

BoolQ evaluates reading comprehension through yes/no questions paired with a short passage. The questions come from real user queries and often cannot be answered by simple string matching against the passage. For example, answering may require inferring an implied meaning.

HellaSwag tests common sense reasoning and generalization. It provides a short context and possible completions, where the model must choose the most plausible ending. For instance, given a partial story, the model must complete it in a sensible way.
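
Scoring is commonly done by asking the model for the likelihood of each candidate ending given the context and keeping the highest-scoring one. The sketch below assumes those per-ending scores have already been computed; the numbers shown are illustrative, not real model output.

```python
# HellaSwag-style selection: choose the ending with the highest model score.
def pick_ending(scores: list[float]) -> int:
    # Index of the highest-scoring candidate ending.
    return max(range(len(scores)), key=lambda i: scores[i])

# Per-token log-likelihoods the model assigned to each candidate ending
# (illustrative numbers only).
scores = [-1.2, -3.8, -2.9, -4.1]
print(pick_ending(scores))  # 0
```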

OpenBookQA measures open-book question answering using a small "book" of elementary science facts. Questions require combining a fact from the book with additional common knowledge. For example, answering "How does sunlight contribute to plant growth?" may involve combining a fact about sunlight as an energy source with knowledge of how plants make food.

QuAC (Question Answering in Context) evaluates conversational question answering. It contains dialogues where a model must answer follow-up questions based on the dialogue history and a provided passage. The model must integrate conversational context.

Winograd Schema Challenge tests common sense reasoning through pronoun resolution. Each item is a sentence with an ambiguous pronoun whose referent flips when a single word changes, and the model must pick the correct referent. For example, in "The trophy wouldn't fit in the brown suitcase because it was too big," the correct reading is that "it" refers to the trophy rather than the suitcase.
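
One common way to score such items with a language model is to substitute each candidate referent for the pronoun and keep the version the model finds more plausible. In this sketch the `score` function returns illustrative, hand-picked log-probabilities rather than real model output.

```python
# Winograd-style scoring sketch: substitute each candidate referent for the
# ambiguous pronoun and keep the sentence the model considers more plausible.

def score(sentence: str) -> float:
    # Placeholder for a real model call; scores below are illustrative.
    illustrative = {
        "The trophy wouldn't fit in the brown suitcase because the trophy was too big.": -21.4,
        "The trophy wouldn't fit in the brown suitcase because the suitcase was too big.": -24.8,
    }
    return illustrative[sentence]

template = "The trophy wouldn't fit in the brown suitcase because the {} was too big."
best = max(["trophy", "suitcase"], key=lambda c: score(template.format(c)))
print(best)  # trophy
```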

These benchmarks aim to measure distinct reasoning skills relevant to intelligent systems. Testing language models on diverse benchmarks pushes the field towards more human-like language understanding.

[Image omitted. Courtesy: LinkedIn Post by Martin Ciupa]


----------Don't forget to subscribe to my free LinkedIn Newsletter and my free Medium subscription.

Connecting Ontology, Large Language Models, and Knowledge Graphs

In recent years, there has been rapid progress in three key areas of artificial intelligence: ontology development, large language models, and knowledge graphs. Though seemingly distinct, these three technologies are deeply interrelated, and understanding their connections can provide insight into AI's current capabilities and future directions. This section will explore the relationships between ontology, large language models, and knowledge graphs.

Ontology and Large Language Models

An ontology formally represents knowledge within a domain, typically consisting of concepts, properties, and relations. Ontologies are a critical component for natural language processing systems to "understand" the meaning and context of language.
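
As a concrete illustration, here is a tiny ontology fragment expressed with the rdflib library (pip install rdflib); the classes and property are invented for the example, not taken from any standard ontology.

```python
# A minimal ontology sketch: two classes, a subclass relation, and a property,
# serialized as Turtle using rdflib.
from rdflib import Graph, Namespace, RDF, RDFS

EX = Namespace("http://example.org/onto#")
g = Graph()
g.bind("ex", EX)

g.add((EX.Mammal, RDF.type, RDFS.Class))          # concept: Mammal
g.add((EX.Dog, RDF.type, RDFS.Class))             # concept: Dog
g.add((EX.Dog, RDFS.subClassOf, EX.Mammal))       # relation: Dog is a kind of Mammal
g.add((EX.hasOwner, RDF.type, RDF.Property))      # property: hasOwner
g.add((EX.hasOwner, RDFS.domain, EX.Dog))         # hasOwner applies to Dogs

print(g.serialize(format="turtle"))
```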

Large language models like GPT-3/4, BLOOM, Cohere, LLaMA, and many more have shown impressive capabilities in text generation, question answering, and other natural language tasks. However, these models lack any formal ontology or knowledge representation. Their knowledge is implicit, encoded in the parameters of a neural network trained on massive text corpora. This allows flexibility in handling diverse topics and genres but limits their reasoning abilities.

Combining ontology with large language models provides complementary strengths. The ontology gives structure and formal semantics to ground the model's language capabilities, while the neural network provides robust language understanding and generation that makes the ontology useful in real-world applications. A growing body of work aims to connect ontologies with foundation models like GPT-3/4, letting models generate text guided by ontological constraints and improving consistency, correctness, and reasoning ability.
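
A minimal sketch of the idea, assuming a hypothetical `generate_label` LLM call: the model proposes a concept label, and the output is accepted only if it matches a class defined in the ontology.

```python
# Ontology-constrained generation sketch: validate an LLM's proposed label
# against the set of classes the ontology actually defines.

ONTOLOGY_CLASSES = {"Person", "Organization", "Location", "Event"}  # illustrative

def generate_label(text: str) -> str:
    # Placeholder for a real LLM call that classifies `text`.
    return "Organization"

def constrained_label(text: str) -> str:
    label = generate_label(text)
    if label not in ONTOLOGY_CLASSES:
        raise ValueError(f"Model produced '{label}', which is not an ontology class")
    return label

print(constrained_label("OpenAI released a new model."))
```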

Knowledge Graphs

Knowledge graphs (KGs) represent entities and relations in a graph structure. Popular knowledge graphs include DBpedia, Wikidata, YAGO, and the Google Knowledge Graph. Knowledge graphs capture facts about the world (people, places, things) and the connections between them.

Knowledge graphs complement both ontologies and language models. Ontologies provide a schema for classification, but KGs add real-world instantiation of entities and relations. Language models supply text comprehension and generation capabilities but lack grounding in factual knowledge. Connecting language models to knowledge graphs like Wikidata improves their reasoning and accuracy by leveraging external, curated knowledge.
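
As a concrete example of grounding an answer in curated knowledge, the snippet below queries the public Wikidata SPARQL endpoint for the capital of Australia using the requests package. The entity and property IDs (Q408 for Australia, P36 for capital) are quoted from memory and worth verifying against Wikidata.

```python
# Query Wikidata's public SPARQL endpoint for Australia's capital.
import requests

query = """
SELECT ?capitalLabel WHERE {
  wd:Q408 wdt:P36 ?capital .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""
response = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": query, "format": "json"},
    headers={"User-Agent": "kg-llm-demo/0.1"},  # Wikidata asks clients to identify themselves
    timeout=30,
)
for row in response.json()["results"]["bindings"]:
    print(row["capitalLabel"]["value"])  # expected: Canberra
```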

Projects like REALM from Google Research bridge language models and external knowledge by training the model jointly with a retriever that pulls in relevant documents at inference time; related lines of work inject knowledge-graph triples directly into a model's representations during pre-training. Either way, the goal is to enhance the model's world knowledge and factuality, and REALM showed substantial accuracy gains on open-domain QA compared to baselines that rely only on parametric knowledge.
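
Here is a deliberately simplified retrieval-augmented sketch in the same spirit: retrieve supporting facts and prepend them to the prompt before asking the model. The in-memory fact store, lexical-overlap retriever, and `ask_model` call are toy stand-ins, not REALM's learned components.

```python
# Toy retrieval-augmented QA: retrieve the most relevant stored facts and
# prepend them to the prompt so the model answers from grounded context.

FACTS = [
    "Canberra is the capital of Australia.",
    "The Wright brothers flew the first airplane in 1903.",
]

def retrieve(question: str, k: int = 1) -> list[str]:
    # Toy lexical-overlap retriever; real systems use learned dense retrieval.
    q_words = set(question.lower().split())
    return sorted(FACTS, key=lambda f: len(q_words & set(f.lower().split())), reverse=True)[:k]

def ask_model(prompt: str) -> str:
    return "Canberra"  # placeholder for a real LLM call

question = "What is the capital of Australia?"
prompt = "\n".join(retrieve(question)) + f"\nQuestion: {question}\nAnswer:"
print(ask_model(prompt))
```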

The Future of Connected, Full-Context AI

While it may still be in the early stages, the combination of ontology, language models, and knowledge graphs shows potential for creating more capable and grounded AI systems. Ontologies offer a structured approach, language models provide flexibility, and knowledge graphs offer real-world facts. When used together, they complement each other's strengths and weaknesses, resulting in a more effective overall system.

One of the major technical hurdles for AI systems is creating ontology standards that work well with various NLP models. Significant challenges include efficiently encoding large-scale knowledge graphs for neural networks and managing multimodal information that merges text, images, and data. As solutions to these issues are developed, interconnected AI systems will be able to produce more advanced and accurate text, respond to queries, and make logical deductions about the world.
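
As one classic example of encoding a knowledge graph for neural models, the sketch below scores triples with TransE-style embeddings, where a triple (head, relation, tail) is plausible when head + relation lands close to tail in vector space. The vectors here are random stand-ins for trained embeddings, so the printed scores are illustrative only.

```python
# TransE-style triple scoring: plausibility ~ -||head + relation - tail||.
import numpy as np

rng = np.random.default_rng(0)
dim = 8
entities = {name: rng.normal(size=dim) for name in ["Canberra", "Australia", "Paris"]}
relations = {"capital_of": rng.normal(size=dim)}

def transe_distance(head: str, relation: str, tail: str) -> float:
    # Lower distance means a more plausible triple under TransE.
    return float(np.linalg.norm(entities[head] + relations[relation] - entities[tail]))

print(transe_distance("Canberra", "capital_of", "Australia"))
print(transe_distance("Paris", "capital_of", "Australia"))
```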

The relationships between ontology, language models, and knowledge graphs underscore the interconnected nature of progress in AI. Bringing these technologies together and building on their synergies will enable the next generation of intelligent systems that understand, reason, and communicate at an unprecedented level.

Here is a line diagram showing the relationships between ontology, large language models (LLMs), and knowledge graphs (KGs):

[Diagram omitted. Copyright: Sanjay Basu]

The key relationships are:

Ontology informs the schema and structure for knowledge graphs and large language models

Knowledge graphs provide real-world facts and relationships to ground the models

Large language models connect ontologies and knowledge graphs, enhancing reasoning and textual understanding

The ontology provides the formal representation to define concepts and relations. This gives structure and semantics that KGs and LLMs build upon.

KGs instantiate real-world entities and facts as nodes and edges in a graph. This grounds the models in factual knowledge.

LLMs utilize their robust language capabilities to make ontologies and KGs useful in applications. The models connect them together and enhance each other's capabilities.

The three technologies are interconnected and complementary for more capable AI systems. This diagram aims to visualize how ontology, KGs, and LLMs are interconnected and leverage each other's strengths for continued progress in AI knowledge representation and reasoning.

-----------Don't forget to subscribe to my free LinkedIn Newsletter and my free Medium subscription.
