July 16th Part 3 - Benchmark Tests for Large Language Models | Relationship between LLMs, KGs, Ontology
Sanjay Basu PhD
MIT Alumnus|Fellow IETE |AI/Quantum|Executive Leader|Author|5x Patents|Life Member-ACM,AAAI,Futurist
Continuing the benchmarking topic from last week's newsletter - https://www.dhirubhai.net/pulse/july-newsletter-part-2-abcs-benchmarking-comparing-large-basu-phd - this week's first topic demystifies additional benchmarks that LLM practitioners run as a battery of standard tests to measure the comparative effectiveness of large language models.
The second topic is a monologue on how ontology development, large language models, and knowledge graphs, used together, give an AI system more accuracy and efficiency than any of the individual baselines.
-------------- The Battery of Standard Tests
MMLU (Massive Multitask Language Understanding) is a benchmark for evaluating the breadth of a language model's knowledge and problem-solving ability. It contains multiple-choice questions spanning 57 subjects, from elementary mathematics and US history to law, medicine, and computer science. For example, it could be used to check whether a model picks the correct answer to a high-school physics question from four options.
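To make the format concrete, here is a minimal sketch of loading an MMLU subject with the Hugging Face datasets library and turning one item into a four-option prompt. The dataset id "cais/mmlu" and the subject name are assumptions about one commonly used hosted copy; adjust them to whatever mirror you actually use.

```python
# Minimal sketch: format one MMLU item as a multiple-choice prompt.
# Assumes the Hugging Face "datasets" library and the "cais/mmlu" hosted copy
# (fields: question, choices, answer); adjust to the copy you actually use.
from datasets import load_dataset

items = load_dataset("cais/mmlu", "high_school_physics", split="test")
item = items[0]

letters = ["A", "B", "C", "D"]
prompt = item["question"] + "\n" + "\n".join(
    f"{letter}. {choice}" for letter, choice in zip(letters, item["choices"])
) + "\nAnswer:"

print(prompt)
print("Gold answer:", letters[item["answer"]])
```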
TriviaQA is a reading comprehension benchmark containing trivia questions paired with evidence documents that can be used to answer them. For instance, it may ask "What is the capital of Australia?" and provide relevant Wikipedia pages from which the model can deduce that the answer is Canberra.
Natural Questions is a benchmark for reading comprehension requiring models to answer real user questions based on Wikipedia articles. An example question could be "When did the first airplane fly?" where the model must locate the answer in provided text.
GSM8K (Grade School Math 8K) evaluates multi-step mathematical reasoning. It contains roughly 8,500 grade-school math word problems whose solutions require a short chain of elementary arithmetic steps, with the final numeric answer recorded for scoring.
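Scoring GSM8K usually comes down to comparing the final number a model produces against the reference answer, which in the released dataset ends with a line like "#### 72". Here is a minimal sketch of that comparison; the regular expression is one common convention, not an official scoring script.

```python
# Minimal sketch: score a GSM8K response by exact match on the final number.
# The "#### 72" convention comes from the released reference solutions; the
# regex below is illustrative rather than an official scorer.
import re

def final_number(text):
    """Return the last number mentioned in the text, with commas stripped."""
    matches = re.findall(r"-?\d[\d,]*\.?\d*", text.replace("$", ""))
    return matches[-1].replace(",", "") if matches else None

reference = "Natalia sold 48 clips in April and 24 in May, 48 + 24 = 72\n#### 72"
model_output = "She sold 48 clips in April and 24 in May, so 72 clips in total."

print(final_number(model_output) == final_number(reference))  # True
```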
HumanEval tests code generation rather than verbal question answering. It consists of 164 hand-written Python programming problems, each with a function signature, a docstring, and unit tests; a model's completion counts as correct only if it passes all of the tests, and results are usually reported as pass@k.
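The pass@k metric estimates the probability that at least one of k sampled completions passes the unit tests. The estimator below follows the unbiased formulation described in the HumanEval/Codex paper, where n is the number of samples generated per problem and c the number that passed.

```python
# Unbiased pass@k estimator used with HumanEval-style evaluation:
# n = samples generated per problem, c = samples passing all unit tests,
# k = the sampling budget being scored.
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    if n - c < k:
        return 1.0
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# e.g. 200 samples per problem, 37 passed the tests -> estimated pass@1
print(round(pass_at_k(n=200, c=37, k=1), 3))  # 0.185
```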
AGIEval focuses on advanced reasoning by drawing questions from human standardized exams, such as college entrance tests, law school admission tests, math competitions, and civil-service qualification exams. Models must solve the same problems human candidates face, expressed in natural language.
BoolQ evaluates reading comprehension through naturally occurring yes/no questions. It provides a passage of text and a question that often cannot be answered solely from an explicit factual statement in the passage; for example, it may require inferring an implied meaning before deciding between yes and no.
HellaSwag tests common sense reasoning and generalization. It provides a short context and possible completions, where the model must choose the most plausible ending. For instance, given a partial story, the model must complete it in a sensible way.
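A common way to score benchmarks like HellaSwag is to compare the model's log-likelihood of each candidate ending given the context and pick the highest. Here is a minimal sketch of that idea using a small GPT-2 checkpoint purely for illustration; real evaluation harnesses batch this, normalize by length, and handle tokenization edge cases more carefully.

```python
# Minimal sketch: choose the most plausible ending by log-likelihood under a
# causal LM. gpt2 is used only as a small illustrative checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def ending_logprob(context, ending):
    ctx_len = tok(context, return_tensors="pt").input_ids.shape[1]
    full_ids = tok(context + ending, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    # log-probs over positions 1..T-1, each predicted from its prefix
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    # sum the log-probs of the ending tokens only
    return sum(
        log_probs[i, full_ids[0, i + 1]].item()
        for i in range(ctx_len - 1, full_ids.shape[1] - 1)
    )

context = "She cracked the eggs into the bowl and"
endings = [" whisked them with a fork.", " parked the car in the garage."]
print(max(endings, key=lambda e: ending_logprob(context, e)))
```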
OpenBookQA measures question answering against a small "open book" of elementary science facts. Questions require combining a fact from the book with broader common knowledge. For example, answering "How does sunlight contribute to plant growth?" may involve combining the book fact that plants use sunlight to make food with general knowledge about how organisms grow.
QuAC (Question Answering in Context) evaluates conversational question answering. It contains dialogues where a model must answer followup questions based on the dialogue history and a provided passage. The model must integrate conversational context.
Winograd Schema Challenge tests common sense reasoning through pronoun disambiguation. It consists of sentences with an ambiguous pronoun, and the model must identify the referent. For example, in "The trophy wouldn't fit in the brown suitcase because it was too big," the model must recognize that "it" refers to the trophy rather than the suitcase.
These benchmarks aim to measure distinct reasoning skills relevant to intelligent systems. Testing language models on diverse benchmarks pushes them toward more human-like language understanding.
----------Don't forget to subscribe to my free Linkedin Newsletter and my free Medium subscription.
Connecting Ontology, Large Language Models, and Knowledge Graphs
In recent years, there has been rapid progress in three key areas of artificial intelligence: ontology development, large language models, and knowledge graphs. Though seemingly distinct, these three technologies are deeply interrelated, and understanding their connections can provide insight into AI's current capabilities and future directions. This section will explore the relationships between ontology, large language models, and knowledge graphs.
Ontology and Large Language Models
An ontology formally represents knowledge within a domain, typically consisting of concepts, properties, and relations. Ontologies are a critical component for natural language processing systems to "understand" the meaning and context of language.
Large language models like GPT-3/4, BLOOM, Cohere's models, LLaMA, and many more have shown impressive capabilities in text generation, question answering, and other natural language tasks. However, these models lack any formal ontology or knowledge representation. Their knowledge is implicit, encoded in the parameters of a neural network trained on massive text corpora. This allows flexibility in handling diverse topics and genres but limits their reasoning abilities.
Combining ontology with large language models provides complementary strengths. The ontology gives structure and formal semantics to ground the model's language capabilities. The neural network provides robust language understanding and generation to make the ontology useful in real-world applications. A growing body of work on knowledge-enhanced language models aims to connect ontologies with foundation models such as GPT-3 and GPT-4, so that generated text is guided by ontological constraints, improving consistency, correctness, and reasoning ability; a minimal sketch of one such guardrail follows.
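As a concrete illustration, here is a minimal sketch of using an ontology as a post-hoc guardrail on LLM output with the rdflib library. The ontology file "domain.ttl" and the idea of validating an LLM-proposed category against the ontology's class labels are illustrative assumptions, not a reference to any specific project.

```python
# Minimal sketch: accept an LLM-proposed category only if the ontology defines
# it. "domain.ttl" is a hypothetical ontology file; swap in your own.
from rdflib import Graph, RDF, RDFS, OWL

onto = Graph().parse("domain.ttl", format="turtle")

# Collect the labels of every class the ontology actually defines.
valid_labels = {
    str(label).lower()
    for cls in onto.subjects(RDF.type, OWL.Class)
    for label in onto.objects(cls, RDFS.label)
}

def check_classification(llm_label: str) -> bool:
    """Return True only when the LLM's label matches an ontology class."""
    return llm_label.lower() in valid_labels
```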
Knowledge Graphs
Knowledge graphs (KGs) represent entities and relations in a graph structure. Popular knowledge graphs include DBpedia, Wikidata, YAGO, and the Google Knowledge Graph. Knowledge graphs capture facts about the world (people, places, things) and the connections between them. Knowledge graphs complement both ontologies and language models. Ontologies provide a schema for classification, but KGs add real-world instantiation of entities and relations. Language models supply text comprehension and generation capabilities but lack grounding in factual knowledge. Connecting language models to knowledge graphs like Wikidata improves their reasoning and accuracy by leveraging external, curated knowledge.
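As a small illustration of that grounding, here is a sketch that pulls a fact from Wikidata's public SPARQL endpoint and prepends it to a prompt. The query, the prompt template, and the User-Agent string are illustrative; a production system would add caching, retries, and error handling.

```python
# Minimal sketch: ground a prompt with a fact retrieved from Wikidata.
# Q408 is Australia and P36 is the "capital" property in Wikidata.
import requests

SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"
query = """
SELECT ?capitalLabel WHERE {
  wd:Q408 wdt:P36 ?capital .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""

resp = requests.get(
    SPARQL_ENDPOINT,
    params={"query": query, "format": "json"},
    headers={"User-Agent": "kg-grounding-sketch/0.1"},  # Wikidata asks for one
)
capital = resp.json()["results"]["bindings"][0]["capitalLabel"]["value"]

prompt = (
    f"Known fact from Wikidata: the capital of Australia is {capital}.\n"
    "Question: What is the capital of Australia?\nAnswer:"
)
print(prompt)
```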
Projects like REALM from Google Research bridge language models and external knowledge by learning to retrieve relevant passages from a knowledge corpus and conditioning generation on them, rather than relying only on facts memorized in the model's parameters. This grounds the model's answers in curated knowledge and enhances its factuality, and REALM shows substantial accuracy gains on open-domain QA compared to purely parametric baselines.
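The underlying retrieve-then-read pattern is easy to sketch: embed the question, pull the most similar passages from an external knowledge store, and let the language model answer conditioned on them. In the sketch below, embed() and generate() are hypothetical placeholders for whatever encoder and LLM you actually use.

```python
# Minimal sketch of the retrieve-then-read pattern behind systems like REALM.
# embed() and generate() are placeholders, not real library calls.
import numpy as np

def retrieve(question_vec, passage_vecs, passages, k=2):
    """Return the k passages whose embeddings are closest to the question."""
    sims = passage_vecs @ question_vec / (
        np.linalg.norm(passage_vecs, axis=1) * np.linalg.norm(question_vec)
    )
    top = np.argsort(-sims)[:k]
    return [passages[i] for i in top]

def answer(question, passages, passage_vecs, embed, generate):
    context = "\n".join(retrieve(embed(question), passage_vecs, passages))
    return generate(f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")
```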
The Future of Connected, Full-Context AI
While it may still be in the early stages, the combination of ontology, language models, and knowledge graphs shows potential for creating more capable and grounded AI systems. Ontologies offer a structured approach, language models provide flexibility, and knowledge graphs offer real-world facts. When used together, they complement each other's strengths and weaknesses, resulting in a more effective overall system.
One of the major technical hurdles for AI systems is creating ontology standards that work well with various NLP models. Significant challenges include efficiently encoding large-scale knowledge graphs for neural networks and managing multimodal information that merges text, images, and data. As solutions to these issues are developed, interconnected AI systems will be able to produce more advanced and accurate text, respond to queries, and make logical deductions about the world.
The relationships between ontology, language models, and knowledge graphs underscore the interconnected nature of progress in AI. Bringing these technologies together and building on their synergies will enable the next generation of intelligent systems that understand, reason, and communicate at an unprecedented level.
The relationships between ontology, large language models (LLMs), and knowledge graphs (KGs) can be pictured as a simple line diagram connecting the three:
The key relationships are:
Ontology informs the schema and structure for knowledge graphs and large language models
Knowledge graphs provide real-world facts and relationships to ground the models
Large language models connect ontologies and knowledge graphs, enhancing reasoning and textual understanding
The ontology provides the formal representation to define concepts and relations. This gives structure and semantics that KGs and LLMs build upon.
KGs instantiate real-world entities and facts as nodes and edges in a graph. This grounds the models in factual knowledge.
LLMs utilize their robust language capabilities to make ontologies and KGs useful in applications. The models connect them together and enhance each other's capabilities.
The three technologies are interconnected and complementary, and together they make for more capable AI systems. The picture to keep in mind is of ontology, KGs, and LLMs leveraging each other's strengths for continued progress in AI knowledge representation and reasoning.
-----------Don't forget to subscribe to my free Linkedin Newsletter and my free Medium subscription.