Towards AGI: Improving LLMs' Reasoning
Homer Simpson's x-ray.


Acronyms

  • LLM: Large Language Model (e.g. ChatGPT, Bloom, BERT)
  • AGI: Artificial General Intelligence (AI that matches and exceeds human-level intelligence)

Abstract

Generative AI models such as ChatGPT, Llama, and BERT can reason to a degree and seem to respond at the level of human experts, but this wisdom can be shallow. Logic questions that are common sense for humans can baffle LLMs, especially on novel topics not found in the models' training data.

The next step towards achieving human-level intelligence is to add conceptual thinking to LLMs. One way to do this is to use associative logic with knowledge graphs (KGs). Concepts in KGs can be represented as patterns of activated nodes. "Thinking" would be transitions from one pattern of nodes to another.

LLMs are a huge step forward for AI, but they need to be combined with a strong logic engine and integrated into numerous everyday applications to fully realize their potential.

Where LLMs Succeed

LLMs are great at working with small bodies of text. The models are trained on gigabytes to terabytes of data, but the amount of situation-specific data they can take in at once (the context window) is still small: a few to a few dozen pages of text. ChatGPT's context window is currently <= 32K tokens [1], though context windows are growing quickly.
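
To make the 32K-token figure concrete, here is a minimal token-counting sketch using the tiktoken library; it assumes the cl100k_base encoding used by recent OpenAI chat models, and the sample sentence is just an illustration.

```python
# Minimal token-counting sketch; assumes `pip install tiktoken` and the
# cl100k_base encoding used by recent OpenAI chat models.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
text = "LLMs are great at working with small bodies of text."
tokens = enc.encode(text)

print(len(tokens))                    # tokens this one sentence consumes
print(32_000 // max(len(tokens), 1))  # rough number of such sentences that fit in a 32K window
```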

LLMs do a great job of answering questions, making suggestions, or writing (code, articles, stories, etc.) based on the prompt and context window.

Where LLMs Fail

LLMs struggle with reasoning that requires real-world common sense and basic abstract thinking. Below are some examples from ChatGPT 3.5, asked this year (2023).

[ChatGPT screenshot]
Correct answer: 2023 - 2008 = 15; 35 - 15 = 20 years old.

Even if we use ChatGPT 3.5's last year of training, 2022, the answer is still off [2].

[ChatGPT screenshot]
Correct answer: 19th + 7 = 26th is the expiration date. Today is the 20th, so five days ago was the 15th. 26th - 15th = 11 days.

The mistake ChatGPT made here is using yesterday, not today, to determine what is five days ago.

[ChatGPT screenshot]
Correct answer: John's friends are at his (John's) home. It should be "at home", not "a home".
[ChatGPT screenshot]
Correct answer: 3
[ChatGPT screenshot]
Correct answer: CABDE

Takeaways from the above examples:

  • LLMs often struggle with basic math
  • Common sense knowledge (e.g. basic physics) is often missing
  • Novel situations (new data / scenarios) may baffle the models

At their core, LLMs are still statistical engines (neural networks) that predict the next most likely token based on a prompt and context. This is a vastly simplified view.

[Diagram]
The core of LLMs: neural networks.
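
As a toy illustration of this next-token view, here is a minimal sketch; the vocabulary and logits are made up for the example, whereas a real LLM produces logits over tens of thousands of tokens from billions of weights.

```python
# Toy next-token prediction: softmax over hypothetical logits, then a greedy pick.
import numpy as np

vocab = ["wood", "cannot", "can", "stretch", "."]
logits = np.array([0.3, 2.5, 0.4, 1.2, 0.1])  # made-up scores for the next token

probs = np.exp(logits - logits.max())
probs /= probs.sum()                           # softmax: scores -> probabilities

print(vocab[int(np.argmax(probs))])            # greedy decoding -> "cannot"
```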

That said, LLMs do have some logic. They answer questions about scenarios seen in training data or in the context window, and they are able to generalize past knowledge to new prompts. Their approach is still somewhat brute-force: mix-and-match gigabytes to terabytes of past data to give the highest-probability response to the current prompt. The logic embedded in the millions or billions of LLM weights is still largely a fuzzy-logic repeater of text patterns in the training data.

Next Leg of AGI Journey

I see LLMs and other large models (e.g. Convolutional Neural Networks, CNNs, for image processing) as analogous to the brain's specialized regions for audio-visual data processing. There is a whole separate layer of grey matter for complex reasoning. We've built the auditory and visual cortices, but we still need to build the prefrontal cortex.

How to Add Reasoning to LLMs

One approach is combining other systems with LLMs. Some examples are below.

Program-Aided Language Models link a Python interpreter to the LLM so that mathematical operations are written as code, executed by the interpreter, and sent back as the response to the user [3]. This helps relieve LLMs' basic-math weakness.
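
A minimal sketch of the idea: the hypothetical ask_llm() helper and its hard-coded "generated" code stand in for a real LLM call and are not the PAL authors' implementation.

```python
# Hypothetical helper: in a real PAL setup, ask_llm() would call an LLM API and
# return Python code that solves the word problem instead of a final number.
def ask_llm(question: str) -> str:
    return "age_now = 35\nyears_elapsed = 2023 - 2008\nanswer = age_now - years_elapsed\n"

code = ask_llm("I am 35 in 2023. How old was I in 2008?")

scope = {}
exec(code, scope)        # the Python interpreter, not the LLM, does the arithmetic
print(scope["answer"])   # -> 20
```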

Chain-of-Thought Prompting is a technique of asking a model to explain its steps [4], almost like asking a student to list the steps for solving a problem. This often helps the model avoid mistakes and requires no model changes.
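
A minimal sketch of a chain-of-thought prompt; the worked example and wording below are illustrative, not taken from the paper.

```python
# Chain-of-thought prompt: a worked example plus a "think step by step" cue so
# the model writes out intermediate reasoning before the final answer.
prompt = (
    "Q: I had 5 apples, bought 2 bags of 3, and ate 1. How many are left?\n"
    "A: Let's think step by step. 2 bags of 3 is 6 apples. 5 + 6 = 11. "
    "11 - 1 = 10. The answer is 10.\n\n"
    "Q: I am 35 in 2023. How old was I in 2008?\n"
    "A: Let's think step by step."
)
# response = ask_llm(prompt)  # hypothetical LLM call, as in the PAL sketch above
```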

Retrieval-Augmented Generation (RAG) allows a model to query external sources for updated information [5]. This is very useful for giving up-to-date answers about data generated after the model was trained.
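
A minimal sketch of the retrieve-then-generate flow, assuming a tiny hand-made document store and a trivial keyword retriever; a real system would use a vector index and an actual LLM call.

```python
# Toy retrieval-augmented generation: fetch relevant text, prepend it to the prompt.
documents = [
    "GPT-4 was released by OpenAI in March 2023.",
    "Wood is a rigid material and does not stretch.",
]

def retrieve(query: str, k: int = 1) -> list[str]:
    """Trivial keyword-overlap retriever; a real system would use embeddings."""
    scored = [(len(set(query.lower().split()) & set(d.lower().split())), d) for d in documents]
    return [d for score, d in sorted(scored, reverse=True)[:k] if score > 0]

query = "When was GPT-4 released?"
context = "\n".join(retrieve(query))
prompt = f"Use the context to answer.\nContext:\n{context}\n\nQuestion: {query}"
# response = ask_llm(prompt)  # hypothetical LLM call; the retrieved facts ground the answer
print(prompt)
```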

The above help address LLMs' reasoning and logic weaknesses, but they are more like patches than fixes for the root cause. LLMs' understanding of basic physics, for example, is still limited to text seen in training data. Let's say I ask whether a stick of wood can be stretched. The model will likely traverse the neural network weights trained on text about the inflexibility of wood and generate a negative response. For novel applications of wood with no training data, the model may well fail. To be really understood, this information needs to live in a model of the world that describes the properties of wood. Then several steps of logic and analogy are needed to see whether an application is possible with the material. One such world model is a knowledge graph.

Knowledge Graphs

Let's say we use the vast volume of data that was used to train an LLM to also build a knowledge graph. The knowledge graph (KG) would have nodes for nouns and adjectives, and edges for relationships and verbs. Now we ask the model again whether wood is stretchable.

[Diagram]
Simplified example of a knowledge graph relating to wood and flexibility.

Now the LLM can activate the KG nodes for wood and stretchability and measure the strength of the connection between the two groups of nodes. In this case the relation will be negative or weak, so the LLM can conclude that wood is not very stretchable. The same approach can be applied to a novel application of wood that is not directly mentioned in the training data but can be deduced through paths in the knowledge graph.
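
A minimal sketch of this lookup, using a hand-made graph with signed edge weights (+1 supports a property, -1 contradicts it); the nodes, weights, and two-hop scoring rule are illustrative assumptions, not a real KG engine.

```python
# Toy knowledge-graph query: multiply signed edge weights along two-hop paths
# to estimate how strongly a subject relates to a property.
edges = {
    ("wood", "rigid"): +1.0,
    ("rigid", "stretchable"): -1.0,
    ("rubber", "elastic"): +1.0,
    ("elastic", "stretchable"): +1.0,
}

def relation(subject: str, prop: str) -> float:
    """Sum of weight products over simple two-hop paths subject -> mid -> prop."""
    score = 0.0
    for (u, mid), w1 in edges.items():
        if u != subject:
            continue
        w2 = edges.get((mid, prop))
        if w2 is not None:
            score += w1 * w2
    return score

print(relation("wood", "stretchable"))    # -> -1.0 (negative: wood is not stretchable)
print(relation("rubber", "stretchable"))  # -> +1.0 (positive: rubber can stretch)
```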

Let's try the KG approach on one of the failed completions (wrong answers) above. For the smallest-number-of-ducks problem, the LLM can start to activate nodes in a series.

[Diagram]
Simulating the duck problem with nodes in a KG.

The LLM can activate the first node as duck 1, activate the "front" adjective node, and place two ducks behind it by activating nodes 2 and 3. The "back" adjective would then associate with nodes 2 and 3. Querying the graph according to the rest of the prompt would activate nodes 2, 3, 4, ... as "middle". For "two in front of", nodes 3, 4, 5, ... would be most active. With the graph in this state, the highest activation should be on nodes 1, 2, and 3, which is the answer. Nodes 4, 5, and beyond would also have some activation, but it should be smaller than that of the more talked-about, relevant nodes 1-3.
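
A toy sketch of this activation-spreading idea; the clause-to-node mapping and the threshold rule below are hand-made illustrations of the concept, not the output of a real KG engine.

```python
# Toy activation spreading for the duck riddle over five candidate duck nodes.
activation = {n: 0.0 for n in range(1, 6)}

def activate(nodes, weight=1.0):
    for n in nodes:
        activation[n] += weight

activate([1, 2, 3])  # "two ducks behind a duck": duck 1 with ducks 2 and 3 behind it
activate([2])        # "a duck in the middle": an inner node satisfies this best
activate([1, 2, 3])  # "two ducks in front of a duck": ducks 1 and 2 in front of duck 3

# Nodes whose activation stands out from the rest form the answer set.
threshold = max(activation.values()) / 2
answer = [n for n, a in activation.items() if a > threshold]
print(len(answer))   # -> 3, the smallest number of ducks satisfying all clauses
```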

Associative logic using graphs is similar to how humans think. There are other logic modules that don't fit well into the graph/associative approach, e.g. complex math or visual analysis. Graph logic's place may be at the top of the abstract-thinking hierarchy, guiding the overall flow of consciousness, while using LLMs, math modules, and visual models as interfaces to the real world.

[Diagram]
Potential AGI structure using LLMs and KGs.

Next Steps

The next big breakthrough for LLMs is already underway in their integration with other services: external data sources, applications, and other models. New frameworks, e.g. LangChain, are being developed to ease the deployment of LLMs and their integration with other apps [6].
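
A minimal sketch of such an integration using LangChain's classic LLMChain interface; the exact imports and class names vary between LangChain versions, and an OpenAI API key is assumed to be configured.

```python
# Minimal LangChain sketch: a prompt template chained to an LLM.
# Assumes `pip install langchain openai` and OPENAI_API_KEY in the environment;
# class names may differ in newer LangChain releases.
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

prompt = PromptTemplate(
    input_variables=["material", "property"],
    template="Can {material} be described as {property}? Answer in one sentence.",
)
chain = LLMChain(llm=OpenAI(temperature=0), prompt=prompt)

print(chain.run(material="wood", property="stretchable"))
```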

While academia and big tech do the lion's share of development, the open source community is also vital. Let's work together to build open source tools that enable machine reasoning and abstract thinking. We can write Python packages for graph logic, e.g. Graph Logik, to integrate with LLMs. Digital Homer needs his full-sized brain :-)

You can reach me at https://www.dhirubhai.net/in/nazartrilisky/.

Sources

[1] https://community.openai.com/t/chatgpt-4-context-lengths/114919

[2] https://openai.com/blog/chatgpt

[3] https://arxiv.org/abs/2211.10435

[4] https://arxiv.org/abs/2201.11903

[5] https://arxiv.org/abs/2005.11401

[6] https://python.langchain.com/docs/get_started/introduction


Nazar Trilisky, Software Engineer (author's comment, 1 year ago):

Multi-modal models (various inputs: audio, visual, text, ...) are already here. Also, there are models trained specifically for logic, e.g. https://www.ai21.com/, which should work as well as my knowledge graph idea. Once a solid logic NN model is built and chained with multi-modal ones for input, as well as some good actor options (e.g. Google's PaLM-E, which can control a robot), we're getting close to the singularity.
Nazar Trilisky, Software Engineer (author's comment, 1 year ago):

https://www.langchain.com/ -> a startup doing something similar. Good stuff!
