A new architecture that incorporates more human-like memory features
Arun Krishnan
Entrepreneur, Technology Leader, Business Leader, Author | Experienced Data Science and AI professional and leader, driving Data Science, AI, and GenAI technology and business growth
The one huge drawback of the attention models that are ubiquitous in LLMs is that memory requirements can quickly go through the roof: the cost is quadratic in sequence length, which limits context window sizes.
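As a back-of-the-envelope sketch (the head count and fp16 precision below are assumptions, and practical kernels avoid materializing the full matrix, though compute still scales quadratically), the n × n attention score matrix alone shows why memory blows up with context length:

```python
# Back-of-the-envelope illustration (assumed head count and fp16 precision,
# not figures from the paper): the n x n attention score matrix alone grows
# quadratically with context length n.
def score_matrix_bytes(n_tokens, n_heads=16, bytes_per_float=2):
    return n_tokens * n_tokens * n_heads * bytes_per_float

for n in (4_096, 32_768, 131_072):
    print(f"context {n:>7,} tokens -> ~{score_matrix_bytes(n) / 1e9:,.1f} GB of scores")
```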
Modern architectures, even the much-touted transformers, work mostly as short-term memories, given the limits on context length.
That is, the pair of key and value matrices acts as the model’s memory, and the model: (1) updates the memory by appending the key and value to the memory (without compression), and (2) retrieves query vectors’ corresponding memory by finding the similarity of query and key vectors, which is then used to weight the value vectors for the output.
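A minimal NumPy sketch of that update/retrieve view (single head, single query, no scaling factor; illustrative only, not the actual implementation):

```python
import numpy as np

# Sketch of "attention as memory": keys/values are appended without
# compression, and a query retrieves a similarity-weighted sum of values.
def append_to_memory(keys, values, k_new, v_new):
    # (1) update: append the new key/value pair to the memory
    return np.vstack([keys, k_new]), np.vstack([values, v_new])

def retrieve(query, keys, values):
    # (2) retrieve: similarity of query and keys weights the stored values
    scores = keys @ query                      # dot-product similarity
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                   # softmax over stored keys
    return weights @ values                    # weighted sum of values

d = 8
keys, values = np.empty((0, d)), np.empty((0, d))
for _ in range(5):                             # "write" five tokens
    keys, values = append_to_memory(keys, values,
                                    np.random.randn(1, d), np.random.randn(1, d))
out = retrieve(np.random.randn(d), keys, values)   # "read" with a query
print(out.shape)                               # (8,)
```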
Most new architectures try to mimic the human brain. One of the main advantages of the human brain is that its memory is
"a confederation of systems–e.g., short-term, working, and long-term memory–each serving a different function with different neural structures, and each capable of operating independently."
Given this, the paper sets out to answer how memory should be structured, how it should be updated, and how it should be retrieved.
The authors introduce the concept of Surprise, which quantifies how different new data is from past data. The larger the difference, the greater the gradient and, in a sense, the more "memorable" and vivid the input, leading to a deeper association in the memory.
Past surprises decay over time, controlled by a decay parameter η_t.
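A toy sketch of such a surprise-driven update (the real model uses a learned neural memory with data-dependent gates; the decay η_t, step size θ_t, and forgetting factor α below are fixed constants I've assumed purely for illustration):

```python
import numpy as np

# Toy sketch of a surprise-driven memory update: past surprise decays via
# eta, the gradient on the new input ("momentary surprise") is folded in,
# and the memory drifts toward the accumulated surprise.
def surprise_update(S_prev, grad, eta=0.9, theta=0.1):
    # bigger gradient (more surprising input) -> bigger update
    return eta * S_prev - theta * grad

def memory_update(M_prev, S_t, alpha=0.01):
    # small forgetting factor alpha on the old contents
    return (1.0 - alpha) * M_prev + S_t

d = 16
M, S = np.zeros(d), np.zeros(d)
for step in range(100):
    # a strongly "surprising" input at step 50, small gradients elsewhere
    grad = np.random.randn(d) * (5.0 if step == 50 else 0.1)
    S = surprise_update(S, grad)
    M = memory_update(M, S)
print(np.linalg.norm(M))
```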
The memory architecture is thus split into a persistent memory that holds task knowledge, a core (short-term) memory, and a more transactional, contextual long-term memory, thereby mimicking the human memory system better than current models do.
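One plausible way to picture how the three branches could be composed, purely as an illustrative sketch rather than the paper's exact wiring: prepend learned persistent tokens and a long-term-memory readout to the current segment, and let ordinary attention run over the combined sequence.

```python
import numpy as np

# Illustrative assumption (not the paper's exact wiring): compose the three
# memory branches by concatenation, then run short-term attention over it.
def compose_context(persistent_tokens, long_term_readout, context_tokens):
    # persistent_tokens: (p, d) learned, input-independent task knowledge
    # long_term_readout: (m, d) retrieved from the contextual long-term memory
    # context_tokens:    (n, d) the current segment (core / short-term memory)
    return np.concatenate([persistent_tokens, long_term_readout, context_tokens], axis=0)

p, m, n, d = 4, 8, 32, 64
combined = compose_context(np.random.randn(p, d),
                           np.random.randn(m, d),
                           np.random.randn(n, d))
print(combined.shape)   # (44, 64) -> attention then runs over this sequence
```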
The model's performance across a range of tasks is very encouraging.
Things are moving very fast in this space, and those of us working in this area have to keep up with the changes quickly.
AGI might not be far off now!
LM’s & Language Engineering
1mo — I think it's because of two-stage RLHF (the second being reasoning!). I'm guessing it's not just thumbs-up/thumbs-down feedback but something like a preference dataset. That said, it's kick-ass (although the last couple of days, due to the madness, it's been hard to hit their servers!) and tokens are so cheap. We will need to wait for more datasets that evaluate their models' performance a bit more critically, but as of now, enjoy thangamani. PS: Methinks it's a question of time before some diktat like "no model distillation should be allowed" kind of discussion begins!
Microsoft Certified Fabric Analytics Engineer | Microsoft Certified Power BI Data Analyst | Data & Business Analytics | Software Engineer Trainee | iLink Digital | Microsoft Fabric | Power BI | SQL | Python
1mo — Very informative and insightful article about an architecture that incorporates human-like memory features.