A new architecture that incorporates more human-like memory features

The one huge drawback of the attention mechanism that is ubiquitous in LLMs is that its memory requirements can quickly go through the roof: the cost grows quadratically with sequence length, which limits context window sizes.

Modern architectures, even the much-touted transformers, therefore work mostly as short-term memories, given the limits on context length.

That is, the pair of key and value matrices acts as the model's memory, and the model (1) updates the memory by appending each new key and value to it (without compression), and (2) retrieves the memory corresponding to a query by computing the similarity between the query and the stored keys, which is then used to weight the value vectors for the output.
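
Concretely, that key/value view of attention-as-memory can be written as a minimal sketch (plain NumPy, single query, purely illustrative rather than any particular model's implementation):

```python
import numpy as np

def update_memory(K_mem, V_mem, k_t, v_t):
    """Update step: append the new key/value pair to the memory, with no compression."""
    return np.vstack([K_mem, k_t]), np.vstack([V_mem, v_t])

def retrieve(q_t, K_mem, V_mem):
    """Retrieval step: similarity of the query with stored keys weights the stored values."""
    scores = K_mem @ q_t / np.sqrt(K_mem.shape[1])   # (t,) similarity scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                         # softmax attention weights
    return weights @ V_mem                           # weighted sum of values -> output
```

Every new token grows K_mem and V_mem by one row, which is exactly why the memory footprint (and the quadratic compute) blows up as the context gets longer.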

Most new architectures try to mimic the human brain. One of the main advantages of the human brain is that its memory is

"a confederation of systems–e.g., short-term, working, and long-term memory–each serving a different function with different neural structures, and each capable of operating independently."

Given this, the paper tries to answer the following questions:

  1. What constitutes a good structure for memory?
  2. What is a proper memory update mechanism?
  3. What is a good memory retrieval process?
  4. How can we design an efficient architecture that incorporates different, interconnected memory modules?
  5. Is a deep memory module needed to effectively store and remember the long past?

The authors introduce the concept of Surprise, which quantifies how different new data is from past data. The larger the difference, the greater the gradient and, in a sense, the more "memorable" and vivid the surprise, leading to a stronger association in the memory.

Past surprises are allowed to decay over time, controlled by a decay parameter η_t.
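
As a rough illustration (not the paper's exact equations), the mechanism can be sketched with the memory reduced to a single matrix M that learns to map keys to values; the gradient of its reconstruction error on the newest token is the surprise, and a decay factor eta_t fades previously accumulated surprise. The scalar eta_t and theta_t here are hypothetical placeholders:

```python
import numpy as np

def surprise_step(M, S_prev, k_t, v_t, eta_t=0.9, theta_t=0.1):
    """One surprise-driven memory update (toy sketch).

    M       : memory as a linear map from keys to values
    S_prev  : surprise accumulated from past steps
    eta_t   : decay applied to past surprise (hypothetical scalar here)
    theta_t : step size for the new surprise (hypothetical scalar here)
    """
    err = M @ k_t - v_t                  # how badly the memory reconstructs the new token
    grad = np.outer(err, k_t)            # gradient of 0.5 * ||M @ k_t - v_t||^2 w.r.t. M
    S = eta_t * S_prev - theta_t * grad  # decayed past surprise plus the new one
    return M + S, S                      # write the surprise into the memory
```

A surprising token produces a large gradient and therefore a large write into M; an unsurprising one barely changes it, and old surprises fade as eta_t is applied step after step.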


The memory architecture is thus split into a persistent long-term memory, a core memory, and a more transactional, contextual memory, thereby mimicking the human memory system more closely than current models do.
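
One plausible way to wire the three systems together for a chunk of input (a hypothetical sketch, not the paper's exact architecture; `persistent_tokens`, `long_term_memory`, and `attention_core` are illustrative names):

```python
import numpy as np

def memory_forward(x_chunk, persistent_tokens, long_term_memory, attention_core):
    """Combine persistent, long-term, and core (attention) memory for one chunk.

    persistent_tokens : learned, input-independent vectors holding task knowledge
    long_term_memory  : callable returning a compressed summary of the distant past
    attention_core    : callable applying ordinary short-term attention over a sequence
    """
    recalled = long_term_memory(x_chunk)                      # retrieve the long past
    seq = np.vstack([persistent_tokens, recalled, x_chunk])   # prepend both to the chunk
    return attention_core(seq)                                # core attends over everything
```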

The model's performance across a range of tasks is very encouraging.


Things are moving very fast in this space, and those of us working in this area have to keep updating ourselves on the changes as they happen.

AGI might not be far off now!

Venkataraghavan Srinivasan

LM’s & Language Engineering

1 month

I think it's because of the two-stage RLHF (the second being for reasoning!). I'm guessing it's not just thumbs-up/thumbs-down feedback but something like a preference dataset. That said, it's kick-ass (although for the last couple of days, with all the madness, it's been hard to hit their servers!) and tokens are so cheap. We will need to wait for more datasets that evaluate the model's performance a bit more critically, but for now, enjoy! PS: Methinks it's only a question of time before some diktat like "no model distillation should be allowed" starts getting discussed!

Dhanushika Sakthi V B

Microsoft Certified Fabric Analytics Engineer | Microsoft Certified Power BI Data Analyst | Data & Business Analytics | Software Engineer Trainee | iLink Digital | Microsoft Fabric | Power BI | SQL | Python

1 month

A very informative and insightful article about an architecture that incorporates human-like memory features.

