A new architecture that incorporates more human-like memory features
Arun Krishnan
Entrepreneur, Technology Leader, Business Leader, Author | Experienced Data Science and AI professional and leader, driving Data Science, AI, and GenAI technology and business growth
The one huge drawback of the attention models that are ubiquitous in LLMs is that memory requirements can quickly go through the roof: the cost is quadratic in sequence length, which limits context window sizes.
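As a back-of-the-envelope sketch (the head count and fp16 precision below are assumptions, and practical kernels avoid materializing the full matrix, though compute still scales quadratically), the n × n attention score matrix alone shows why memory blows up with context length:

```python
# Back-of-the-envelope illustration (assumed head count and fp16 precision,
# not figures from the paper): the n x n attention score matrix alone grows
# quadratically with context length n.
def score_matrix_bytes(n_tokens, n_heads=16, bytes_per_float=2):
    return n_tokens * n_tokens * n_heads * bytes_per_float

for n in (4_096, 32_768, 131_072):
    print(f"context {n:>7,} tokens -> ~{score_matrix_bytes(n) / 1e9:,.1f} GB of scores")
```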
Modern architectures, even the much-touted transformers, work mostly as short-term memories, given the limits on context length.
That is, the pair of key and value matrices acts as the model’s memory, and the model: (1) updates the memory by appending the key and value to the memory (without compression), and (2) retrieves query vectors’ corresponding memory by finding the similarity of query and key vectors, which is then used to weight the value vectors for the output.
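A minimal NumPy sketch of that update/retrieve view (single head, single query, no scaling factor; illustrative only, not the actual implementation):

```python
import numpy as np

# Sketch of "attention as memory": keys/values are appended without
# compression, and a query retrieves a similarity-weighted sum of values.
def append_to_memory(keys, values, k_new, v_new):
    # (1) update: append the new key/value pair to the memory
    return np.vstack([keys, k_new]), np.vstack([values, v_new])

def retrieve(query, keys, values):
    # (2) retrieve: similarity of query and keys weights the stored values
    scores = keys @ query                      # dot-product similarity
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                   # softmax over stored keys
    return weights @ values                    # weighted sum of values

d = 8
keys, values = np.empty((0, d)), np.empty((0, d))
for _ in range(5):                             # "write" five tokens
    keys, values = append_to_memory(keys, values,
                                    np.random.randn(1, d), np.random.randn(1, d))
out = retrieve(np.random.randn(d), keys, values)   # "read" with a query
print(out.shape)                               # (8,)
```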
Most new architectures try to mimic the human brain. One of the main advantages of the human brain is that its memory is
"a confederation of systems–e.g., short-term, working, and long-term memory–each serving a different function with different neural structures, and each capable of operating independently."
Given this, the paper sets out to answer how memory should be structured, how it should be updated, and how it should be retrieved.
The authors introduce the concept of Surprise, which quantifies how different new data is from past data. The larger the difference, the greater the gradient and, in a sense, the more "memorable" and vivid the input, leading to a deeper association in the memory.
Past surprises decay over time, controlled by a decay parameter η_t.
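A toy sketch of such a surprise-driven update (the real model uses a learned neural memory with data-dependent gates; the decay η_t, step size θ_t, and forgetting factor α below are fixed constants I've assumed purely for illustration):

```python
import numpy as np

# Toy sketch of a surprise-driven memory update: past surprise decays via
# eta, the gradient on the new input ("momentary surprise") is folded in,
# and the memory drifts toward the accumulated surprise.
def surprise_update(S_prev, grad, eta=0.9, theta=0.1):
    # bigger gradient (more surprising input) -> bigger update
    return eta * S_prev - theta * grad

def memory_update(M_prev, S_t, alpha=0.01):
    # small forgetting factor alpha on the old contents
    return (1.0 - alpha) * M_prev + S_t

d = 16
M, S = np.zeros(d), np.zeros(d)
for step in range(100):
    # a strongly "surprising" input at step 50, small gradients elsewhere
    grad = np.random.randn(d) * (5.0 if step == 50 else 0.1)
    S = surprise_update(S, grad)
    M = memory_update(M, S)
print(np.linalg.norm(M))
```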
The memory architecture is thus split into a persistent memory that holds task knowledge, a core (short-term) memory, and a more transactional, contextual long-term memory, thereby mimicking the human memory system better than current models do.
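One plausible way to picture how the three branches could be composed, purely as an illustrative sketch rather than the paper's exact wiring: prepend learned persistent tokens and a long-term-memory readout to the current segment, and let ordinary attention run over the combined sequence.

```python
import numpy as np

# Illustrative assumption (not the paper's exact wiring): compose the three
# memory branches by concatenation, then run short-term attention over it.
def compose_context(persistent_tokens, long_term_readout, context_tokens):
    # persistent_tokens: (p, d) learned, input-independent task knowledge
    # long_term_readout: (m, d) retrieved from the contextual long-term memory
    # context_tokens:    (n, d) the current segment (core / short-term memory)
    return np.concatenate([persistent_tokens, long_term_readout, context_tokens], axis=0)

p, m, n, d = 4, 8, 32, 64
combined = compose_context(np.random.randn(p, d),
                           np.random.randn(m, d),
                           np.random.randn(n, d))
print(combined.shape)   # (44, 64) -> attention then runs over this sequence
```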
The model's performance across a range of tasks is very encouraging.
Things are moving very fast in this space, and those of us working in this area have to keep up with the changes quickly.
AGI might not be far off now!
LM’s & Language Engineering
1mo — I think it's because of two-stage RLHF (the second being reasoning!). I'm guessing it's not just thumbs-up/thumbs-down feedback but something like a preference dataset. That said, it's kick-ass (although the last couple of days, due to the madness, it's been hard to hit their servers!) and tokens are so cheap. We will need to wait for more datasets that evaluate their models' performance a bit more critically, but as of now, enjoy thangamani. PS: Methinks it's a question of time before some diktat like "no model distillation should be allowed" kind of discussion begins!
Microsoft Certified Fabric Analytics Engineer | Microsoft Certified Power BI Data Analyst | Data & Business Analytics | Software Engineer Trainee | iLink Digital | Microsoft Fabric | Power BI | SQL | Python
1mo — Very informative and insightful article about an architecture that incorporates human-like memory features.