what is memGPT?
michael raspuzzi
building worldwide studios | no code ai agents sprint (march 24-april 4)
making large language models better with a layer of virtual memory management
computer operating systems have memory management that enables new functions beyond what physical memory alone can do. a virtual layer enables multitasking, caching, and protection against malicious applications. without the operating system managing memory, computer processing would be severely limited.
right now large language models (LLMs) have that limitation. they have limited context windows, where there's only so much input for so much output, as well as short term memory loss. talking to chatGPT is like talking to dory from finding nemo: every conversation, you have to remind it what its role is and what its goal is. to keep swimming, it needs to know where you are swimming towards.
LLMs can be programmed to maintain certain roles and functions across a series of prompts and tasks. this is what enables a new virtual layer for memory management, the first big step toward making LLMs work like an operating system that unlocks new functions.
memGPT gives LLMs memory management
a team of researchers at berkeley created memGPT, which acts as a memory manager for large language models. this enables long term memory retrieval and writing, and it bypasses the context window input limit.
memGPT augments LLMs with a hierarchical memory system and functions that let the model manage its own memory. the LLM processes main context (like RAM in an OS) as input, and its output text is parsed as either a yield or a function call.
these functions let memGPT move data between main context and external context (like disk storage in an OS). when the LLM processor generates a function call, it can chain together a series of functions, like searching a database and then sending a message.
this process enables long term memory storage as well as an ability to figure things out, like rewriting its own memory when it gets corrected by a user message or by data in a document.
how memGPT works (going left to right from the above diagram)
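the loop described above can be sketched in a few lines. this is a minimal illustration, not the paper's actual implementation: the function names (archival_insert, archival_search) and the FUNC: output format are made-up stand-ins for the idea of parsing LLM output as either a yield or a function call.

```python
# a toy sketch of the memGPT processing loop. function names and the
# "FUNC:" output convention are illustrative assumptions, not the real API.

main_context = []      # what the LLM processor reads each step (like RAM)
archival_storage = []  # external context (like disk in an OS)

def archival_insert(text):
    """move data out of main context into external storage."""
    archival_storage.append(text)

def archival_search(query):
    """pull matching records back toward main context."""
    return [t for t in archival_storage if query in t]

def process(llm_output):
    """parse LLM output as either a yield or a function call."""
    if llm_output.startswith("FUNC:"):
        name, _, arg = llm_output[5:].partition(" ")
        if name == "archival_insert":
            archival_insert(arg)
        elif name == "archival_search":
            main_context.extend(archival_search(arg))
        return None          # control returns to the LLM, so calls can chain
    return llm_output        # a yield: the message goes to the user

process("FUNC:archival_insert brad prefers short replies")
process("FUNC:archival_search brad")
reply = process("got it, brad!")
```

the key design idea is that a function call returns control to the model instead of the user, which is what lets memGPT chain a database search into a sent message.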
testing memGPT in analyzing documents and long form chat
the team tested memGPT on two main use cases: 1) document analysis and 2) long form chat conversations.
for analyzing documents, previous LLMs have token limits on how much can be processed in one call, which limits the kinds of documents that can be handled. for example, open ai's gpt-4 has an 8192 token limit. stephen king's best selling novel, the shining, has around 150,000 words, which is roughly 200,000 tokens. it would take 25 context windows (or prompts) to feed gpt-4 stephen king's novel.
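the arithmetic behind those numbers, assuming the common rough ratio of about 0.75 words per token:

```python
import math

words = 150_000                  # approximate length of the shining
tokens = int(words / 0.75)       # ~200,000 tokens at ~0.75 words per token
context_window = 8192            # gpt-4's original token limit
windows = math.ceil(tokens / context_window)
print(windows)  # 25 prompts to feed the whole novel through
```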
using the main context to feed a single document can limit performance when scaled up.
memGPT is consistent at analyzing documents regardless of size, while gpt-4's accuracy decreases
in the first use case, they showed that memGPT performed consistently well in accuracy regardless of context length, or how much text was used in the query.
and what is more interesting is memGPT's ability to do the nested key-value retrieval task. see below.
in this example, memGPT continuously searches archival memory until it finds the latest key. if archival memory reveals that the current value is itself another key, it starts the search again to find that key's pair. once it finds the final value, it returns the message to the user. alongside search, it can also self edit its memory.
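the nested key-value task boils down to following a chain of lookups until a value is no longer a key. here is a sketch with a made-up dataset; the real task queries archival memory rather than a local dict:

```python
# toy archival memory for the nested key-value task (values may be keys)
archival_memory = {
    "k1": "k2",    # value is itself a key -> search again
    "k2": "k3",
    "k3": "blue",  # not a key -> this is the final value
}

def nested_lookup(start_key):
    value = archival_memory[start_key]
    while value in archival_memory:   # current value is still a key
        value = archival_memory[value]
    return value                      # final value, returned to the user

print(nested_lookup("k1"))  # blue
```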
memGPT enables self correcting memory and long term retrieval
notice in the above conversation, the chat bot made the mistake of saying 'hi chad,' and the user corrected it to say their name is 'brad.' highlighted in red, memGPT is able to edit its memory of brad's first name to both reply back instantaneously and remember it long term.
this makes the conversation more natural as well as more useful.
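that self-edit can be sketched as a memory-rewrite function the model calls when corrected. the name core_memory_replace mirrors the idea in the paper, but the exact signature here is an assumption for illustration:

```python
# core memory is the persistent profile the model keeps about the user
core_memory = {"human": "first name: chad"}

def core_memory_replace(section, old, new):
    """rewrite part of core memory in place, so the fix persists long term."""
    core_memory[section] = core_memory[section].replace(old, new)

# user: "my name is brad, not chad" -> the model calls:
core_memory_replace("human", "chad", "brad")
print(core_memory["human"])  # first name: brad
```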
ideas to apply memGPT
memGPT enables LLMs to have the ability to read long documents, search different archived datasets, and remember a user over a long history of chat.
understanding what new functionality is unlocked with LLMs: these three capabilities enable a new generation of smart assistants and co-pilots that create a more robust and richer user experience.
alongside the obvious example of a better customer support chat bot that references a company's wiki and ticket log, there are plenty of other ways to use memGPT.
the use case i'm most interested in: how can a large generative model be built for specific use cases in educating and supporting better learning environments?
rather than being limited to one textbook or one tutor's knowledge graph of a subject like physics, what if you could have access to a knowledge graph trained on the top 100 textbooks throughout time, one that personalizes conversation because it knows the student's journey and can customize curriculum for them?
while LLMs by themselves are good general tutors, memGPT could super power that with longer term memory for better reference and a better chat experience.
while one point against this is that it takes away the human role of either writing text books or teaching, i actually think it has the potential to have the opposite effect:
ultimately, this becomes a new kind of textbook resource for school. it’s a compression of knowledge that students can interact with. the main difference: they can really interact with it. ideally this unlocks more learning, less schooling.
it's crazy that in 2023, while schooling is synonymous with 'learning,' learning based outcomes are not guaranteed with more schooling.
smart tutors with memGPT will help bridge this gap in accessibility and quality.
memGPT, while impressive in the moment, may be easily forgotten
this seems similar to the time right after the iphone was released. there was a community that would jailbreak the hardware to enable new functionality based on what they wanted to use it for v. what apple thought users might want.
for example, the iphone 4, released in june 2010, was the first to have a camera flash. by sept 2010 there was already a jailbroken app that used that flash as a standalone flashlight. it took apple until iOS 7 in 2013, three years later, to ship built in flashlight controls…
memGPT seems to be that jailbroken app, enabling longer term memory, which is a short term problem that may eventually be updated.
with open ai releasing chatGPT less than a year ago, we have not really seen many product features or releases yet. it's been more so integrated into other products like github copilot or notion's ai assistant.
overall, the first year seems to be one large human-in-the-loop reinforcement learning environment, both to see how humans will interface with ai and to fine tune the output.
it will be interesting to see how long before open ai ships core memory management features as it explores new physical interfaces (an iphone without the phone?), and what it shares at its first developer conference in november.
hey, i'm michael and i'm exploring the intersection of ai and alt education. the next generation of innovators deserves next generation tooling. connect with me on linkedin to follow my build journey.