ByteDance’s “Self-Controlled Memory system” can draw on a data bank of hundreds of turns of dialogue and thousands of characters, which lets a language model outperform ChatGPT at answering questions about things said much earlier in a conversation.
When you type something into the prompt of a program like ChatGPT, it responds based not just on what you just typed but also on everything you’ve typed earlier in the session, so the chat history works sort of like a memory.
That isn’t enough, though, and researchers at multiple institutions are trying to give generative AI a more organized memory to enhance what it produces.
A paper published this month by Weizhi Wang of the University of California at Santa Barbara and collaborators from Microsoft, titled “Augmenting Language Models with Long-Term Memory” and posted on the arXiv pre-print server, adds a new memory component to language models.
The problem is that ChatGPT and similar programs can’t take in enough text at once to keep track of a very long context.
OpenAI’s original GPT-3 accepts a maximum input of roughly 2,000 tokens (2,048, to be exact), where a token is a word or a piece of a word. You can’t feed it a 5,000-word article, let alone a 70,000-word novel.
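To see why those documents don’t fit, here is a rough token estimate. It is only a sketch that assumes about 0.75 English words per token, a common rule of thumb rather than an exact figure; real counts depend on the tokenizer and the text itself.

```python
import math

CONTEXT_LIMIT = 2_048    # original GPT-3 context window, in tokens
WORDS_PER_TOKEN = 0.75   # assumed rule of thumb for English prose, not exact


def estimated_tokens(word_count: int) -> int:
    """Rough token count for English text with the given number of words."""
    return math.ceil(word_count / WORDS_PER_TOKEN)


def fits_in_context(word_count: int) -> bool:
    """True if the estimated token count fits inside the context window."""
    return estimated_tokens(word_count) <= CONTEXT_LIMIT


for words in (1_000, 5_000, 70_000):
    print(words, "words ->", estimated_tokens(words), "tokens, fits:", fits_in_context(words))
```

Under this assumption a 5,000-word article already needs more than three times the window.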
Expanding the input window creates a computing problem since the attention operation (used by ChatGPT and GPT-4) has “quadratic” computational complexity.
That means the time and compute needed to produce an answer grow with the square of the input length: doubling the input roughly quadruples the work the attention operation has to do.
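A back-of-the-envelope count makes “quadratic” concrete. This simplification counts only the query-key scores for one attention head in one layer and ignores everything else a model computes; causal masking would roughly halve the count but keep it quadratic.

```python
def attention_score_count(num_tokens: int) -> int:
    """Query-key scores one attention head computes over a context.

    Every token is compared with every token (including itself), so the
    full score matrix has num_tokens * num_tokens entries.
    """
    return num_tokens * num_tokens


for n in (2_000, 4_000, 70_000):
    print(f"{n:>6} tokens -> {attention_score_count(n):,} scores per head per layer")
```

Going from 2,000 to 4,000 tokens quadruples the count, and a novel-length input pushes it into the billions.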
So scholars have tried to build a crude memory. Last year, Yuhuai Wu and colleagues at Google introduced what they call the Memorizing Transformer, which stores internal representations of text it has already processed in an external memory and looks them up later, letting it operate on up to 65,000 tokens at once.
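The core idea of a retrievable memory fits in a few lines. This is a simplified illustration, not Wu and colleagues’ code: it caches key/value vectors, finds the stored keys closest to a query by dot product, and blends their values with softmax weights, roughly the way an attention layer would. The class name, vector sizes, and values are all made up for the example.

```python
import math


def softmax(xs: list[float]) -> list[float]:
    m = max(xs)                          # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


class KNNMemory:
    """Toy external memory of (key, value) vectors with top-k lookup."""

    def __init__(self) -> None:
        self.keys: list[list[float]] = []
        self.values: list[list[float]] = []

    def add(self, key: list[float], value: list[float]) -> None:
        # Cache a key/value pair computed from earlier text.
        self.keys.append(key)
        self.values.append(value)

    def read(self, query: list[float], k: int = 2) -> list[float]:
        # Score every stored key against the query (exact search here; real
        # systems use approximate nearest-neighbour search over far more entries).
        scores = [sum(q * x for q, x in zip(query, key)) for key in self.keys]
        top = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:k]
        # Attend only over the k best matches and blend their values.
        weights = softmax([scores[i] for i in top])
        dim = len(self.values[0])
        return [sum(w * self.values[i][d] for w, i in zip(weights, top)) for d in range(dim)]


memory = KNNMemory()
memory.add([1.0, 0.0], [10.0, 0.0])
memory.add([0.0, 1.0], [0.0, 10.0])
memory.add([0.9, 0.1], [8.0, 2.0])
print(memory.read([1.0, 0.0], k=1))  # [10.0, 0.0]
```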
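But Wang and team warn that this memory can go ‘stale’: as the Memorizing Transformer is trained, its parameters keep changing, so representations stored earlier fall out of sync with the current network. Their own approach, which they call LongMem, keeps the original language model frozen and uses it to fill a memory bank, while a separate network called a SideNet, which can be trained on its own, learns to retrieve and use what’s in that memory.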
Wang and team test their approach on tasks based on three datasets of very long texts, including whole articles and books: Project Gutenberg, the arXiv file server, and ChapterBreak.
To give you an idea of the scale of those tasks, ChapterBreak, introduced last year by Simeng Sun and colleagues at the University of Massachusetts Amherst, takes whole books and tests a language model to see if, given one chapter as input, it can accurately identify from several candidate passages which one is the start of the next chapter.
Such a task “requires a rich understanding of long-range dependencies”, such as changes in the place and time of events, and of techniques including “analepsis”, where “the next chapter is a ‘flashback’ to an earlier point in the narrative.”
And it involves processing tens or even hundreds of thousands of tokens.
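The multiple-choice format of a ChapterBreak-style test can be sketched as a small evaluation loop. Here `overlap_score` is a hypothetical stand-in for a language model that simply counts shared words; in a real evaluation the model would typically rate each candidate by how likely it finds that passage as a continuation of the context.

```python
from typing import Callable, Iterable, Sequence

Scorer = Callable[[str, str], float]


def pick_next_chapter(context: str, candidates: Sequence[str], score: Scorer) -> int:
    """Index of the candidate the scorer rates most likely to start the next chapter."""
    return max(range(len(candidates)), key=lambda i: score(context, candidates[i]))


def accuracy(examples: Iterable[tuple[str, Sequence[str], int]], score: Scorer) -> float:
    """Fraction of (context, candidates, correct_index) examples answered correctly."""
    examples = list(examples)
    hits = sum(pick_next_chapter(ctx, cands, score) == gold for ctx, cands, gold in examples)
    return hits / len(examples)


def overlap_score(context: str, candidate: str) -> float:
    """Hypothetical stand-in for a language model: number of shared words."""
    return len(set(context.lower().split()) & set(candidate.lower().split()))


example = (
    "Mara left the lighthouse at dawn",
    ["The lighthouse keeper waved at Mara", "Stock prices fell sharply"],
    0,
)
print(accuracy([example], overlap_score))  # 1.0
```

A word-overlap scorer would fail badly on real flashbacks and scene changes, which is exactly why the benchmark calls for long-range understanding.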
Microsoft’s work echoes recent research at ByteDance, the parent company of TikTok. In April, Xinnian Liang of ByteDance and colleagues posted a paper on arXiv titled “Unleashing Infinite-Length Input Capacity for Large-scale Language Models with Self-Controlled Memory System”.
The system they developed gives any large language model the ability to store very long sequences of information, which can improve how well it responds to prompts, in some tests better than ChatGPT. They call it the ‘Self-Controlled Memory system’, or SCM.
Whenever a user types something at the prompt, a memory controller evaluates it to decide whether it needs the archival memory stream, which contains all past interactions between the user and the program. It’s a bit like Wang and team’s SideNet plus memory bank.
If memory is needed, a vector database tool such as Pinecone is used to match the user’s query against what’s stored.
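Matching a query against stored history usually means comparing vectors. In this sketch, `embed` is a hypothetical bag-of-words stand-in for a real embedding model, and a plain Python list stands in for a vector database such as Pinecone; only the cosine-similarity ranking reflects how such lookups generally work.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)  # missing words count as 0
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, history: list[str], top_k: int = 1) -> list[str]:
    """Return the top_k past turns most similar to the query."""
    q = embed(query)
    ranked = sorted(history, key=lambda turn: cosine(q, embed(turn)), reverse=True)
    return ranked[:top_k]


history = [
    "We concluded the fitness diet should be high in protein",
    "You asked me for a joke about cats",
]
print(retrieve("what did we conclude about the fitness diet", history))
```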
Some queries don’t need memory, like “Tell me a joke”. But if you ask something like “Do you remember the conclusion we made last week on the fitness diets?” it needs access to past chat material.
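The controller’s yes-or-no decision can be pictured as a function from the user’s input to a boolean. The keyword list below is purely hypothetical and far cruder than the SCM controller; it only shows where that decision sits before any retrieval happens.

```python
# Hypothetical cues that a prompt refers back to earlier turns.
MEMORY_CUES = ("remember", "earlier", "last week", "we discussed", "you said", "before")


def needs_memory(user_input: str) -> bool:
    """Crude stand-in for a memory controller: does the prompt point to the past?"""
    text = user_input.lower()
    return any(cue in text for cue in MEMORY_CUES)


for prompt in ("Tell me a joke",
               "Do you remember the conclusion we made last week on the fitness diets?"):
    print(prompt, "->", needs_memory(prompt))
```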
The clever part is that they combine the user prompt and the retrieved memory, a step they call “input fusion”, and the result becomes the input to the language model.
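Input fusion can be sketched as assembling one prompt from the retrieved memory and the current request. The template wording and the `fuse_input` name are hypothetical, not the prompt used in the SCM paper.

```python
def fuse_input(user_input: str, memories: list[str]) -> str:
    """Combine retrieved memory and the current prompt into one model input."""
    parts = []
    if memories:
        # Only include a memory section when the controller retrieved something.
        parts.append("Relevant earlier turns:\n" + "\n".join(f"- {m}" for m in memories))
    parts.append(f"User: {user_input}")
    return "\n\n".join(parts)


print(fuse_input(
    "Do you remember the conclusion we made last week on the fitness diets?",
    ["We concluded the fitness diet should be high in protein"],
))
```

When no memory is needed, the fused input is just the user’s prompt, so simple requests cost no extra context.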
Liang and team hooked their SCM up to GPT-3 (text-davinci-003) and compared it with ChatGPT given the same input, and it beat ChatGPT on tasks that require referring back to much earlier parts of a conversation.
In one series of more than 100 turns, consisting of 4,000 tokens, when the human prompts the machine to recall the hobbies of the person discussed at the outset of the session, “the SCM system provides an accurate response to the query, demonstrating exceptional memory-enhanced capabilities,” they write, while, “in contrast, it appears that ChatGPT was distracted by a considerable amount of irrelevant historical data.”
The SCM can also summarize long texts running to thousands of words, such as reports, by storing a summary of the first part in the memory stream and combining it with the summary of the next part.
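The chunk-by-chunk strategy amounts to a loop that carries a running summary forward. In this sketch `summarize` is any function you supply, and the memory stream is a plain list; `keep_first_words` is a hypothetical placeholder for a call to a language model, used only so the example runs.

```python
from typing import Callable


def rolling_summary(chunks: list[str], summarize: Callable[[str], str]) -> tuple[str, list[str]]:
    """Summarize a long text piece by piece.

    Returns the final summary and the memory stream of intermediate summaries.
    """
    memory_stream: list[str] = []
    summary = ""
    for chunk in chunks:
        # Fuse the running summary with the next chunk, then compress again so the
        # model input stays bounded however long the full text is.
        summary = summarize(f"{summary} {chunk}".strip())
        memory_stream.append(summary)
    return summary, memory_stream


def keep_first_words(text: str, budget: int = 6) -> str:
    """Hypothetical stand-in for an LLM summarizer: truncate to a word budget."""
    return " ".join(text.split()[:budget])


chunks = ["Quarterly revenue rose ten percent", "Costs fell in every region", "Hiring will pause next year"]
final, stream = rolling_summary(chunks, keep_first_words)
print(final)
print(stream)
```

The design choice is that each model call sees only one chunk plus a short summary, never the whole document, which is what keeps the input within the context window.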
It can also make large language models that weren’t built as chatbots behave like them. According to the authors, their system lets LLMs that aren’t optimized for multi-turn dialogue achieve multi-turn dialogue capabilities on par with ChatGPT’s.
Microsoft’s and ByteDance’s work can be seen as returning to an older idea in language modeling. Before the Transformer, the Google-developed architecture that ChatGPT is built on, natural language tasks were usually handled with recurrent neural networks (RNNs).
RNNs process input one step at a time, carrying forward a hidden state that summarizes everything seen so far. The Transformer, and LLMs like ChatGPT built on it, replaced that recurrence with a simpler mechanism, attention, which compares each part of the input directly with everything that came before it in the context window, so the past is always brought in.
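The contrast between the two designs shows up in a toy example with single numbers instead of vectors. The weights and inputs are arbitrary, and real models add learned projections, many dimensions, and many layers; the point is only where the past comes from in each case.

```python
import math


def rnn_states(xs: list[float], w_h: float = 0.5, w_x: float = 1.0) -> list[float]:
    """Scalar toy RNN: the past reaches step t only through the hidden state h."""
    h = 0.0
    states = []
    for x in xs:
        h = math.tanh(w_h * h + w_x * x)  # new state mixes the old state with the new input
        states.append(h)
    return states


def attention_weights_for_last(xs: list[float]) -> list[float]:
    """Toy causal attention for the final position.

    The last input (as the query) is scored against every input so far (as keys),
    so each earlier position is reachable directly rather than via a chain of states.
    """
    query = xs[-1]
    scores = [query * key for key in xs]  # no learned projections in this toy
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


inputs = [1.0, -0.5, 0.25, 2.0]
print(rnn_states(inputs))
print(attention_weights_for_last(inputs))
```

In the RNN, the first input has to survive three tanh updates to influence the last state; in attention, it gets its own weight directly.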
So Microsoft’s and ByteDance’s research can be seen as extending attention with algorithms specifically designed to recall elements of the past in a more structured way.
Adding memory is such a basic change that it will probably become standard for large language models, making it more common for programs to draw connections to past material such as chat history, or to handle long texts.