Mega, Mamba, Liquid...
I had the opportunity to attend and contribute to an outstanding event on AI and business held at the MIT Media Lab on April 18th. Our host, John Werner, is amazingly energetic, dynamic, and intelligent, and he pulled off an event for thousands of people with a small handful of hardworking volunteers.
In his commentary before, during, and after the event, John was quite transparent about his motivations: Boston, and MIT in particular, should be front and center as the place where AI innovation happens over the next few years, and he hoped that bringing the community together at the Media Lab would let a few connections and relationships blossom. As one person at the conference put it, "...let's pull some of those MIT grads off the train to Silicon Valley and keep them here!"
While there were a huge number (100+) of impressive local startups at the event, exhibit "A" for John's argument that AI innovation is happening in Boston was the company Liquid. He even gave them a main stage panel discussion to educate the audience about the interesting approach they are bringing to challenge the current dominance of transformer-based foundation models. Considered a "spinoff" from MIT, they are largely local to Boston, although they also have a Palo Alto, CA presence. The research work that inspired the company tackled the same questions that inspired the developers of the transformer (GPT) architecture, but from a different starting point.
“Liquid’s approach to advancing the state-of-the-art in AI is grounded in the integration of fundamental truths across biology, physics, neuroscience, math, and computer science. We believe that trans-disciplinary approaches will unlock the greatest levels of acceleration towards the most efficient breakthroughs.” – Joseph Jacks (Founder and General Partner at OSS Capital)
What is that approach? Well, it started with the brain of the roundworm C. elegans, which has just 302 neurons and about 7,000 connections (versus roughly 100 billion neurons in the human brain). The goal is to create smaller, simpler, but nonetheless powerful foundation models that can perform important tasks while using much less computing and power.
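To make that concrete, here is a minimal sketch of the kind of "liquid" time-constant neuron dynamics that the underlying research describes, where each neuron's effective time constant shifts with its input. All sizes, weights, and constants below are toy values for illustration, not Liquid's actual models.

```python
import numpy as np

# Illustrative liquid time-constant (LTC) style update: the state decays with a
# time constant that depends on the current input, so a tiny network can show
# rich, adaptive dynamics. Toy parameters only.

def ltc_step(x, inputs, W_in, W_rec, tau, A, dt=0.01):
    """One Euler step of dx/dt = -(1/tau + f) * x + f * A."""
    f = np.tanh(W_in @ inputs + W_rec @ x)   # input-dependent gate
    dxdt = -(1.0 / tau + f) * x + f * A      # state-dependent time constant
    return x + dt * dxdt

rng = np.random.default_rng(0)
n_neurons, n_inputs = 32, 8                  # tiny network, in the spirit of C. elegans
x = np.zeros(n_neurons)
W_in = rng.normal(0, 0.5, (n_neurons, n_inputs))
W_rec = rng.normal(0, 0.5, (n_neurons, n_neurons))
tau = np.ones(n_neurons)                     # base time constants
A = np.ones(n_neurons)                       # bias toward which the gate pulls the state

for t in range(100):                         # drive with a toy sinusoidal input
    u = np.sin(0.1 * t) * np.ones(n_inputs)
    x = ltc_step(x, u, W_in, W_rec, tau, A)
print(x[:5])
```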
The bigger story here, beyond Liquid, is that a growing number of research initiatives are experimenting with alternatives to the transformer architecture. Another example is state-space models, which were originally developed to address problems in signal processing. Mamba, developed from research at Carnegie Mellon and Princeton, is a linear-time sequence model. It scales better than transformers for longer sequences and can extrapolate to sequences much longer than it was trained on. The core of this approach is summarizing past information into a compact current state and then using that representation to drive predictions.
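A bare-bones discrete state-space recurrence shows the idea: the hidden state is the running summary of everything seen so far, and each output is read from that state in a single linear-time pass. Real models like S4 and Mamba learn these matrices, discretize a continuous system, and (in Mamba's case) make the parameters input-dependent; the values below are toy assumptions.

```python
import numpy as np

# Toy state-space recurrence: h_t = A h_{t-1} + B x_t, y_t = C h_t.
# The state h compresses the past; prediction reads from that compression.

def ssm_scan(x, A, B, C):
    h = np.zeros(A.shape[0])
    ys = []
    for x_t in x:                    # one pass over the sequence, O(L)
        h = A @ h + B * x_t          # fold the new token into the running state
        ys.append(C @ h)             # output driven by the compressed history
    return np.array(ys)

A = np.diag([0.9, 0.5, 0.1])         # stable decay rates for the state
B = np.array([1.0, 1.0, 1.0])
C = np.array([0.3, 0.5, 0.2])
x = np.sin(np.linspace(0, 6.28, 50)) # a toy 1-D input sequence
print(ssm_scan(x, A, B, C)[:5])
```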
Meanwhile, transformer architectures aren't standing still: a variety of ever-larger models, sometimes referred to as Mega models, are being trained on huge data sets with the goal of providing an underlying "utility" layer that is good at all but the most specialized tasks.
In the long run we should expect to see application architectures emerge that use different approaches to solve different parts of a problem. You may have a pre-processing algorithm providing the "front-end" interface, one that maintains consistency, can be managed for safety, and presents the right tone and personality for a given activity. Behind this you may have a panel of models that interact with one another to answer a given request, including specific tools focused on deterministic tasks like calculation, code execution, or integration with other information systems.
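A hypothetical sketch of that layering might look like the following: a front-end layer handles safety and normalization, then routes each request to a deterministic tool or a general model. Every function name here is invented for illustration; it is not any particular vendor's API.

```python
# Toy compound-architecture dispatch: front-end -> panel of specialists.

def front_end(request: str) -> str:
    """Pre-processing layer: enforce a simple safety rule and normalize the ask."""
    if "forbidden" in request.lower():
        raise ValueError("request rejected by safety filter")
    return request.strip()

def calculator(request: str) -> str:
    # Deterministic tool for arithmetic; a real system would parse more carefully.
    expr = request.split(":", 1)[1]
    return str(eval(expr, {"__builtins__": {}}))   # toy only: never eval untrusted input

def general_model(request: str) -> str:
    return f"[stub LLM answer to: {request}]"      # stand-in for a hosted model call

def route(request: str) -> str:
    """Panel dispatch: pick the specialist whose task matches the request."""
    request = front_end(request)
    if request.startswith("calc:"):
        return calculator(request)
    return general_model(request)

print(route("calc: 6 * 7"))          # handled deterministically -> 42
print(route("Summarize the event"))  # handled by the general model
```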
As the Berkeley AI Research (BAIR) lab recently explained, even the common platforms we are using today, like ChatGPT, are actually compound AI systems incorporating many components. ChatGPT might be characterized as an LLM plus a web browser plugin for retrieving timely content, a code interpreter plugin for executing Python, and the DALL-E image generator. Another example BAIR provides: Google DeepMind's AlphaGeometry, a combination of a fine-tuned LLM and a symbolic math engine.
One thing is certain -- every advance that solves one set of questions simply unearths a new set of questions. And there is an enormous amount of both human and machine intelligence being applied to these questions, resulting in rapid (dare I say exponential?) advances in the capabilities of our machine intelligence inventions. Have a problem that you don't think can be solved by today's AI? Definitely take another look tomorrow.