Beyond LLM
Ritesh Vajariya
Global AI Strategy Leader | Head of GenAI @ Cerebras | Founder, AI Guru | Advisor to CEOs | Ex-AWS GenAI Leader | Board Member
Remember when we thought AI was just about chatbots and funny image generators? I'm telling you, we're on the edge of an AI explosion. It's gonna be wild.
Look, I don't have a crystal ball, but after the crazy ride we've had with AI in the last year and a half, I've got a hunch the next 18 months are going to blow our minds. We're talking AI that doesn't just chat, but sees, hears, and maybe even thinks ahead of us. Wild, right?
A quick recap of the last 18 months:
The last 18 months proved that AI is not going anywhere. From now on, we need to compound that growth by enabling more and more AI adoption - where we, as humans, become smarter every day.
There are many areas that show potential for growth, but in this article I want to highlight a few of them:
Beyond text and image:
While text has dominated the last 18 months - thanks to ChatGPT, which is known to 63% of the world's population (well, my mom doesn't know it yet!) - and we are now tired of seeing AI-generated images - thanks to Runway, Stable Diffusion, and DALL-E - we have also seen how GPT-4 and Claude, via their vision APIs, can look at an image, do visual Q&A, and give us a textual answer. Similar capabilities have been developed in the open-source ecosystem via models like LLaVA and Phi. This shows that not only can proprietary models (GPT-4, Claude) handle multimodality, but the open-source ecosystem is catching up fast.
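To make that concrete, here is a minimal sketch of visual Q&A using the OpenAI Python SDK. The model name, question, and image URL are placeholder assumptions; Claude's vision API follows a very similar pattern.

```python
# pip install openai
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Visual Q&A: send an image plus a question in a single message.
response = client.chat.completions.create(
    model="gpt-4o",  # assumed model; any vision-capable model works
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What landmark is in this photo?"},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/photo.jpg"}},  # placeholder URL
        ],
    }],
)
print(response.choices[0].message.content)
```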
Then GPT-4o showed us a version of multimodality that tickled everyone's brain - combining not only text and images but also adding voice to the mix, at low latency.
While we had only seen OpenAI's demo of this, just last week a French AI lab, Kyutai, actually went ahead, built it, and showed it to the entire world: in just 6 months, with a team of 8, the Kyutai research lab developed from scratch an AI model with unprecedented vocal capabilities, called Moshi.
Imagine combining these vocal capabilities with text and images; we could apply this across industries: patient healthcare, making education accessible to those without access to the best schools, never getting lost in any city, and many more...
I am optimistic that I won't get lost the next time I visit Beijing - well, I'm unsure about that one, as not much Western tech is accessible there... but you get the point.
Agents everywhere:
Last year, hardly a week went by without a new LLM getting released. In the last few weeks, it's been all about agents, and more and more startups are being created to build them.
Those who are not living in the AI bubble may wonder what I am talking about - so let's do a very brief Agents 101.
We have to go back to the year 2011, when Daniel Kahneman published the book Thinking, Fast and Slow. The 63% of the world who have used ChatGPT, Claude, or another chat-based system know that these systems are super fast, giving us a response in a few seconds or less. In that sense, they are "thinking fast," similar to how we use our subconscious mind. But what if we ask these systems to take their time, think slow, and apply the conscious mind instead? Just by adding a time element, we can work wonders with these systems. They are able to "think" before they answer, and with that added thinking time they do things better - still the same LLM, but given time to think. It's like the human brain: when we apply our conscious mind to an activity, we do a much better job.
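In practice, the simplest way to buy an LLM "thinking time" is chain-of-thought prompting. Here is a minimal sketch using the OpenAI Python SDK - the model name and the (Kahneman-inspired) puzzle are just illustrative assumptions:

```python
# pip install openai
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

puzzle = ("A bat and a ball cost $1.10 in total. The bat costs $1.00 "
          "more than the ball. How much does the ball cost?")

# "Thinking fast": ask for the answer directly.
fast = client.chat.completions.create(
    model="gpt-4o",  # assumed model; any chat model works
    messages=[{"role": "user", "content": puzzle}],
)

# "Thinking slow": same model, same question, but told to reason first.
slow = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user",
               "content": puzzle + " Think step by step, then give the final answer."}],
)

print(fast.choices[0].message.content)
print(slow.choices[0].message.content)
```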
This is where "agents" come in: beyond simply thinking longer, they add a planning component, keep certain things in memory for the time being, apply reasoning, and only then generate a response - creating "better than the best"!
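Stripped down to its essentials, an agent is just a loop around an LLM. The sketch below is a hypothetical plan-act-observe loop, not any particular framework; call_llm and both tools are stand-ins you would replace with a real LLM API and real tools:

```python
# A minimal plan-act-observe agent loop (illustrative sketch, not a real framework).

def call_llm(prompt: str) -> str:
    # Hypothetical stand-in: plug in any chat-completion API here.
    raise NotImplementedError

TOOLS = {
    "search": lambda query: f"(pretend web results for: {query})",  # stand-in tool
    "calculator": lambda expr: str(eval(expr)),  # demo only; never eval untrusted input
}

def run_agent(task: str, max_steps: int = 5) -> str:
    memory: list[tuple[str, str]] = []  # short-term memory of (plan, observation) pairs
    for _ in range(max_steps):
        # Planning/reasoning: the LLM picks the next action from the task and its memory.
        plan = call_llm(
            f"Task: {task}\nHistory so far: {memory}\n"
            "Reply with either 'ACTION <tool> <input>' or 'FINISH <answer>'."
        )
        if plan.startswith("FINISH"):
            return plan[len("FINISH"):].strip()
        _, tool, arg = plan.split(" ", 2)      # act: parse and run the chosen tool
        observation = TOOLS[tool](arg)
        memory.append((plan, observation))     # remember the outcome for the next step
    return "Gave up after max_steps - a real agent would ask for help here."
```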
For those looking for evidence of how LLMs can act as agents and improve real work, there is an open-source project called AgentBench.
The benchmark evaluates LLM-as-Agent across a diverse spectrum of environments - 8 distinct ones in total - to provide a more comprehensive evaluation of LLMs' ability to operate as autonomous agents in various scenarios.
While LLMs are beginning to manifest their proficiency as agents, the gaps between models - and the distance to practical usability - are significant. The AgentBench results show that proprietary models from OpenAI and Anthropic outperform open-source models when it comes to LLM-as-Agent.
One might wonder: what are some use cases where agents will help improve our lives? Well, literally all the use cases you have seen Generative AI solve for us, such as customer service, content creation, virtual tutoring in education, and patient health outcomes.
If you are hiring and want to create a job description that looks (and reads) much better than standard ChatGPT output, take a look at my YouTube video describing this, or head straight to the GitHub code for the implementation.
How about some science?
While the adoption of AI in Sales, Finance, HR, and many other departments has flourished and is enhancing what humans are capable of, how about applying some of this to actual science - drug discovery, or superconductor research?
Yes, there have been leapfrog advances in some of these areas, such as DeepMind's AlphaFold 3, which can accurately predict the structure of proteins, DNA, RNA, ligands, and more - and how they interact. We hope it will transform our understanding of the biological world and of drug discovery. At the same time, teams at Johns Hopkins discovered a new superconductor.
What if we could make similar advances in other areas of physics and materials science? Molecular dynamics simulations are being run at massive scale, and we at Cerebras are playing a significant role there. But what if we could go beyond simulation and identify not just an object but its 3D shape, what it is made of (aluminum, copper, plastic, etc.), or what's inside it? What if we could detect whether a can contains Coke, or water, or something else?
The real-life applications of these kinds of solutions are immense, and while we are not there yet, we are certainly heading in that direction.
Conclusion:
Wow, folks! If you thought the last 18 months were a wild ride in the AI world, just wait till you see what's coming next! We're talking about AI that doesn't just chat or make pretty pictures - we're entering a whole new dimension of cool.
Imagine AI that can see, hear, and talk back to you like a real person. Or how about AI agents that can actually think and plan? It's like giving AI a brain upgrade! And don't even get me started on what this could mean for science - we might be on the verge of some seriously mind-blowing discoveries.
But hey, let's not forget - with great AI power comes great responsibility. We've got to be smart about how we develop and use this stuff. It's not about replacing humans; it's about making us superhuman! We need to stay on our toes, keep our minds open, and be ready to roll with whatever AI throws our way.
So, buckle up! The next 18 months are going to be one heck of a ride in the AI world.
Shameless plug:
Did you know that Claude 3.5 Sonnet is now the highest-performing LLM, even beating GPT-4o? Many people I know are thinking of canceling their ChatGPT subscriptions in favor of Claude.
But here's the thing - a lot of these folks don't know how to use Claude most effectively. That's why I created an on-demand course diving into the art and science of prompt engineering with Claude. It's perfect for everyday people who want to leverage Claude in their daily lives, covering techniques like chain-of-thought reasoning, few-shot learning, applying personas, and so much more.
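As a tiny taste of what the course covers, here is a sketch of two of those techniques - a persona plus few-shot examples - using the Anthropic Python SDK. The model ID and the recruiter scenario are assumptions for illustration:

```python
# pip install anthropic
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

message = client.messages.create(
    model="claude-3-5-sonnet-20240620",  # assumed model ID at time of writing
    max_tokens=300,
    # Persona: the system prompt sets who Claude should "be".
    system="You are a seasoned technical recruiter who writes crisp, inclusive job posts.",
    messages=[
        # Few-shot: one worked example shows the style we want.
        {"role": "user", "content": "Rewrite: 'Need Java dev, 5 yrs exp.'"},
        {"role": "assistant",
         "content": "Senior Java Engineer - build resilient services with a team that ships."},
        # The actual request, answered in the demonstrated style.
        {"role": "user", "content": "Rewrite: 'Looking for data person, SQL a plus.'"},
    ],
)
print(message.content[0].text)
```

The same pattern extends to chain-of-thought: just ask Claude to reason step by step before giving its final answer.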
If you know someone who could benefit from learning all this (and trust me, there's plenty more), why not gift them this course? It could be a game-changer for how they interact with AI!