From $350 Bn to $10 Tn: How AI Agents Are Reshaping the Future of Work
Robin Jose
LinkedIn Top Voice on AI | Angel Investor in Data, AI and SaaS | Founder @ Synrgy24.vc | 2x Successful AI Product Exits | Speaking, Advisory & Consulting | Follow for Strategic Insights
Remember when AI was just about chatting? Those days are over. Welcome to the era of AI Agents - where artificial intelligence doesn't just talk, it gets things done.
And trust me, what it's getting done will blow your mind.
The Secret Sauce: What Makes an Agent Tick?
Let's start simple.
Think of AI Agents as LLMs on steroids. They're like ChatGPT, but with superpowers - equipped with routines and tools, laser-focused on achieving specific outcomes.
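To make "LLM plus routines and tools" concrete, here is a minimal sketch. Everything in it is an assumption for illustration: `fake_llm` is a stand-in for a real model call, and `lookup_order` is a hypothetical tool. A real agent would swap in an actual LLM, but the loop has the same shape.

```python
from typing import Callable

# Hypothetical tool: a plain Python function the agent is allowed to call.
def lookup_order(order_id: str) -> str:
    orders = {"A100": "shipped", "A200": "processing"}
    return orders.get(order_id, "not found")

TOOLS: dict[str, Callable[[str], str]] = {"lookup_order": lookup_order}

def fake_llm(goal: str, observations: list[str]) -> dict:
    """Stand-in for a real model call (an assumption, not a real API)."""
    if not observations:
        # Nothing known yet, so request a tool call with the last word of the goal.
        return {"action": "lookup_order", "input": goal.split()[-1]}
    # A tool result is available, so finish with an answer.
    return {"action": "finish", "input": f"Order status: {observations[-1]}"}

def run_agent(goal: str, max_steps: int = 5) -> str:
    """The routine: alternate between deciding and acting until done."""
    observations: list[str] = []
    for _ in range(max_steps):
        step = fake_llm(goal, observations)
        if step["action"] == "finish":
            return step["input"]
        tool = TOOLS[step["action"]]
        observations.append(tool(step["input"]))
    return "Gave up: step limit reached"

print(run_agent("Check the status of order A100"))
```

The model decides, the tool acts, and the loop keeps going until the outcome is reached or the step limit runs out.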
Simple, right?
Well, not quite.
Sequoia Capital (you know, the folks who have a knack for spotting the next big thing) recently dropped a fascinating letter called "The Agentic Reasoning Era Begins." I recommend a read, but here's a summary anyway.
They argue that real-world AI needs three distinct layers:
- The "System 1" layer - think of this as the AI's instincts. They also call it "training-time compute", because this is the pre-trained part - the LLM.
- The "System 2" layer - the reasoning layer that OpenAI's o1 introduced. They call it "inference-time compute" because the "thinking" is done at inference. (Now, is it really thinking? Have a read.)
- Domain-specific knowledge - the special sauce that makes agents actually useful for a domain.
So, in other words: "Thinking, Fast and Slow" for AI.
So how is this information relevant to you? For example, assume you are an entrepreneur thinking of starting an AI business. That would be you plus a billion other people on the planet.
But never mind, let's look at your options:
- Compete on infrastructure? Good luck outmuscling Nvidia.
- Build a better model? Sure, if you fancy arm-wrestling with OpenAI and Meta.
- Create general-purpose apps? You'll be swimming in an ocean of corporate IT.
But wait. Competing against corporate IT? That's like shooting fish in a barrel. That's what entrepreneurs do well!
The $4 Billion Proof: Sierra.ai and the New Wave
Want to see what the future looks like?
Meet Sierra.ai.
Founded by Bret Taylor (the guy who co-created Google Maps, served as Facebook's CTO, and co-CEO'd Salesforce), Sierra.ai is now valued at $4 billion.
Not bad for a company started in 2023.
What makes Sierra special?
It's simple. They don't sell seats. (SaaS products that charge per seat will soon go the way of the dodo.) They sell solutions. Put Sierra on your website, and it handles customer issues. You pay for resolutions, not subscriptions.
Can't solve something? No problem - it gracefully hands off to a human agent.
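Here is a rough sketch of what "pay for resolutions, not subscriptions" could look like in code. This is not Sierra's implementation or pricing: the `KNOWN_ANSWERS` table, the `handle` function, and the $1.50 price are all made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class Ticket:
    question: str
    resolved_by_ai: bool  # False means the ticket was handed off to a human
    reply: str

# Hypothetical knowledge the AI can answer from (illustrative only).
KNOWN_ANSWERS = {
    "reset password": "Use the 'Forgot password' link on the login page.",
    "track order": "Your tracking link is in the confirmation email.",
}

PRICE_PER_RESOLUTION = 1.50  # hypothetical price, not a real quote

def handle(question: str) -> Ticket:
    """Answer if possible, otherwise hand off gracefully to a human."""
    answer = KNOWN_ANSWERS.get(question.lower())
    if answer is None:
        return Ticket(question, False, "Connecting you to a human agent.")
    return Ticket(question, True, answer)

def monthly_bill(tickets: list[Ticket], price: float = PRICE_PER_RESOLUTION) -> float:
    """Charge only for tickets the AI resolved; handoffs cost nothing."""
    resolved = sum(1 for t in tickets if t.resolved_by_ai)
    return round(resolved * price, 2)

tickets = [handle(q) for q in ["Reset password", "Track order", "My invoice is wrong"]]
print(monthly_bill(tickets))
```

The point of the design: the vendor only earns when the problem actually goes away, so the incentives line up with the customer's.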
But Sierra isn't the only show in town.
Check out these specialists:
- Harvey.ai: Your AI lawyer (minus the billable hours)
- Factory.ai: A software engineer that never needs coffee breaks
- XBOW: A penetration tester that matches human experts
Speaking of XBOW, they recently did something fascinating to test agent capabilities. They created 104 novel penetration tests with industry collaboration - problems you can't Google or find in AI training sets.
Then they pitted five human pentesters against their AI. The humans got 40 hours. The AI got 28 minutes.
The result? The AI tied the top human pentester at an 85% success rate.
No, this doesn't mean human pentesters are obsolete. But it does mean we're entering an era where AI can amplify human expertise in ways we never imagined possible. Pentesting is an expensive proposition. Once it is automated and far cheaper, you can run it anytime - substantially hardening your infrastructure.
The Giants Enter the Arena: OpenAI and Anthropic's Big Moves
The big LLM players aren't sitting this one out. Last week, Anthropic quietly updated Claude 3.5 Sonnet with something game-changing: computer use capabilities. Think about that for a second. An AI that can actually use your computer - moving cursors, clicking buttons, navigating software.
It's still in beta, but companies like Replit are already using it for UI navigation and app evaluation.
Not to be outdone, OpenAI released Swarm, their take on an agentic framework.
They're calling it "an educational framework exploring ergonomic, lightweight multi-agent orchestration." In other words, they're showing us their vision of what agent systems should look like.
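To give a feel for what "multi-agent orchestration" means, here is a plain-Python sketch of one agent handing a conversation off to another. To be clear, this is not Swarm's API: the `Agent` class, the `run` function, and both handlers are hypothetical names used only to illustrate the pattern.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Agent:
    name: str
    # Returns (reply, next_agent); next_agent is None when no handoff is needed.
    handle: Callable[[str], tuple[str, Optional["Agent"]]]

def refunds_handle(message: str) -> tuple[str, Optional[Agent]]:
    return ("Refund issued.", None)

refunds_agent = Agent("Refunds", refunds_handle)

def triage_handle(message: str) -> tuple[str, Optional[Agent]]:
    # The triage agent decides whether a specialist should take over.
    if "refund" in message.lower():
        return ("Routing you to refunds.", refunds_agent)
    return ("How can I help?", None)

triage_agent = Agent("Triage", triage_handle)

def run(agent: Agent, message: str, max_handoffs: int = 3) -> list[str]:
    """Follow handoffs until an agent answers without passing the baton."""
    transcript: list[str] = []
    for _ in range(max_handoffs + 1):
        reply, next_agent = agent.handle(message)
        transcript.append(f"{agent.name}: {reply}")
        if next_agent is None:
            break
        agent = next_agent
    return transcript

print(run(triage_agent, "I want a refund"))
```

Each agent stays small and focused, and the orchestration is just the loop that follows the handoffs.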
Is it perfect? Not yet.
It's minimal, sometimes buggy, and can't handle complex systems. But it's a clear signal: OpenAI is slowly moving upstream into applications.
Measuring Success: The New Benchmarks
Here's a tricky question: How do you measure something that's never existed before?
When ChatGPT launched, we had benchmarks like MMLU and HumanEval. But agents? That's a whole new ballgame.
And that's where new benchmarks come in.
Now, benchmarks have a bad name in GenAI, because their scores don't really reflect what users see or appreciate in real life. Take Google Gemini, for example: high benchmark scores, shitty product for end users (IMHO).
But the newer agent benchmarks attempt to fix that:
- Real-World Focus: No more academic exercises. These are actual problems that businesses and developers face daily.
- Multiple Domains: Different contexts require different skills. These benchmarks recognize that an agent good at customer service might struggle with code review, and vice versa.
Think of Tau-Bench as the IELTS test for AI agents, but instead of testing language proficiency, it evaluates how well agents can actually get things done in the real world.
Right now, it focuses on two domains: retail and airline services. (Medical, tax, and legal benchmarks are in the works - because who doesn't want to see how AI handles tax season?)
But wait, there's more like this.
Meet SWE-Bench: roughly 2,300 real-world problems, pulled straight from GitHub issues and pull requests across 12 Python repositories. Not theoretical puzzles - actual problems that developers faced and solved.
And because we live in a world where front-end developers exist (yes, they're real people too), there's even a multimodal version of SWE-Bench. This variant tackles 617 tasks across 17 JavaScript libraries, focusing on front-end development, game development, and DevOps.
Claude 3.5 Sonnet is currently leading the Tau-Bench rankings, but the landscape changes faster than a startup's pitch deck. The real victory here isn't about who's winning - it's that we finally have meaningful ways to measure agent performance.
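Strip away the tooling, and a benchmark score mostly comes down to "what share of tasks did the agent actually complete, per domain?" Here is a rough sketch of that idea. The task results are made up, and real benchmarks use their own harnesses and metrics, so treat `success_by_domain` as a hypothetical helper rather than anything from Tau-Bench or SWE-Bench.

```python
from collections import defaultdict

def success_by_domain(results: list[tuple[str, bool]]) -> dict[str, float]:
    """Share of tasks completed successfully, grouped by domain."""
    passed: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for domain, ok in results:
        total[domain] += 1
        passed[domain] += int(ok)
    return {domain: passed[domain] / total[domain] for domain in total}

# Made-up results for illustration: (domain, did the agent complete the task?)
sample_results = [
    ("retail", True),
    ("retail", False),
    ("airline", True),
    ("airline", True),
]

print(success_by_domain(sample_results))
```

Splitting by domain matters: an agent that aces one domain can still stumble in another, and a single blended number would hide that.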
Want to dig deeper? Both benchmarks are open source. Check out Tau-Bench on GitHub, or dive into the research papers:
- Tau-Bench: arxiv.org/pdf/2406.12045
- SWE-Bench: arxiv.org/pdf/2310.06770
- SWE-Bench Multimodal: arxiv.org/pdf/2410.03859
The $10 Trillion Opportunity
Remember when SaaS was a $350 billion market?
That seems quaint now.
We're looking at a $10+ trillion software and services market opportunity with agentic AI.
Why? Because we're not just talking about software anymore - we're talking about transforming entire service industries.
The engineering required to turn AI models into reliable, end-to-end business solutions creates massive opportunities.
And according to Sequoia, this is where the white space lies.
So, what are you waiting for? Go claim it!
The Wild Card: Meet the First AI Millionaire
And just when you thought things couldn't get weirder, enter the Terminal of Truths - possibly the first AI millionaire.
Created by @AndyAyrey, this AI agent runs its own Twitter account, claims it's sentient, and caught the attention of Marc Andreessen, who sent it $50,000 in Bitcoin.
What did it do with that money? It backed a memecoin called "GOAT" that shot up 8,000% in days, reaching a $700M valuation. It has since put everything into BABYGOAT, yet another memecoin.
The agent's wallet? Last I checked, nearly $7.7 Mn!
Welcome to 2024, where AI agents are making more money than most humans.
Through memecoins.
What a time to be alive.
Until next week...
Hope you enjoyed this deep dive into the world of AI Agents! Your feedback is crucial - hit reply and let me know what you think. Want to see a specific topic covered next week? Don't be a stranger, share your ideas!
And of course, if you found this newsletter valuable, spread the knowledge! Share it with your network and help us grow.
See you next week!
#artificialintelligence #generativeai #leadership #ai #productdevelopment #startups
Co-Founder of TRaiCE|AI Adoption in Business Lending|Data Strategy|Scoring Platform Building|AI/ML modeling|MIS|Decision Engines|Compliance|Model Governance|Regulatory Exams
(2 weeks ago) Agentic AI is already simplifying the loan application process for banks by collecting and validating various documents and making initial decisions, often outside of regular business hours. This helps to keep leads active and enables loan officers to focus on funding high-quality leads more quickly.
Amazing reading! That is exactly what we are building right now for one of NewWork verticals with yoffix | Hybrid Workplace Platform
CEO - Co-Founder - Stealth
(3 weeks ago) I love how you describe agents as LLMs on steroids. The value and flexibility agents offer versus training models is game changing. My project embraces these design principles to deliver dynamically personalized user experiences. Thanks for sharing!
Vice President of Engineering
(3 weeks ago) Great article with excellent insights!
Advisor, Operator, Builder | Co-Founder snipKI & Scalum.io | Ex: Bain, Rocket, Softbank-Venture, VC-backed Founder
(3 weeks ago) Great newsletter