登录查看更多内容

2024 Outlook for Language Models

Ubaid Dhiyan

Advisor to Builders | Infrastructure Software | Engineer, Banker, Dad

发布日期: 2023年12月18日

In my 2023 Outlook (see pg. 21 of this report ), I predicted that Generative AI would take center stage in venture capital interest following the scandals and general implosion of crypto markets in 2022. Indeed, over the past twelve months, this prediction has held true, despite challenges such as reduced capital availability, a decline in deal-making activity, and a complex backdrop of rising interest rates, bank failures, and geopolitical tensions. While the Fed’s latest meeting provided cheer to the markets, I don’t believe we are completely out of the woods yet.?

In today’s post I’m going to talk through some areas where I expect to see meaningful traction and investment activity in the next six to twelve months. My focus is broadly around generative AI, but more specifically on Large Language Models (LLMs). I will cover five trends gaining traction, point at interesting players within each, and why as an investor, customer, or potential employee, you should pay attention to them. In a follow up to this post, I will talk about value creation and value capture around generative AI. Click here to subscribe to InfraRead so you don’t miss future updates.

Emergence of Small Language Models (SLM)? and Vertical-specific LLMs (VsLLM)

It has been conventional wisdom that bigger is better in the world of LLMs. From GPT-4 to Llama 2 to Mistral 8x7B, it is entirely expected that the most ambitious AI leaders will continue beefing up their flagship models with ever more parameters. Consider the evolutionary tree of LLMs as laid out in the April 2023 paper Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond -

The evolutionary tree of modern LLMs. Source "Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond" available on arXiv

Despite the parameters-focused arms race in this initial phase, I believe, the more compelling and capable models over time are likely to emerge from focused data sets that are use-case-, language- or geography-specific (or some combination of these). These smaller models don’t have to be independent of the general-purpose LLMs. I fully expect a large number of the SLMs and VsLLMs to be built on the work that has already gone into existing, general-purpose models.

Some examples from the recent past and on the horizon are -

BloombergGPT, a finance focused model trained on Bloomberg’s formidable finance-specific data resources
Noetica, a New York based firm that just got funded by Lightspeed and has built a platform for evaluating capital market deal terms during negotiation
Krutrim, and Sarvam, both focused on building LLMs for Indic languages?
Flip AI, which has trained its LLM on operational data from complex infrastructure environments and claims to supercharge observability workflows with its proprietary technology
Dozens of derivative, task-specific models built on top of open-source models like Llama and Mistral. Here are examples for coding, math, multi-modal tasking, reasoning, chatbots and for running on entry-level hardware

Multi-modality, Context-Awareness, and Longer Outputs: Going Beyond Chat

My initial excitement on viewing Google's Gemini launch video was tempered on learning that it had been edited to look smoother than it actually was. Despite the revelation, I think the video paints a picture of how multi-modality in LLMs turbocharges human-computer interaction. I expect multi-modality (audio and visual processing for instruction and inference) to be table stakes for all existing and future large language models that want to be relevant for general-purpose use cases.

The "very" cool Gemini Launch video

This does not mean specialized models for specific use cases like spoken language, music, image, and video generation will cease to be relevant. All of those will continue improving in parallel, at least in the near-future.

ChatGPT’s success has anchored chat as the primary use case for leveraging LLMs. I think that’s will change fast. Chat is quite limited when seen from the context of enterprise. Most modern enterprise workflows are complex interactions that require large context windows, multiple threads, and the ability to generate coherent and consistent long-form, multi-modal content. Today's crop of large language models were not designed for most enterprise use cases except, perhaps customer service and programming. Rapidly dating knowledge snapshots are a limitation as well.

Recent approaches to overcome these limitations include Retrieval-Augmented Generation (RAG) coupled with frameworks like LangChain, LlamaIndex and AutoGen, but have a long way to go. OpenAI’s DevDay announcements were specifically targeted at many of these challenges and set the stage for competitors like Anthropic, Cohere, Falcon, Gemini, Llama and Mistral to follow and catch up. ?

Evolving Architectures: Beyond Bigger, Towards Smarter

If you frequent the AI and Machine Learning corners of Twitter, GitHub YouTube, or Hugging Face, you're likely familiar with arXiv. arXiv serves as an archive of ground-breaking research in several technical fields, with one of the most prominent areas being computer science. The pace of new ideas being posted on arXiv, and then implemented into working prototypes around all axes of generative AI is simply breathtaking.

Fueled by capital and compute resources from VCs, research labs, and large-tech, researchers are pushing the boundaries of LLM capabilities while also reducing compute overhead. Efficient and sustainable inference times are a prime target, and recent advancements reveal remarkable strides in running models with significantly fewer resources. This ties closely to the trend of domain-specific models mentioned previously. Some recent examples highlighting this trend are -?

Mistral AI's Mixtral 8x7B, a powerful large language model with a unique architecture known as a sparse mixture-of-experts network. The model has 8 distinct sets of parameters, each acting as an "expert" in different aspects of language processing. While having a large total parameter count (46.7B), Mixtral only uses a smaller portion (12.9B) per token thanks to expert selection. This means that processing a token takes the same time and resources as a smaller 12.9B model, despite the larger overall parameter size.
Microsoft's Phi-2 leverages specially curated 'textbook quality data' to excel on complex benchmarks, outperforming even models with up to 25 times the number of parameters. Its performance is further boosted by building upon the work done with its predecessor, Phi-1.5.

A visual representation of quantization. Source - Fitting AI models in your pocket with quantization on Stack Overflow

Quantization is an accessible technique for reducing model size, and therefore memory requirements and inference times. Simply put, quantization involves using lower precision weights to shrink the amount of memory required to load and run the model. The tradeoff is lower accuracy but quantization seems to work for many use cases. Click here for a good overview of the technique and some applications.

Security, Governance and Compliance

Remember Tay? Released in the wild (i.e. Twitter) in May 2016 by Microsoft researchers, Tay was meant to be “a social and cultural experiment” that actually ended up reflecting what a cesspool Twitter could be. Unfortunately, Tay is not the only example of missteps in generative AI by large tech. Meta launched Galactica two weeks before ChatGPT made its debut, and abruptly shut it down shortly after. Galactica was intended to help scientists and other users “summarize academic papers, solve math problems, generate Wiki articles, write scientific code, annotate molecules and proteins, and more.” Doesn’t sound all that different from what ChatGPT offered shortly after, albeit to a far different reception. The dismissive tone of this MIT Technology Review article, written weeks before ChatGPT was released publicly, gives one a sense of how incredibly well OpenAI managed the launch compared to everything that came before it.?

Large Language Models are complex and the technology behind them is evolving rapidly. The fact is that we are still trying to fully understand why and how they really work. Researchers have been busy trying to hack, crack and compromise models across the board. AI researcher and educator Andrej Karpathy gives a quick overview of some techniques including prompt injection, jailbreaking, and data poisoning in his hour-long intro to LLMs (which I recommend.)

领英推荐

Almost Timely News: ??? A Semi-Technical Deep Dive…

Christopher Penn 2 个月前

Fake News Is Rampant, Here Is How Artificial…

Bernard Marr 4 年前

Almost Timely News: ??? Small Language Models and…

Christopher Penn 5 个月前

Yep, regulators are coming for the Llama. Source - Microsoft Image Creator. Sorry for the weird hands, but they are fitting I think :)

Against this checkered history, and given the world-changing potential of generative AI, it is not surprising that companies and governments are starting to draft regulations around the technology. I expect to hear a lot more on actions by bad actors and efforts to curb them in the coming year. Below are some of the key initiatives and privacy+security related startups I’m paying attention to -??

Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence issued by President Biden in late October and two summaries - E&Y, epic.org
The European Union’s Artificial Intelligence Act - good summary here; related post - John Loeber discusses why Europe needs to start creating stuff rather than freeloading American (and Asian?) ingenuity
Meta’s Purple Llama - “an umbrella project featuring open trust and safety tools and evaluations to help developers build responsibly with AI models”
LLM data security startups Lakera, Deepchecks and Deasie, all of whom have recently raised seed rounds
Privacy and compliance focused startups Harmonic Security, Citrusx and DynamoFL

Climate and Sustainability

From a recent episode of This American Life, I learned that the Paris Agreement commits the US to a 50% emissions reduction by 2030, relative to 2005. The good news is that we are already halfway there, while the remaining 25% seems achievable. Rather, it seems achievable if we do not factor in the impact of generative AI on world energy consumption.

Generative AI is extremely power-hungry. An article in the Wall Street Journal claims “global electricity consumption for AI systems could soon require adding the equivalent of a small country’s worth of power generation to our planet.” I’m not a climate scientist, but this does not sound great for the planet.

The need for additional power in the near future is not a hypothetical. Microsoft is already planning to power generative AI using nuclear power in the near future. The company is partnering with Constellation Energy, which projects “new demand for power for data centers could be five or six times what will be needed for charging electric vehicles.”

If climate change is the planet’s most existential crisis, and generative AI drives us closer to an abyss from which there is no turning back, should we pump the brakes? Or perhaps use nuclear fusion - a technology that is literally being invented as we speak? How about redesigning compute and storage from the ground up for AI-specific tasks like d-Matrix and Cerebras? Or using opitcs like Lightmatter and Ayar Labs? Alternatively, we could focus our collective energies on making quantum computing real and present.

I have a feeling the right answer is a combination of all of the above. Figuring out how to solve for the externalities of generative AI should be on top of the priority list of every deep-tech investor, from Silicon Valley to Shenzhen.

That brings me to the end of my prognostication - hope you enjoyed reading it and found some ideas to mull over. I'd love to hear your thoughts and comments. Please feel free to add them below or reach out directly.

I plan to follow up with a short review of existing business models around generative AI and how I expect them to evolve in the near future. Subscribe to the newsletter here to make sure you don't miss it.

End Notes

Matt McIlwain of Madrona has some thoughts on Gen AI in 2024. Check it out here.

Vivek Ramaswami and Sabrina Wu (also of Madrona ) have predictions for AI in 2024 as well - check them out here. Vivek and Sabrina have a great track record.

My day job is advising growing companies on fundraising and M&A. Let me know if I can be helpful to you.

InfraRead

247 位关注者

Vivek Ramaswami

Partner at Madrona

1 年

Thanks for the mention Ubaid!

2 次回应

SkyLift Marketing

1 年

Very informative Ubaid Dhiyan ??LLM on operational data from complex infrastructure environments and claims to supercharge observability workflows with its proprietary technologyDozens of derivative, task-specific models built on top of open-source models like Llama and Mistral.

查看更多评论

要查看或添加评论，请登录

Ubaid Dhiyan的更多文章

Agents, Assistants, Superintelligence, AGI, Identity

2025年3月10日

Agents, Assistants, Superintelligence, AGI, Identity

ServiceNow, Moveworks, Anysphere, Reflection AI, Turing, SpecterOps The pace of AI-driven funding and M&A continues to…
Anthropic Announces $3.5 B in New Funding

2025年3月3日

Anthropic Announces $3.5 B in New Funding

Source: Anthropic Anthropic, creator of Claude, has announced a $3.5 billion funding round at a post-money valuation of…
Endpoints, AppSec, Ransomware, Observability

2025年3月1日

Endpoints, AppSec, Ransomware, Observability

NinjaOne, Semgrep, Mimic, Arize NinjaOne Lands $500 M in Outsized Series C Austin-based endpoint management company…
IBM to Acquire DataStax

2025年2月27日

IBM to Acquire DataStax

IBM is acquiring DataStax - creator of DataStax Enterprise, the Enterprise distribution of Apache Cassandra. DataStax…
Together AI Ups the Ante With $305M Series B

2025年2月20日

Together AI Ups the Ante With $305M Series B

Source: Together AI Together AI has secured a $305 million Series B investment, led by General Catalyst and co-led by…
Baseten Raises $75M to Accelerate Inference Infrastructure

2025年2月19日

Baseten Raises $75M to Accelerate Inference Infrastructure

Source: Baseten Baseten, a San Francisco-based AI inference platform provider, has secured $75 million in Series C…
Humane is Dead, Thinking Machines is Alive

2025年2月18日

Humane is Dead, Thinking Machines is Alive

Humane, HP, Thinking Machines HP Acquires Humane AI Assets for $116M Bloomberg reports HP has acquired assets of…
Back in Action + Deal Announcement

2025年2月14日

Back in Action + Deal Announcement

Agnostiq, DataRobot, TrueFoundry, Traceable, Harness, CyberArk, Zilla Security, Sardine, Positron AI, DoiT…

1 条评论
CFIUS, Search, Thinking Fast & Slow

2024年10月14日

CFIUS, Search, Thinking Fast & Slow

Cerebras, Liner, Sequoia, Relyance AI, Braintrust Data, Voyager AI, Distributional Cerebras Pushes Roadshow Amid CFIUS…
Quirky, Liquidity, Code, Math

2024年10月4日

Quirky, Liquidity, Code, Math

Happy Friday everyone! Today's update is a bit longer than usual - there was a flurry of activity in Generative AI this…

2 条评论

See all articles

2024 Outlook for Language Models

Ubaid Dhiyan

Advisor to Builders | Infrastructure Software | Engineer, Banker, Dad

Emergence of Small Language Models (SLM)? and Vertical-specific LLMs (VsLLM)

Multi-modality, Context-Awareness, and Longer Outputs: Going Beyond Chat

Evolving Architectures: Beyond Bigger, Towards Smarter

Security, Governance and Compliance

领英推荐

Climate and Sustainability

InfraRead

247 位关注者

Ubaid Dhiyan的更多文章

社区洞察

其他会员也浏览了

Swarm Intelligence: Nature's Blueprint for Next-Generation AI + SIPS Framework

How DeepSeek's Breakthrough Mimics Human Focus

Lies, damned lies, and hallucinations

GPTNext in November 2024 and should we pull the plug?!

The Grand Duel: GPT-4 vs. Google's Gemini Ultra

Small Language Models—Scaling Down Without Losing Value

The Accuracy Problem: GPT is a Tool, Not a Source—And It Lies

Cortical Algorithms v. Large Language Models

Synthetic market research - Does it make sense?

Don't Just Choose a Right Model, Choose the Right Approach: RAG or CAG?

Emergence of Small Language Models (SLM)? and Vertical-specific LLMs (VsLLM)

Multi-modality, Context-Awareness, and Longer Outputs: Going Beyond Chat

Evolving Architectures: Beyond Bigger, Towards Smarter

Security, Governance and Compliance

领英推荐

Climate and Sustainability

InfraRead

247 位关注者

Ubaid Dhiyan的更多文章

Agents, Assistants, Superintelligence, AGI, Identity

Anthropic Announces $3.5 B in New Funding

Endpoints, AppSec, Ransomware, Observability

IBM to Acquire DataStax

Together AI Ups the Ante With $305M Series B

Baseten Raises $75M to Accelerate Inference Infrastructure

Humane is Dead, Thinking Machines is Alive

Back in Action + Deal Announcement

CFIUS, Search, Thinking Fast & Slow

Quirky, Liquidity, Code, Math

社区洞察

其他会员也浏览了

Swarm Intelligence: Nature's Blueprint for Next-Generation AI + SIPS Framework

How DeepSeek's Breakthrough Mimics Human Focus

Lies, damned lies, and hallucinations

GPTNext in November 2024 and should we pull the plug?!

The Grand Duel: GPT-4 vs. Google's Gemini Ultra

Small Language Models—Scaling Down Without Losing Value

The Accuracy Problem: GPT is a Tool, Not a Source—And It Lies

Cortical Algorithms v. Large Language Models

Synthetic market research - Does it make sense?

Don't Just Choose a Right Model, Choose the Right Approach: RAG or CAG?