2024 Outlook for Language Models

2024 Outlook for Language Models

In my 2023 Outlook (see pg. 21 of this report ), I predicted that Generative AI would take center stage in venture capital interest following the scandals and general implosion of crypto markets in 2022. Indeed, over the past twelve months, this prediction has held true, despite challenges such as reduced capital availability, a decline in deal-making activity, and a complex backdrop of rising interest rates, bank failures, and geopolitical tensions. While the Fed’s latest meeting provided cheer to the markets, I don’t believe we are completely out of the woods yet.?

In today’s post I’m going to talk through some areas where I expect to see meaningful traction and investment activity in the next six to twelve months. My focus is broadly around generative AI, but more specifically on Large Language Models (LLMs). I will cover five trends gaining traction, point at interesting players within each, and why as an investor, customer, or potential employee, you should pay attention to them. In a follow up to this post, I will talk about value creation and value capture around generative AI. Click here to subscribe to InfraRead so you don’t miss future updates.


Emergence of Small Language Models (SLM)? and Vertical-specific LLMs (VsLLM)

It has been conventional wisdom that bigger is better in the world of LLMs. From GPT-4 to Llama 2 to Mistral 8x7B, it is entirely expected that the most ambitious AI leaders will continue beefing up their flagship models with ever more parameters. Consider the evolutionary tree of LLMs as laid out in the April 2023 paper Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond -

The evolutionary tree of modern LLMs. Source "Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond" available on arXiv

Despite the parameters-focused arms race in this initial phase, I believe, the more compelling and capable models over time are likely to emerge from focused data sets that are use-case-, language- or geography-specific (or some combination of these). These smaller models don’t have to be independent of the general-purpose LLMs. I fully expect a large number of the SLMs and VsLLMs to be built on the work that has already gone into existing, general-purpose models.

Some examples from the recent past and on the horizon are -

  • BloombergGPT, a finance focused model trained on Bloomberg’s formidable finance-specific data resources
  • Noetica, a New York based firm that just got funded by Lightspeed and has built a platform for evaluating capital market deal terms during negotiation
  • Krutrim, and Sarvam, both focused on building LLMs for Indic languages?
  • Flip AI, which has trained its LLM on operational data from complex infrastructure environments and claims to supercharge observability workflows with its proprietary technology
  • Dozens of derivative, task-specific models built on top of open-source models like Llama and Mistral. Here are examples for coding, math, multi-modal tasking, reasoning, chatbots and for running on entry-level hardware


Multi-modality, Context-Awareness, and Longer Outputs: Going Beyond Chat

My initial excitement on viewing Google's Gemini launch video was tempered on learning that it had been edited to look smoother than it actually was. Despite the revelation, I think the video paints a picture of how multi-modality in LLMs turbocharges human-computer interaction. I expect multi-modality (audio and visual processing for instruction and inference) to be table stakes for all existing and future large language models that want to be relevant for general-purpose use cases.

The "very" cool Gemini Launch video


This does not mean specialized models for specific use cases like spoken language, music, image, and video generation will cease to be relevant. All of those will continue improving in parallel, at least in the near-future.

ChatGPT’s success has anchored chat as the primary use case for leveraging LLMs. I think that’s will change fast. Chat is quite limited when seen from the context of enterprise. Most modern enterprise workflows are complex interactions that require large context windows, multiple threads, and the ability to generate coherent and consistent long-form, multi-modal content. Today's crop of large language models were not designed for most enterprise use cases except, perhaps customer service and programming. Rapidly dating knowledge snapshots are a limitation as well.

Recent approaches to overcome these limitations include Retrieval-Augmented Generation (RAG) coupled with frameworks like LangChain, LlamaIndex and AutoGen, but have a long way to go. OpenAI’s DevDay announcements were specifically targeted at many of these challenges and set the stage for competitors like Anthropic, Cohere, Falcon, Gemini, Llama and Mistral to follow and catch up. ?


Evolving Architectures: Beyond Bigger, Towards Smarter

If you frequent the AI and Machine Learning corners of Twitter, GitHub YouTube, or Hugging Face, you're likely familiar with arXiv. arXiv serves as an archive of ground-breaking research in several technical fields, with one of the most prominent areas being computer science. The pace of new ideas being posted on arXiv, and then implemented into working prototypes around all axes of generative AI is simply breathtaking.

Fueled by capital and compute resources from VCs, research labs, and large-tech, researchers are pushing the boundaries of LLM capabilities while also reducing compute overhead. Efficient and sustainable inference times are a prime target, and recent advancements reveal remarkable strides in running models with significantly fewer resources. This ties closely to the trend of domain-specific models mentioned previously. Some recent examples highlighting this trend are -?

  • Mistral AI's Mixtral 8x7B, a powerful large language model with a unique architecture known as a sparse mixture-of-experts network. The model has 8 distinct sets of parameters, each acting as an "expert" in different aspects of language processing. While having a large total parameter count (46.7B), Mixtral only uses a smaller portion (12.9B) per token thanks to expert selection. This means that processing a token takes the same time and resources as a smaller 12.9B model, despite the larger overall parameter size.
  • Microsoft's Phi-2 leverages specially curated 'textbook quality data' to excel on complex benchmarks, outperforming even models with up to 25 times the number of parameters. Its performance is further boosted by building upon the work done with its predecessor, Phi-1.5.

A visual representation of quantization. Source - Fitting AI models in your pocket with quantization on Stack Overflow

  • Quantization is an accessible technique for reducing model size, and therefore memory requirements and inference times. Simply put, quantization involves using lower precision weights to shrink the amount of memory required to load and run the model. The tradeoff is lower accuracy but quantization seems to work for many use cases. Click here for a good overview of the technique and some applications.


Security, Governance and Compliance

Remember Tay? Released in the wild (i.e. Twitter) in May 2016 by Microsoft researchers, Tay was meant to be “a social and cultural experiment” that actually ended up reflecting what a cesspool Twitter could be. Unfortunately, Tay is not the only example of missteps in generative AI by large tech. Meta launched Galactica two weeks before ChatGPT made its debut, and abruptly shut it down shortly after. Galactica was intended to help scientists and other users “summarize academic papers, solve math problems, generate Wiki articles, write scientific code, annotate molecules and proteins, and more.” Doesn’t sound all that different from what ChatGPT offered shortly after, albeit to a far different reception. The dismissive tone of this MIT Technology Review article, written weeks before ChatGPT was released publicly, gives one a sense of how incredibly well OpenAI managed the launch compared to everything that came before it.?

Large Language Models are complex and the technology behind them is evolving rapidly. The fact is that we are still trying to fully understand why and how they really work. Researchers have been busy trying to hack, crack and compromise models across the board. AI researcher and educator Andrej Karpathy gives a quick overview of some techniques including prompt injection, jailbreaking, and data poisoning in his hour-long intro to LLMs (which I recommend.)


Yep, regulators are coming for the Llama. Source - Microsoft Image Creator. Sorry for the weird hands, but they are fitting I think :)

Against this checkered history, and given the world-changing potential of generative AI, it is not surprising that companies and governments are starting to draft regulations around the technology. I expect to hear a lot more on actions by bad actors and efforts to curb them in the coming year. Below are some of the key initiatives and privacy+security related startups I’m paying attention to -??


Climate and Sustainability

From a recent episode of This American Life, I learned that the Paris Agreement commits the US to a 50% emissions reduction by 2030, relative to 2005. The good news is that we are already halfway there, while the remaining 25% seems achievable. Rather, it seems achievable if we do not factor in the impact of generative AI on world energy consumption.

Generative AI is extremely power-hungry. An article in the Wall Street Journal claims “global electricity consumption for AI systems could soon require adding the equivalent of a small country’s worth of power generation to our planet.” I’m not a climate scientist, but this does not sound great for the planet.

The need for additional power in the near future is not a hypothetical. Microsoft is already planning to power generative AI using nuclear power in the near future. The company is partnering with Constellation Energy, which projects “new demand for power for data centers could be five or six times what will be needed for charging electric vehicles.”


Home. Source - Wikimedia

If climate change is the planet’s most existential crisis, and generative AI drives us closer to an abyss from which there is no turning back, should we pump the brakes? Or perhaps use nuclear fusion - a technology that is literally being invented as we speak? How about redesigning compute and storage from the ground up for AI-specific tasks like d-Matrix and Cerebras? Or using opitcs like Lightmatter and Ayar Labs? Alternatively, we could focus our collective energies on making quantum computing real and present.

I have a feeling the right answer is a combination of all of the above. Figuring out how to solve for the externalities of generative AI should be on top of the priority list of every deep-tech investor, from Silicon Valley to Shenzhen.

That brings me to the end of my prognostication - hope you enjoyed reading it and found some ideas to mull over. I'd love to hear your thoughts and comments. Please feel free to add them below or reach out directly.

I plan to follow up with a short review of existing business models around generative AI and how I expect them to evolve in the near future. Subscribe to the newsletter here to make sure you don't miss it.


End Notes

Matt McIlwain of Madrona has some thoughts on Gen AI in 2024. Check it out here.

Vivek Ramaswami and Sabrina Wu (also of Madrona ) have predictions for AI in 2024 as well - check them out here. Vivek and Sabrina have a great track record.

My day job is advising growing companies on fundraising and M&A. Let me know if I can be helpful to you.









Vivek Ramaswami

Partner at Madrona

1 年

Thanks for the mention Ubaid!

Very informative Ubaid Dhiyan ??LLM on operational data from complex infrastructure environments and claims to supercharge observability workflows with its proprietary technologyDozens of derivative, task-specific models built on top of open-source models like Llama and Mistral.

回复

要查看或添加评论,请登录

Ubaid Dhiyan的更多文章

  • Anthropic Announces $3.5 B in New Funding

    Anthropic Announces $3.5 B in New Funding

    Source: Anthropic Anthropic, creator of Claude, has announced a $3.5 billion funding round at a post-money valuation of…

  • Endpoints, AppSec, Ransomware, Observability

    Endpoints, AppSec, Ransomware, Observability

    NinjaOne, Semgrep, Mimic, Arize NinjaOne Lands $500 M in Outsized Series C Austin-based endpoint management company…

  • IBM to Acquire DataStax

    IBM to Acquire DataStax

    IBM is acquiring DataStax - creator of DataStax Enterprise, the Enterprise distribution of Apache Cassandra. DataStax…

  • Together AI Ups the Ante With $305M Series B

    Together AI Ups the Ante With $305M Series B

    Source: Together AI Together AI has secured a $305 million Series B investment, led by General Catalyst and co-led by…

  • Baseten Raises $75M to Accelerate Inference Infrastructure

    Baseten Raises $75M to Accelerate Inference Infrastructure

    Source: Baseten Baseten, a San Francisco-based AI inference platform provider, has secured $75 million in Series C…

  • Humane is Dead, Thinking Machines is Alive

    Humane is Dead, Thinking Machines is Alive

    Humane, HP, Thinking Machines HP Acquires Humane AI Assets for $116M Bloomberg reports HP has acquired assets of…

  • Back in Action + Deal Announcement

    Back in Action + Deal Announcement

    Agnostiq, DataRobot, TrueFoundry, Traceable, Harness, CyberArk, Zilla Security, Sardine, Positron AI, DoiT…

    1 条评论
  • CFIUS, Search, Thinking Fast & Slow

    CFIUS, Search, Thinking Fast & Slow

    Cerebras, Liner, Sequoia, Relyance AI, Braintrust Data, Voyager AI, Distributional Cerebras Pushes Roadshow Amid CFIUS…

  • Quirky, Liquidity, Code, Math

    Quirky, Liquidity, Code, Math

    Happy Friday everyone! Today's update is a bit longer than usual - there was a flurry of activity in Generative AI this…

    2 条评论
  • Testing, AppSec, Threat Intel, Work AI

    Testing, AppSec, Threat Intel, Work AI

    Tricentis, Checkmarx, Recorded Future, Strider Technologies, Zafran Security, Glean Large software focused PE firms are…

社区洞察

其他会员也浏览了