GenAI Weekly — Edition 10

Your Weekly Dose of Gen AI: News, Trends, and Breakthroughs

Stay at the forefront of the Gen AI revolution with Gen AI Weekly! Each week, we curate the most noteworthy news, insights, and breakthroughs in the field, equipping you with the knowledge you need to stay ahead of the curve.

Click subscribe to be notified of future editions.


Microsoft: Introducing Phi-3: Redefining what’s possible with Small Language Models

From the Azure blog:

Starting today, Phi-3-mini, a 3.8B language model, is available on Microsoft Azure AI Studio, Hugging Face, and Ollama.

  • Phi-3-mini is available in two context-length variants—4K and 128K tokens. It is the first model in its class to support a context window of up to 128K tokens, with little impact on quality.
  • It is instruction-tuned, meaning that it’s trained to follow different types of instructions reflecting how people normally communicate. This ensures the model is ready to use out-of-the-box.
  • It is available on Azure AI to take advantage of the deploy-eval-finetune toolchain, and is available on Ollama for developers to run locally on their laptops.

Phi-3 models significantly outperform language models of the same and larger sizes on key benchmarks (see the benchmark numbers in the linked post; higher is better). Phi-3-mini does better than models twice its size, and Phi-3-small and Phi-3-medium outperform much larger models, including GPT-3.5T.
All reported numbers are produced with the same pipeline to ensure that the numbers are comparable. As a result, these numbers may differ from other published numbers due to slight differences in the evaluation methodology. More details on benchmarks are provided in our technical paper.

Microsoft has been doggedly pursuing these so-called “small language models” (SLMs) while the whole world has been busy chasing LLMs. What is very impressive is how models much smaller than GPT-3.5-Turbo are beating it on some benchmarks. A big advantage of these models is privacy: they can run directly on mobile devices, so your data doesn’t have to be sent to the cloud. There are cost and latency savings as well. For those of you interested, there’s also the technical paper.
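
If you want to try it yourself, the Hugging Face weights are the quickest route on a workstation. Here is a minimal sketch using the transformers library; the model id, generation settings, and prompt are my assumptions, so check the model card before relying on it (older transformers versions may also need trust_remote_code=True).

    # Minimal sketch: running Phi-3-mini locally with Hugging Face transformers.
    # Assumption: the 4K-context checkpoint is published under the id below;
    # the 128K variant would be a separate checkpoint.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "microsoft/Phi-3-mini-4k-instruct"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"  # device_map needs the accelerate package
    )

    # The model is instruction-tuned, so a chat-style prompt works out of the box.
    messages = [{"role": "user", "content": "In two sentences, why do small language models matter?"}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    output = model.generate(inputs, max_new_tokens=128)
    print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))

On a laptop, Ollama is even simpler: something like "ollama run phi3" should pull and serve the model, though treat the exact tag as an assumption and check Ollama's model library.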


Snowflake releases a flagship generative AI model of its own

Kyle Wiggers writing for TechCrunch:

Snowflake’s flagship model in a family of generative AI models called Arctic, Arctic LLM — which took around three months, 1,000 GPUs and $2 million to train — arrives on the heels of Databricks’ DBRX, a generative AI model also marketed as optimized for the enterprise space.
Snowflake draws a direct comparison between Arctic LLM and DBRX in its press materials, saying Arctic LLM outperforms DBRX on the two tasks of coding (Snowflake didn’t specify which programming languages) and SQL generation. The company said Arctic LLM is also better at those tasks than Meta’s Llama 2 70B (but not the more recent Llama 3 70B) and Mistral’s Mixtral-8x7B.
“Our dream here is, within a year, to have an API that our customers can use so that business users can directly talk to data,” [Snowflake CEO Sridhar] Ramaswamy said. “It would’ve been easy for us to say, ‘Oh, we’ll just wait for some open source model and we’ll use it.’ Instead, we’re making a foundational investment because we think [it’s] going to unlock more value for our customers.”

Did Snowflake have to do it just because its rival Databricks did? God alone knows. Nevertheless, it’s good to see models that are specialized for a particular audience.


Apple releases eight small AI language models aimed at on-device use

Benj Edwards writing for Ars Technica:

On Tuesday, we covered Microsoft's Phi-3 models, which aim to achieve something similar: a useful level of language understanding and processing performance in small AI models that can run locally. Phi-3-mini features 3.8 billion parameters, but some of Apple's OpenELM models are much smaller, ranging from 270 million to 3 billion parameters in eight distinct models.
In comparison, the largest model yet released in Meta's Llama 3 family includes 70 billion parameters (with a 400 billion version on the way), and OpenAI's GPT-3 from 2020 shipped with 175 billion parameters. Parameter count serves as a rough measure of AI model capability and complexity, but recent research has focused on making smaller AI language models as capable as larger ones were a few years ago.
While Apple has not yet integrated this new wave of AI language model capabilities into its consumer devices, the upcoming iOS 18 update (expected to be revealed in June at WWDC) is rumored to include new AI features that utilize on-device processing to ensure user privacy—though the company may potentially hire Google or OpenAI to handle more complex, off-device AI processing to give Siri a long-overdue boost.
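
A rough way to see why those parameter counts matter for on-device use is to estimate the memory needed just to hold the weights. The sketch below is mine, not from the article, and it ignores activations, KV cache, and runtime overhead:

    # Back-of-the-envelope: approximate memory to hold the weights alone,
    # at 16-bit precision and with 4-bit quantization.
    def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
        return num_params * bytes_per_param / 1024**3

    for name, params in [("OpenELM-270M", 270e6), ("OpenELM-3B", 3e9),
                         ("Phi-3-mini", 3.8e9), ("Llama 3 70B", 70e9)]:
        print(f"{name}: ~{weight_memory_gb(params, 2):.1f} GB at 16-bit, "
              f"~{weight_memory_gb(params, 0.5):.1f} GB at 4-bit")

Even aggressively quantized, a 70B model needs tens of gigabytes, while the sub-4B models fit comfortably in a phone’s memory, which is the whole point of this class of model.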

Sam Altman Says the Age of Giant AI Models Is Already Over

Will Knight writing for Wired:

OpenAI has delivered a series of impressive advances in AI that works with language in recent years by taking existing machine-learning algorithms and scaling them up to previously unimagined size. GPT-4, the latest of those projects, was likely trained using trillions of words of text and many thousands of powerful computer chips. The process cost over $100 million.
But the company’s CEO, Sam Altman, says further progress will not come from making models bigger. “I think we're at the end of the era where it's going to be these, like, giant, giant models,” he told an audience at an event held at MIT late last week. “We'll make them better in other ways.”
Altman’s declaration suggests an unexpected twist in the race to develop and deploy new AI algorithms. Since OpenAI launched ChatGPT in November, Microsoft has used the underlying technology to add a chatbot to its Bing search engine, and Google has launched a rival chatbot called Bard. Many people have rushed to experiment with using the new breed of chatbot to help with work or personal tasks.
Meanwhile, numerous well-funded startups, including Anthropic, AI21, Cohere, and Character.AI, are throwing enormous resources into building ever larger algorithms in an effort to catch up with OpenAI’s technology. The initial version of ChatGPT was based on a slightly upgraded version of GPT-3, but users can now also access a version powered by the more capable GPT-4.
Altman’s statement suggests that GPT-4 could be the last major advance to emerge from OpenAI’s strategy of making the models bigger and feeding them more data. He did not say what kind of research strategies or techniques might take its place. In the paper describing GPT-4, OpenAI says its estimates suggest diminishing returns on scaling up model size. Altman said there are also physical limits to how many data centers the company can build and how quickly it can build them.

OpenAI is still the north star that everyone else is aligning with. GPT-4 is yet to be comprehensively beaten. Better models from them are expected this year.


What can LLMs never do?

Rohit Krishnan writing for Strange Loop Canon:

Over the past few weeks I have been obsessed with trying to figure out the failure modes of LLMs. This started off as an exploration of what I found. It is admittedly a little wonky, but I think it is interesting. The failures of AI can teach us a lot more about what it can do than the successes.

The starting point was bigger: the necessity of task-by-task evaluations for a lot of the jobs that LLMs will eventually end up doing. But then I started asking myself how we can figure out the limits of their ability to reason, so that we can trust their ability to learn.

LLMs are hard to evaluate, as I’ve written multiple times, and their ability to reason is difficult to separate from what they’re trained on. So I wanted to find a way to test their ability to iteratively reason and answer questions.
I started with the simplest version of it I could think of that satisfies the criteria: namely whether it can create wordgrids, successively in 3x3, 4x4 and 5x5 sizes. Why this? Because evaluations should be a) easy to create, AND b) easy to evaluate, while still being hard to do!
Turned out that all modern large language models fail at this. Including the heavyweights, Opus and GPT-4. These are extraordinary models, capable of answering esoteric questions about economics and quantum mechanics, of helping you code, paint, make music or videos, create entire applications, even play chess at a high level. But they can’t play sudoku.
Or take this: LLMs have a Reversal Curse.
If a model is trained on a sentence of the form "A is B", it will not automatically generalize to the reverse direction "B is A". This is the Reversal Curse. For instance, if a model is trained on "Valentina Tereshkova was the first woman to travel to space", it will not automatically be able to answer the question, "Who was the first woman to travel to space?". Moreover, the likelihood of the correct answer ("Valentina Tereshkova") will not be higher than for a random name.

Sometimes, it is important to understand what something is not in order to know more about it. [I was tempted to write “Sometimes, it is important to understand who someone is not in order to know more about them.” But, well.]
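
Incidentally, the "easy to evaluate" half of that word-grid test really is mechanical. Here is a minimal checker of my own, with a toy word list; a real evaluation would presumably use a proper dictionary:

    # Minimal sketch of a word-grid checker: every row and every column of an
    # n x n letter grid must appear in the supplied word list.
    def is_valid_wordgrid(grid: list[str], words: set[str]) -> bool:
        n = len(grid)
        if n == 0 or any(len(row) != n for row in grid):
            return False  # the grid must be square
        rows = [row.lower() for row in grid]
        cols = ["".join(row[i] for row in rows) for i in range(n)]
        return all(w in words for w in rows + cols)

    # Toy example: rows read sat/ire/pen, columns read sip/are/ten.
    words = {"sat", "ire", "pen", "sip", "are", "ten"}
    print(is_valid_wordgrid(["sat", "ire", "pen"], words))  # True
    print(is_valid_wordgrid(["sat", "ire", "pin"], words))  # False: "pin" is not in the list

Generating a valid grid is the hard direction, which is exactly why it makes a cheap but revealing test.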


Large language models (e.g., ChatGPT) as research assistants

From Daniel Lemire’s blog:

The primary skills of academics are language-related: synthesis, analogy, extrapolation, etc. Academics analyze the literature, identify gaps, and formulate research questions. They review and synthesize existing research. They write research papers, grant proposals, and reports. Being able to produce well-structured and grammatically correct prose is a vital skill for academics.
Unsurprisingly, software and artificial intelligence can help academics, and maybe replace them in some cases. Liang et al. found that an increasing number of research papers are written with tools like GPT-4 (up to 18% in some fields). It is quite certain that in the near future, a majority of all research papers will be written with the help of artificial intelligence. I suspect that they will be reviewed with artificial intelligence as well. We might soon face a closed loop where software writes papers while other software reviews them.

When writing improved with pens, typewriters, or computers, it was still humans who had to do all the thinking, and written language used to be a skill unique to humans. I never thought I’d find myself wistful about that past as early as 2024.


