Here's Why LLM Compression Matters
The image above is ChatGPT's output when prompted to remake the one below in 1920x1080 format:
That hallucination is the result of the compute and energy constraints that even OpenAI faces, despite having raised over $13 billion from Microsoft and other investors.
Large Language Models (LLMs), the back end behind generative AI front ends, are the most important new technology since the Internet, and they will dramatically increase the efficiency of the people, companies, industries and economies that make use of them.
Before that can happen, though, LLMs need to get more accurate and more efficient.
Hundreds of LLM-focused startups have formed to make LLM development and deployment easier, cheaper, faster, less energy intensive and more user friendly.
They have received tens of billions of dollars in funding from investors. Some now have solutions in production, and you're starting to see them exhibit at conferences like Ai4, which I attended in Las Vegas last week.
Typically 80-90% of those startups' employees are deeply technical, with only a handful of people in Sales, Marketing and other less technical functions. That's to be expected at this early-adopter phase. Their buyers, for now, are equally technical.
For LLM startups to cross Geoffrey Moore's chasm from early adopters to business pragmatists, however, they will need to make their solutions understandable to executives outside the CTO and CIO organizations.
More importantly, if they are to grab the lion's share of the market, they'll need the type of broad market awareness that comes from effective marketing.
To market effectively, in turn, these startups will need to come up with analogies that explain the GenAI technical challenges their solutions address.
To give you an idea of what that might look like, let's examine the biggest challenge companies face in deploying LLMs: compute and energy costs that can run from millions to tens of millions of dollars annually.
To address the skyrocketing compute and energy costs of LLMs, companies are turning to compression technologies such as quantization, pruning, low-rank approximation, knowledge distillation and others. Some of these are open source; others are being built by startups, but it is clear that LLM compression tools will be a massive market opportunity.
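For the technically curious, here is a rough sketch of one of those techniques, low-rank approximation, applied to a toy matrix. The sizes and NumPy code below are purely illustrative, not a real LLM layer or any vendor's implementation:

```python
import numpy as np

# A minimal sketch of low-rank approximation: replace a big weight matrix W
# with the product of two thin matrices, keeping only its strongest
# "directions". Toy sizes, not real LLM dimensions.

rng = np.random.default_rng(0)
W = rng.normal(size=(512, 512))           # original layer: 262,144 numbers

U, S, Vt = np.linalg.svd(W, full_matrices=False)
r = 64                                    # keep only the top-64 singular values
A = U[:, :r] * S[:r]                      # 512 x 64
B = Vt[:r, :]                             # 64 x 512
W_approx = A @ B                          # stored as A and B: 65,536 numbers

print("compression ratio:", W.size / (A.size + B.size))            # ~4x fewer numbers
print("relative error:", np.linalg.norm(W - W_approx) / np.linalg.norm(W))
```

(Real model weights are much closer to low-rank than this random toy matrix, which is why the technique works better in practice than the error printed here suggests.)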
What is quantization, you ask? And how does it compare to low-rank approximation? That's easy - just look at this chart!
Lost? Here's what quantization is:
Imagine you're packing a suitcase and want to bring as many clothes as possible without exceeding the airline's weight limit. Instead of folding them neatly, you roll them up tightly to use less space. The clothes might get a bit wrinkled, but they're still wearable, and you've managed to fit everything you need into the suitcase. In this analogy:
- The suitcase is your compute and memory budget.
- The clothes are the model's parameters (its weights).
- Rolling them tightly is storing each weight in fewer bits.
- The wrinkles are the small loss of precision you accept in return.
So, quantization is like rolling up your clothes to fit more into your suitcase. It compresses the LLM by using a more compact representation of the data, allowing the model to take up less space and use fewer resources, while still retaining most of its original functionality.
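If you'd like to see the suitcase trick in code, here is a minimal sketch of 8-bit quantization using NumPy. Real LLM quantizers (GPTQ, AWQ, bitsandbytes and the like) are far more sophisticated; the matrix below is made up for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(size=(4, 4)).astype(np.float32)  # 32-bit "folded clothes"

# Map each float32 weight onto one of 256 int8 levels (the "tight roll").
scale = np.abs(weights).max() / 127.0
quantized = np.round(weights / scale).astype(np.int8)  # ~4x less storage

# Unpack at inference time: close to the original, just slightly "wrinkled".
dequantized = quantized.astype(np.float32) * scale

print("max error:", np.abs(weights - dequantized).max())
print("storage: 32 bits per weight -> 8 bits per weight")
```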
How about black-box knowledge distillation, another LLM compression technique? This will help:
Black-box KD usually prompts the teacher LLM to generate a distillation dataset to fine-tune the student LM, thereby transferring capabilities from the teacher LLM to the student LM. In Black-box KD, teacher LLMs such as ChatGPT (gpt-3.5-turbo) and GPT4 (OpenAI, 2024) are typically employed, while smaller LMs (SLMs), such as GPT2 (Radford et al., 2019), T5 (Raffel et al., 2020), FlanT5 (Chung et al., 2024), and CodeT5 (Wang et al., 2021), are commonly utilized as student LMs. On the other hand, researchers find that LLMs have emergent abilities, which refers to a significant improvement in performance when the model reaches a certain scale, showcasing surprising capabilities. Many Black-box KD methods try to distill emergent abilities from LLMs to student LMs, and we introduce three commonly used emergent ability distillation methods: Chain-of-Thought (CoT) Distillation, In-Context Learning (ICL) Distillation, and Instruction Following (IF) Distillation. [SOURCE: https://arxiv.org/pdf/2308.07633]
Still confused? Try this analogy:
Imagine you have an experienced, wise teacher who knows a vast amount of information on many subjects. This teacher has spent years learning and understanding complex topics in great depth. Now, the teacher has an assistant who needs to learn enough to teach others, but doesn't need to know every single detail the teacher does. The teacher takes the time to explain the most important concepts to the assistant, simplifying the knowledge and focusing on what's essential to do the job effectively. In this analogy:
- The teacher is the large, expensive LLM.
- The assistant is the smaller student model.
- The simplified lessons are the teacher's answers, which become the training data for the student.
So, knowledge distillation in LLMs is like a wise teacher passing on the most important and useful knowledge to an assistant. The result is a smaller, more efficient model that retains much of the original model's capabilities, but in a more compact form.
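Here is a minimal sketch of the black-box distillation recipe described above. The query_teacher and finetune_student helpers are hypothetical stand-ins I've written for illustration, not real library APIs; in practice the teacher would be a hosted model such as gpt-3.5-turbo and the student a small open model fine-tuned with your framework of choice:

```python
def query_teacher(prompt: str) -> str:
    # Stand-in for an API call to the large teacher LLM (hypothetical).
    canned = {
        "Explain quantization in one sentence.":
            "Quantization stores model weights in fewer bits to save memory and compute.",
    }
    return canned.get(prompt, "Teacher answer goes here.")

def finetune_student(dataset: list[dict]) -> None:
    # Stand-in for fine-tuning a small student LM on (prompt, completion) pairs.
    print(f"Fine-tuning student on {len(dataset)} teacher-labeled examples...")

# Step 1: prompt the teacher to build a distillation dataset.
prompts = [
    "Explain quantization in one sentence.",
    "Why is LLM inference expensive?",
]
distillation_dataset = [{"prompt": p, "completion": query_teacher(p)} for p in prompts]

# Step 2: transfer the teacher's behavior into the smaller student model.
finetune_student(distillation_dataset)
```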
CONCLUSION: These visual and written metaphors let non-technical finance, operations and executive leaders know what value an LLM startup's technology will bring to their GenAI initiatives.
LLM startups that hope to start an industry movement based on their ground-breaking technology will need to analogize while proving they are necessary and unique.
Do that and you can market effectively, generate awareness and inbound interest, and quickly move deals through the pipeline.
Analogies are the way to cross the LLM chasm.