Harnessing the Power of Distributed Language Models: An Excursion into LLaMA, BLOOM, and Petals
Imagine a world where the power of gargantuan language models like LLaMA-65B and BLOOM-176B can be harnessed straight from your humble laptop or desktop. Through the distributed computing revolution, a portal to such a world is now open: an era of endless possibilities.
Stepping into the spotlight, Petals is a pioneering platform that runs these behemoth language models collaboratively. With a small piece of the model dwelling on your device, you join forces with individuals scattered across the globe, each wielding their own share of the model. Together, you breathe life into an intricate web of linguistic understanding.
For the curious minds, the Python libraries 'transformers' and 'petals' serve as your magic wand. A simple spell involving a model name, "enoch/llama-65b-hf" for instance, or perhaps "bigscience/bloom" or "bigscience/bloomz", sets the stage for the marvel that follows. Unleash the raw power of AutoTokenizer and AutoDistributedModelForCausalLM, and voilà! You can generate text and fine-tune these language models for your own tasks.
To ignite this magic, consider this enticing incantation:
from transformers import AutoTokenizer
from petals import AutoDistributedModelForCausalLM

model_name = "enoch/llama-65b-hf"

# Load the tokenizer locally; the model's layers are served by the public swarm
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoDistributedModelForCausalLM.from_pretrained(model_name)

# Ask the swarm to continue the prompt by five tokens
inputs = tokenizer("A cat sat", return_tensors="pt")["input_ids"]
outputs = model.generate(inputs, max_new_tokens=5)
print(tokenizer.decode(outputs[0]))  # A cat sat on a mat...
In this intricate dance of code, the text input "A cat sat" stirs into life: the swarm continues it token by token, much like the story that follows the phrase.
The Petals platform exemplifies an enchanting blend of flexibility and comfort, akin to the agile Python it is built on. With single-batch inference clocking in at 3-4 steps/sec for LLaMA-65B and about 1 step/sec for BLOOM-176B, it outperforms offloading by up to 10 times. That speed is sufficient for chatbots and other interactive apps, and it opens an avenue for personalized fine-tuning, custom sampling methods, and custom paths through the model, as the sketch below suggests.
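To make this concrete, here is a minimal sketch of a chatbot-style loop. It assumes the model and tokenizer from the incantation above and leans on Petals' inference_session together with standard transformers sampling arguments; treat the exact keyword names as assumptions to verify against the current Petals docs.

# Hedged sketch: assumes `model` and `tokenizer` from the snippet above
with model.inference_session(max_length=512) as session:
    while True:
        prompt = input("You: ")
        inputs = tokenizer(prompt, return_tensors="pt")["input_ids"]
        # Reusing the session keeps attention caches on the servers,
        # so each turn only pays for its new tokens
        outputs = model.generate(
            inputs, max_new_tokens=50, session=session,
            do_sample=True, temperature=0.9, top_p=0.6,
        )
        print("Model:", tokenizer.decode(outputs[0]))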
However, any magic requires a little groundwork. An Anaconda environment must be conjured to prepare your device to host a part of LLaMA-65B or its companions. This enchantment requires Linux and Python 3.7+ and is invoked with a few commands:
# Install PyTorch with CUDA support, then Petals itself
conda install pytorch pytorch-cuda=11.7 -c pytorch -c nvidia
pip install git+https://github.com/bigscience-workshop/petals

# Serve your slice of LLaMA-65B, here with the Guanaco adapter attached
python -m petals.cli.run_server enoch/llama-65b-hf --adapters timdettmers/guanaco-65b
If you'd prefer an alternative, don't worry: Docker images are available. They operate smoothly on Linux, macOS, and Windows with WSL2.
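For reference, launching a server from the Docker image looks roughly like this; the image name, port, and flags follow the Petals README, but take them as assumptions to check against the current docs.

# Assumed invocation based on the Petals README (verify image name and flags)
sudo docker run -p 31330:31330 --ipc host --gpus all \
    --volume petals-cache:/cache --rm learningathome/petals:main \
    python -m petals.cli.run_server --port 31330 enoch/llama-65b-hf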
The narrative doesn't end here; rather, it unfurls into a saga of tutorials, examples, and abundant resources. For novices and wizards alike, guides walk you through prompt-tuning LLaMA-65B for text semantic classification or breathing life into a personified chatbot with BLOOM; a sketch of that fine-tuning style follows below.
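To give a flavor of that fine-tuning, here is a minimal prompt-tuning sketch in the style of the Petals tutorials, where tuning_mode="ptune" learns a small set of local prompt embeddings while the remote weights stay frozen. The hyperparameters and the toy training step are illustrative assumptions, not a recipe.

import torch
from transformers import AutoTokenizer
from petals import AutoDistributedModelForCausalLM

model_name = "enoch/llama-65b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# "ptune" trains only pre_seq_len local prompt embeddings;
# the distributed model weights themselves remain frozen
model = AutoDistributedModelForCausalLM.from_pretrained(
    model_name, tuning_mode="ptune", pre_seq_len=16
)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

# One illustrative training step on a toy example
batch = tokenizer("A cat sat on a mat.", return_tensors="pt")["input_ids"]
loss = model(input_ids=batch, labels=batch).loss
loss.backward()
optimizer.step()
optimizer.zero_grad()
print(f"loss: {loss.item():.3f}")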
Remember, in Petals you aren't just an observer but an active participant. Your requests flow through a public swarm of volunteers, a cooperative force propelling these models into action. If privacy is your fortress, however, you can conjure a private swarm within trusted confines.
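As a sketch of that private-swarm setup, assuming the --new_swarm and --initial_peers options described in the Petals documentation (the address and peer ID below are placeholders):

# Start the first server as a fresh, isolated swarm
python -m petals.cli.run_server enoch/llama-65b-hf --new_swarm

# Join further servers (and clients) via the first server's multiaddress
python -m petals.cli.run_server enoch/llama-65b-hf \
    --initial_peers /ip4/10.0.0.1/tcp/31337/p2p/<first_server_peer_id>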
This is a universe where you hold the power of titanic language models, yet you're not alone. You're part of a grand orchestra that collaboratively brings these models to life, igniting a new era in the language processing world — an era of distributed language models.