Large Language Models - the new best thing since sliced bread.
If you've been curious about LLMs (large language models), you're not alone. Everywhere you turn, people are raving about the incredible capabilities of these machine learning models that use deep learning algorithms to process and understand natural language. They are the new best thing since sliced bread, and it's no wonder why. Midjourney can create artwork, ChatGPT can write poetry - the possibilities seem endless.
But how do you get started with LLMs? It can be overwhelming to navigate this new space, especially if you're not familiar with the science behind it. That's where this post comes in. I’ll focus on how to use existing LLMs in a simple standalone (but not optimized by any means) Python program to get some results. Once you've mastered this, you'll be able to take the leap to integrate LLMs into your own applications with ease.
But first, let's answer a basic question:
What are LLMs, exactly? In simple terms, they are machine learning models that use deep learning algorithms to process and understand natural language. They are typically trained on vast amounts of textual data, which means they take a long time to train (at least, the bigger and more generic ones). For example, GPT-4 reportedly took between four and seven months to train.
There are many LLMs available for use (more than 100K by some counts), and a few of the most popular ones are Bloom, GPT-4, BERT, XLNet, T5, and RoBERTa. Integrating with LLMs can seem daunting, but Hugging Face provides a simple hub of open-source models for natural language processing, computer vision, and other AI areas. With their APIs and a little bit of setup, you'll be able to start using LLMs in your own applications in no time.
To leverage the Hugging Face APIs, start by creating an account and verifying your email address. Once you have an active account, click on your profile icon in the top-right corner of the page, navigate to Settings, and select Access Tokens from the left menu bar. From there, create a new token and set its role to "write." This token will be used later to authorize the API calls in the exercises.
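In the exercises below I keep a literal {API_TOKEN} placeholder in the code for simplicity, but in practice you'd rather not hard-code the token. Here's a minimal sketch of reading it from an environment variable instead (HF_API_TOKEN is just a name I picked for this example, and the whoami-v2 endpoint is, as far as I know, what the huggingface_hub client itself uses to validate a token):

import os
import requests

# Read the access token from an environment variable instead of hard-coding it.
# HF_API_TOKEN is an arbitrary name chosen for this example.
API_TOKEN = os.environ['HF_API_TOKEN']
headers = {'Authorization': f'Bearer {API_TOKEN}'}

# Optional sanity check: ask Hugging Face who this token belongs to.
response = requests.get('https://huggingface.co/api/whoami-v2', headers=headers)
print(response.status_code, response.json())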
What’s next? The exciting part – let’s write some code!
Exercise 1: Translation from English to German. Head over to Hugging Face to see what pre-trained, translation-specific natural language processing models are available to us. The first one in the list is the t5-base model, which supports translation between English, French, German, and Romanian.
In your favorite Python IDE (I use Wing), type in and execute the following code:
import requests
from pprint import pprint

API_URL = 'https://api-inference.huggingface.co/models/t5-base'
# {API_TOKEN} is just a placeholder; replace it with your own Hugging Face API access token.
headers = {'Authorization': 'Bearer {API_TOKEN}'}

def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()

params = {'min_length': 10, 'max_length': 400, 'top_k': 2, 'temperature': 0.9}
output = query({
    'inputs': 'Translate to German: This post is about how to use large language models',
    'parameters': params,
})
pprint(output)
--output--
[{'translation_text': 'In diesem Beitrag geht es um die Verwendung großer '
                      'Sprachmodelle.'}]
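A quick aside before moving on: the very first request to a model is sometimes answered with an error while Hugging Face loads the model onto a server. Below is a hedged sketch of a more patient query function; the wait_for_model option comes from the Inference API documentation, while the retry count and delay are arbitrary values I chose.

import time
import requests

API_URL = 'https://api-inference.huggingface.co/models/t5-base'
headers = {'Authorization': 'Bearer {API_TOKEN}'}  # placeholder, as above

def query_with_retry(payload, retries=3, delay=10):
    # Ask the API to hold the request until the model is loaded,
    # and retry a few times if we still get a non-200 response.
    payload = dict(payload, options={'wait_for_model': True})
    for _ in range(retries):
        response = requests.post(API_URL, headers=headers, json=payload)
        if response.status_code == 200:
            return response.json()
        time.sleep(delay)
    response.raise_for_status()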
Exercise 2: Question & Answer. There are many conversational pretrained models you can choose from. I chose Microsoft's DialoGPT-medium.
import requests
from pprint import pprint

API_URL = 'https://api-inference.huggingface.co/models/microsoft/DialoGPT-medium'
# {API_TOKEN} is just a placeholder; replace it with your own Hugging Face API access token.
headers = {'Authorization': 'Bearer {API_TOKEN}'}

def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()

params = {'min_length': 10, 'max_length': 400, 'top_k': 2, 'temperature': 0.9, 'max_tokens': 150, 'pad_token_id': 50256}
output = query({
    'inputs': 'Can you travel to the sun?',
    'parameters': params,
})
pprint(output)
--output--
{'conversation': {'generated_responses': ["I can't, but I can travel to the "
                                          "moon."]}}
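The single question above is stateless; to hold an actual back-and-forth conversation, the conversational task lets you pass the earlier turns back in. Here is a sketch that builds on the query function above; the past_user_inputs / generated_responses / text structure follows the format documented for conversational models, but it's worth double-checking against the current API docs.

# Feed the previous exchange back in so the model sees the conversation so far.
output = query({
    'inputs': {
        'past_user_inputs': ['Can you travel to the sun?'],
        'generated_responses': ["I can't, but I can travel to the moon."],
        'text': 'How long would that take?',
    },
})
pprint(output)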
Exercise 3: Summarize a movie plot. For this exercise, I copied the plot of the movie Forrest Gump from Wikipedia into a locally stored doc file. I named this file forest_gump.docx and stored it in the same folder as the Python program. To summarize the plot, I used Facebook’s bart-large-cnn model.
import requests
import docx2txt
from pprint import pprint

API_URL = 'https://api-inference.huggingface.co/models/facebook/bart-large-cnn'
# {API_TOKEN} is just a placeholder; replace it with your own Hugging Face API access token.
headers = {'Authorization': 'Bearer {API_TOKEN}'}

def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()

params = {'do_sample': False, 'max_length': 800, 'min_length': 120}

def text_from_file(file):
    # Extract the raw text from the .docx file and normalize tabs to spaces.
    text = docx2txt.process(file)
    if text:
        return text.replace('\t', ' ')
    return None

to_summarize = text_from_file('forest_gump.docx')
output = query({
    'inputs': to_summarize,
    'parameters': params,
})
pprint(output)
--output--
[{'summary_text': 'As a boy in 1956, Forrest Gump has an IQ of 75 and is '
                  'fitted with leg braces to correct a curved spine. He lives '
                  'in Greenbow, Alabama, with his mother, who runs a boarding '
                  'house and encourages him to live beyond his disabilities. '
                  'In 1981, a man named Forrest Gump recounts his life '
                  'story to strangers who happen to sit next to him at a bus '
                  'stop. Forrest is awarded the Medal of Honor for his heroism '
                  'by President Lyndon B. Johnson. In 1974, Forrest is '
                  'honorably discharged from the Army and returns to Greenbow '
                  'where he makes a company that makes ping-pong.'}]
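One caveat: bart-large-cnn can only attend to a limited number of input tokens, so a very long plot may get truncated. Here's a rough sketch that builds on the code above and summarizes the document in paragraph-sized chunks instead; the 3,000-character budget is an arbitrary stand-in for the model's real token limit, not an exact figure.

def chunk_text(text, max_chars=3000):
    # Group paragraphs into chunks of at most roughly max_chars characters.
    chunks, current = [], ''
    for paragraph in text.split('\n\n'):
        if current and len(current) + len(paragraph) > max_chars:
            chunks.append(current)
            current = ''
        current += paragraph + '\n\n'
    if current:
        chunks.append(current)
    return chunks

# Summarize each chunk separately and stitch the partial summaries together.
summaries = []
for chunk in chunk_text(to_summarize):
    result = query({'inputs': chunk, 'parameters': params})
    summaries.append(result[0]['summary_text'])
print('\n\n'.join(summaries))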
Notes: If Python complains about a missing module (requests or docx2txt, for example), install it with:
>>python -m pip install <missing_module_name>
In this post, we've learned how to use Hugging Face APIs to interact with pretrained models (t5-base, DialoGPT-medium, and bart-large-cnn) through a simple Python program. We've explored three different use cases, including translation, Q&A, and summarization.
To continue building on this knowledge, a few suggested next steps: try other pretrained models and tasks from the Hugging Face Hub, experiment with the generation parameters, and integrate these API calls into your own applications. By exploring these possibilities, you can further enhance your skills in using Hugging Face's APIs and building more sophisticated applications with natural language processing capabilities.