Beginner’s Guide to Running Mistral 7B Locally on a Single GPU
Mistral 7B is a large language model developed by Mistral AI. It is designed to handle a wide range of natural language processing (NLP) tasks with high accuracy and efficiency. With 7 billion parameters, it is small enough to run on a single consumer GPU while still delivering strong results across many NLP benchmarks.
In this blog post, we will explore the key features of Mistral 7B and provide a step-by-step guide to running the model using the Hugging Face library.
If you wish to run Llama 3 models locally, check out my other guide, Beginner's Guide to Running Llama 3 Locally on a Single GPU.
Note: The Mistral 7B models are hosted in gated repositories on Hugging Face, which means that to access them you need to either log in with your Hugging Face account or generate an access token in your account and use it to download the repository.
Create a free account on Hugging Face, search for mistralai/Mistral-7B-Instruct-v0.1, fill out the access request form, and wait for approval. Once approved, you'll receive an email; then navigate to Settings in your Hugging Face account, go to Access Tokens, and create a new token.
If you created a token of the fine-grained type, make sure to add the Mistral 7B repository under its Repositories permissions.
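If you prefer not to paste the token directly into your code (as we do later in this guide), you can instead authenticate once from the terminal with the Hugging Face CLI, which ships with the huggingface_hub package:
pip install huggingface_hub
huggingface-cli login
This stores the token on your machine so that later downloads of gated repositories work without passing the token in code.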
Key Features of Mistral 7B:
1. Efficient Scale: Mistral 7B packs its capability into 7 billion parameters, making it compact enough to run on a single GPU while remaining competitive with much larger models.
2. High Performance: The model is designed to deliver high performance in various NLP tasks, thanks to its extensive training on diverse datasets.
3. Versatility: Mistral 7B can be fine-tuned for specific tasks, making it a versatile tool for developers and researchers.
4. Accessibility: The model is available through the Hugging Face library, making it easy to integrate into applications.
Step-by-Step Guide to Running Mistral 7B using Hugging Face:
Create a new directory for your project and access it via the terminal.
mkdir mistral_project
cd mistral_project
Inside this new directory, create a new Python virtual environment by running the following command in your terminal.
python3 -m venv venv_name
Activate the environment by running the command:
source venv_name/bin/activate
If you're on Windows, then run the following commands:
python -m venv venv_name
Then activate the virtual environment.
venv_name\Scripts\activate
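Either way, you can confirm that the virtual environment is active by checking which Python interpreter is on your PATH; the path printed should point inside the venv_name directory.
which python
where python
(Use which on macOS/Linux and where python in the Windows Command Prompt.)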
1. Install the Required Libraries:
Ensure you have Python installed on your computer. Then install the Hugging Face Transformers library, PyTorch, and Accelerate (Accelerate is required for the device_map="auto" option we use below).
pip install transformers torch accelerate
We will load the model in 8-bit quantized form to reduce GPU memory usage, so let's also install the bitsandbytes package.
pip install bitsandbytes
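Before downloading the model, it's worth verifying that PyTorch can actually see your GPU. A quick check from the terminal:
python -c "import torch; print(torch.cuda.is_available())"
If this prints False, the model will fall back to the CPU and inference will be very slow.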
2. Load the Mistral 7B Model:
Inside your project directory, create a new Python file named mistral-7b.py.
Import all dependencies.
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
from huggingface_hub import login
import bitsandbytes as bnb
To check whether a GPU is available, add the following lines of code. (This check is only informational; device placement is handled later by the device_map="auto" argument.)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")
Now, login using your Hugging Face access token.
login('YOUR_ACCESS_TOKEN')
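Hard-coding the token is fine for a quick test, but a safer option is to read it from an environment variable so it never ends up in your source code. A small sketch, assuming you have exported a variable named HF_TOKEN:
import os
login(token=os.environ["HF_TOKEN"])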
Use the Hugging Face library to load the Mistral 7B model and tokenizer.
model_name = "mistralai/Mistral-7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_8bit=True,
    device_map="auto"
)
The above block of code downloads the Mistral 7B model and tokenizer from Hugging Face, which can take a few minutes. Once they are downloaded you can start using the model, but it's worth saving both locally so you can reuse them without downloading them every time.
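Note: in recent versions of Transformers, passing load_in_8bit=True directly to from_pretrained is deprecated in favor of a BitsAndBytesConfig object. If your installed version warns about the argument, an equivalent sketch looks like this:
from transformers import BitsAndBytesConfig

quant_config = BitsAndBytesConfig(load_in_8bit=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=quant_config,
    device_map="auto"
)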
3. Save the Mistral 7B Model and Tokenizer for Inference:
# Save the quantized model and tokenizer locally
model.save_pretrained("./quantized_model")
tokenizer.save_pretrained("./quantized_model")
After the model and tokenizer are saved, load them from the local directory using the code below. At this point, you can replace the block of code from step #2 that downloaded the model and tokenizer from Hugging Face.
# Load the saved quantized model and tokenizer
model_dir = "./quantized_model"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(
    model_dir,
    load_in_8bit=True,
    device_map="auto"
)
4. Run Locally Saved Mistral 7B For Text Generation
Finally, add the rest of the code, which uses the locally saved model to generate text.
def generate_text(prompt):
    # Tokenize the prompt and move the tensors to the same device as the model
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_length=200)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
prompt = """
Once upon a time there was a
"""
generated_text = generate_text(prompt)
print("Generated Text:", generated_text)
Full code for the mistral-7b.py file:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
from huggingface_hub import login
import bitsandbytes as bnb
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")
login('YOUR_ACCESS_TOKEN')
# Download Mistral 7B from Hugging Face
model_name = "mistralai/Mistral-7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_8bit=True,
    device_map="auto"
)
# Save the quantized model and tokenizer locally
model.save_pretrained("./quantized_model")
tokenizer.save_pretrained("./quantized_model")
# Load the saved quantized model and tokenizer
model_dir = "./quantized_model"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(
    model_dir,
    load_in_8bit=True,
    device_map="auto"
)
def generate_text(prompt):
    # Tokenize the prompt and move the tensors to the same device as the model
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_length=200)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
prompt = """
Once upon a time there was a
"""
generated_text = generate_text(prompt)
print("Generated Text:", generated_text)
The above code was tested on a machine with 32 GB of RAM and a 16 GB GPU. Token generation was not especially fast, but it was usable.
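To try it yourself, save the file and run it from inside the activated virtual environment:
python mistral-7b.py
The first run downloads the model weights (several gigabytes), so expect it to take a while depending on your connection; later runs reuse the cached and locally saved files.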
The Mistral 7B large language model is a powerful tool for natural language processing tasks. By following the steps outlined in this blog post, you can easily run the model on your computer using the Hugging Face library. Whether you are a developer, researcher, or enthusiast, Mistral 7B offers a versatile and accessible solution for your NLP needs.