Deploying an LLM Using Amazon SageMaker JumpStart: A Step-by-Step Guide
Rany ElHousieny, PhD
Generative AI ENGINEERING MANAGER | ex-Microsoft | AI Solutions Architect | Generative AI & NLP Expert | Proven Leader in AI-Driven Innovation | Former Microsoft Research & Azure AI | Software Engineering Manager
Deploying a large language model (LLM) on AWS SageMaker can be an intimidating task for beginners, but with Amazon SageMaker JumpStart, the process becomes much easier. In this guide, we'll walk through the steps to set up SageMaker, select a foundation model, and prepare it for deployment.
Step 1: Accessing SageMaker JumpStart
SageMaker JumpStart provides a curated collection of pre-built machine learning models and solutions, including cutting-edge foundation models such as Llama 3.2. Follow these steps to get started.
1.1: Open SageMaker in AWS Console
1.2: Navigate to JumpStart
1.3: Understand Foundation Models
Before proceeding, it’s essential to understand what foundation models are: large models pretrained on broad datasets that can be adapted to a wide range of downstream tasks, either out of the box or with fine-tuning.
1.4: Find Meta’s Llama 3.2 Models
On the Foundation Models page, search for "Llama 3.2" to locate Meta’s models:
1.5: Choose a Model
By completing these steps, you’ve set up the foundation to deploy a pre-trained LLM using SageMaker JumpStart. In the next step, we’ll walk through configuring and deploying the model to create a real-time endpoint for experimentation.
The differences between Llama 3.2 1B and Llama 3.2 1B Instruct
1. Llama 3.2 1B (Base Model)
The base model is pretrained for next-token prediction on raw text. It excels at open-ended text completion and serves as a starting point for custom fine-tuning, but it is not tuned to follow instructions.
2. Llama 3.2 1B Instruct
The Instruct variant is the same base model fine-tuned to follow instructions. It is better suited for:
Answering questions.
Following specific instructions more accurately.
Use cases requiring structured responses, like chatbot development or task completion.
It is better at understanding and executing specific instructions or conversational queries, and more suitable for "instruction-response" scenarios, such as customer support, code generation, or educational tools.
Which Should You Choose?
Choose Llama 3.2 1B if: you need raw text completion or plan to fine-tune the model yourself.
Choose Llama 3.2 1B Instruct if: you want a model that follows instructions out of the box for chat, Q&A, or other task-oriented applications.
Since we are going to use this model for instruction-following tasks, I will select the Instruct model.
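To make the base-vs-Instruct distinction concrete, here is a small sketch of how an instruction prompt differs from raw text completion. The special tokens below follow Meta's published Llama 3 chat template; treat the exact strings as an assumption and verify them against the model card (the JumpStart serving container may also apply this template for you).

```python
# Sketch: the Llama 3 chat template used by Instruct variants.
# Token strings follow Meta's published Llama 3 format (an assumption --
# verify against the model card for Llama 3.2).

def format_instruct_prompt(user_message: str,
                           system_message: str = "You are a helpful assistant.") -> str:
    """Wrap a user message in the Llama 3 instruct chat template."""
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n"
        f"{system_message}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_message}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

# The base model, by contrast, would simply receive the raw text:
base_prompt = "Tell me a joke?"
instruct_prompt = format_instruct_prompt("Tell me a joke?")
```

The Instruct model is trained to generate its answer after the final `assistant` header, whereas the base model would just continue the raw text.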
Step 2: Deploying the Selected Model on SageMaker
Now that you’ve selected your desired foundation model (e.g., Meta Llama 3.2 1B Instruct), it’s time to configure and deploy the model to create an endpoint. This endpoint allows you to interact with the model programmatically through API calls.
Step 2.1: Understand the Deployment Process
Before deploying, here’s an overview of what happens during this step:
Step 2.2: Configure the Deployment
1 - Instance Type:
ml.t2.medium: Cheapest option (CPU-only, limited performance).
ml.m5.large: More memory but still CPU-only.
ml.g4dn.xlarge: Affordable GPU option for LLMs.
ml.g5.xlarge: Powerful GPU instance for faster inference.
2 - Number of Instances:
3 - Default Configuration:
You should be on the Model Details page from the previous step:
Step 2.3: Open the Deployment Notebook
1 - Click "Open notebook in Studio":
Select a Domain
You're currently creating a SageMaker Studio user profile, which is required to use SageMaker Studio. Here’s what you need to do, step by step:
Fill Out the Profile Information
1 - Name:
2 - Execution Role:
Check the Role’s Policies
1 - Go to the AWS IAM Console:
2 - Find the Role:
3 - View Attached Policies:
Step 3: Add Necessary Permissions
A. AmazonSageMakerFullAccess
B. S3 Permissions
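Note that AmazonSageMakerFullAccess only grants S3 access to buckets whose names contain "sagemaker". If your data lives elsewhere, you can attach a scoped policy along these lines; the bucket name below is a placeholder.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::your-bucket-name",
        "arn:aws:s3:::your-bucket-name/*"
      ]
    }
  ]
}
```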
Now that your SageMaker execution role has the necessary permissions, let’s proceed to the next step: setting up SageMaker Studio.
Step 4: Complete the SageMaker Studio User Profile Setup
If you haven’t already clicked Next, proceed to the next screen in the user profile creation wizard.
2 - Configure Applications:
3 - Customize Studio UI:
4 - Data and Storage Settings:
5 - Review and Create:
Step 5: Launch SageMaker Studio
1 - Open SageMaker Studio:
2 - Wait for Studio to Launch:
Step 6: Deploy the Llama 3.2 1B Instruct Model in SageMaker Studio
Step 6.1: Open SageMaker JumpStart
1 - In the SageMaker Studio interface:
2 - Search for the Meta Llama 3.2 1B Instruct model:
Note: If you do not already have access to the model, you can request it from this page:
Step 6.2: Configure the Model Deployment
1 - On the model details page:
2 - Select the deployment method:
If there’s a Deploy button:
The SageMaker endpoint for Llama 3.2 1B Instruct is now being created. This is the final part of the deployment process; we’ll move on once the status changes from Creating to InService.
We successfully tested the Llama 3.2 1B Instruct endpoint with the input "Tell me a joke?". The model generated a humorous reply, confirming that the endpoint is working as expected.
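Programmatically, the same test looks like the sketch below. The endpoint name is a placeholder, and the request schema is the one commonly used by JumpStart text-generation containers; verify both against the example payloads in the model card.

```python
import json

def build_payload(prompt: str, max_new_tokens: int = 128,
                  temperature: float = 0.6) -> dict:
    # Request schema commonly used by JumpStart text-generation containers
    # (an assumption -- check the model card's example payloads).
    return {
        "inputs": prompt,
        "parameters": {"max_new_tokens": max_new_tokens,
                       "temperature": temperature},
    }

def query_endpoint(endpoint_name: str, prompt: str) -> dict:
    # Lazy import so build_payload works without AWS libraries installed.
    import boto3

    client = boto3.client("sagemaker-runtime")
    response = client.invoke_endpoint(
        EndpointName=endpoint_name,   # the name shown in the console
        ContentType="application/json",
        Body=json.dumps(build_payload(prompt)),
    )
    return json.loads(response["Body"].read())

# Example (requires a running endpoint and AWS credentials):
# result = query_endpoint("your-endpoint-name", "Tell me a joke?")
```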
You can delete the SageMaker endpoint to save costs and recreate it later when needed. Real-time endpoints incur charges for as long as they are running, so deleting them when not in use is an easy way to control spend.
Here’s how to manage your endpoint:
Deleting the SageMaker Endpoint
1 - Go to the SageMaker Console:
2 - Find Your Endpoint:
3 - Delete the Endpoint:
Note: Deleting the endpoint does not delete the model or configuration, so you can redeploy it later without having to set up everything from scratch.
Restarting the Endpoint Later
Recreate the Endpoint:
This process is quicker than the initial deployment since the model artifacts are already stored in S3.
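The delete-and-recreate cycle can also be scripted with boto3. A minimal sketch, assuming you know the endpoint name and the endpoint configuration name from the console:

```python
def delete_endpoint(endpoint_name: str) -> None:
    """Delete the running endpoint; the model and endpoint config remain."""
    import boto3  # lazy import so this file loads without AWS libraries

    sm = boto3.client("sagemaker")
    sm.delete_endpoint(EndpointName=endpoint_name)

def recreate_endpoint(endpoint_name: str, endpoint_config_name: str) -> None:
    """Recreate the endpoint later from the saved endpoint configuration."""
    import boto3

    sm = boto3.client("sagemaker")
    sm.create_endpoint(
        EndpointName=endpoint_name,
        EndpointConfigName=endpoint_config_name,
    )
```

Because the endpoint configuration and model artifacts survive deletion, `recreate_endpoint` only has to stand the instance back up, which is why it is faster than the initial deployment.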
Alternative: Use SageMaker Model Registry (Optional)
Check the cost: you can monitor your SageMaker charges in the AWS Billing console or Cost Explorer, filtered to the SageMaker service.