Deploying an LLM Using Amazon SageMaker JumpStart: A Step-by-Step Guide

Deploying a large language model (LLM) on AWS SageMaker can be an intimidating task for beginners, but with Amazon SageMaker JumpStart, the process becomes much easier. In this guide, we'll walk through the steps to set up SageMaker, select a foundation model, and prepare it for deployment.


Step 1: Accessing SageMaker JumpStart

SageMaker JumpStart provides a curated collection of pre-built machine learning models and solutions, including cutting-edge foundation models such as Llama 3.2. Follow these steps to get started.

1.1: Open SageMaker in AWS Console

  1. Log in to your AWS Management Console.
  2. In the Services menu, search for SageMaker and click on it. You’ll be taken to the SageMaker landing page, where all its services and configurations are accessible.


1.2: Navigate to JumpStart

  1. In the left-hand menu of the SageMaker console, locate the JumpStart section.
  2. Under JumpStart, select Foundation Models to explore pre-trained models. This page displays a list of pre-built foundation models, including text generation models, computer vision models, and more.

1.3: Understand Foundation Models

Before proceeding, it’s essential to understand what foundation models are:

  • Foundation models are pre-trained on massive datasets, enabling them to perform a wide range of tasks such as text generation, summarization, and question answering.
  • SageMaker JumpStart provides ready-to-deploy versions of these models, saving you the hassle of training them from scratch.

1.4: Find Meta’s Llama 3.2 Models

On the Foundation Models page:

  1. Scroll down or use the search bar to locate models provided by Meta.
  2. You’ll see two options for the Meta Llama 3.2 1B model:

  • Meta Llama 3.2 1B: The base version of the model, designed for generic text generation tasks.
  • Meta Llama 3.2 1B Instruct: An instruction-tuned version of the model, optimized for following human-like instructions.

Note: The instruction-tuned version is better suited for tasks like question answering or chatbot applications, while the base version is ideal for open-ended text generation.

1.5: Choose a Model

  1. Click the View Model button under the version you wish to explore. For this guide, we recommend selecting Meta Llama 3.2 1B Instruct to simplify experimentation.
  2. This will take you to a detailed page about the model, including information about its architecture, supported tasks, and deployment configurations.


By completing these steps, you’ve set up the foundation to deploy a pre-trained LLM using SageMaker JumpStart. In the next step, we’ll walk through configuring and deploying the model to create a real-time endpoint for experimentation.

The Differences Between Llama 3.2 1B and Llama 3.2 1B Instruct


1. Llama 3.2 1B (Base Model)

  • Type: General-purpose large language model.
  • Training: Trained on large-scale datasets to generate coherent, meaningful text from prompts.
  • Use Case: Suitable for generic text generation tasks (e.g., creative writing, summarization) and applications where the input isn't task-specific or instruction-based.
  • Limitations: May not follow detailed instructions well, and needs more carefully crafted prompts to achieve the desired results.


2. Llama 3.2 1B Instruct

  • Type: Instruction-tuned large language model.
  • Training: Built on top of the base Llama 3.2 1B model and fine-tuned on datasets containing human-like instructions and corresponding outputs.
  • Use Case: Designed for answering questions, following specific instructions accurately, and use cases that require structured responses, such as chatbot development or task completion.
  • Advantage: Better at understanding and executing specific instructions or conversational queries, making it more suitable for instruction-response scenarios such as customer support, code generation, or educational tools.



Which Should You Choose?

Choose Llama 3.2 1B if:

  • You’re exploring generic capabilities of LLMs.
  • You want flexibility to experiment with creative or open-ended text generation.

Choose Llama 3.2 1B Instruct if:

  • You need the model to follow specific instructions or act as a chatbot.
  • You’re developing an interactive application that requires concise, structured answers.

Since we're going to use this model for instruction-following tasks, I'll select the Instruct version.
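
To make the difference concrete, here is a sketch of how the two variants are typically prompted. The payload fields ("inputs", "parameters") follow the common JumpStart text-generation schema, and the special tokens are Llama 3's documented chat-template markers; treat both as assumptions and check the model details page for the exact format your endpoint expects:

```python
# Base model: open-ended continuation of a raw prompt.
base_payload = {
    "inputs": "The three most popular cloud providers are",
    "parameters": {"max_new_tokens": 64, "temperature": 0.7},
}

# Instruct model: the prompt is wrapped in the chat template so the
# model treats the text as an instruction to follow.
instruct_prompt = (
    "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
    "Summarize the benefits of SageMaker JumpStart in two sentences."
    "<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
)
instruct_payload = {
    "inputs": instruct_prompt,
    "parameters": {"max_new_tokens": 128, "temperature": 0.6},
}
```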



Step 2: Deploying the Selected Model on SageMaker

Now that you’ve selected your desired foundation model (e.g., Meta Llama 3.2 1B Instruct), it’s time to configure and deploy the model to create an endpoint. This endpoint allows you to interact with the model programmatically through API calls.


Step 2.1: Understand the Deployment Process

Before deploying, here’s an overview of what happens during this step; the code sketch after the list maps each phase to its API call:

  1. SageMaker provisions compute resources (e.g., CPU or GPU instances) to host the model.
  2. The selected model is loaded onto these resources.
  3. An endpoint is created, which serves as an API for interacting with the model in real time.
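
For readers who want to see what the console automates, the three phases above map one-to-one onto three SageMaker API calls. Here is a minimal boto3 sketch; the account ID, role, container image URI, S3 path, and resource names are placeholders, not values from this guide:

```python
import boto3

sm = boto3.client("sagemaker")

# 1. Register the model: a container image plus weights in S3.
sm.create_model(
    ModelName="llama-3-2-1b-instruct-demo",
    ExecutionRoleArn="arn:aws:iam::123456789012:role/AmazonSageMaker-ExecutionRole",
    PrimaryContainer={
        "Image": "<inference-container-image-uri>",
        "ModelDataUrl": "s3://<bucket>/<model-artifacts>.tar.gz",
    },
)

# 2. Describe the compute resources that will host the model.
sm.create_endpoint_config(
    EndpointConfigName="llama-demo-config",
    ProductionVariants=[{
        "VariantName": "AllTraffic",
        "ModelName": "llama-3-2-1b-instruct-demo",
        "InstanceType": "ml.g5.xlarge",
        "InitialInstanceCount": 1,
    }],
)

# 3. Provision those resources and expose a real-time API.
sm.create_endpoint(
    EndpointName="llama-demo-endpoint",
    EndpointConfigName="llama-demo-config",
)
```

When you deploy through JumpStart, the correct container image and model artifacts are filled in for you.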


Step 2.2: Configure the Deployment

1 - Instance Type:

  • Choose a compute instance based on your workload and budget.
  • For a lightweight experiment, you can use ml.t2.medium (the cheapest option; CPU-only, limited performance) or ml.m5.large (more memory, but still CPU-only).
  • For better performance (GPU-accelerated), consider ml.g4dn.xlarge (an affordable GPU option for LLMs) or ml.g5.xlarge (a more powerful GPU instance for faster inference).

2 - Number of Instances:

  • For experiments, set the instance count to 1 to keep costs low.

3 - Default Configuration:

  • Use the default configurations provided by SageMaker JumpStart if you're unsure about customization. These include memory allocation, environment variables, and other model-specific settings.

If you prefer to script the deployment rather than click through the console, see the SDK sketch below.
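
The SageMaker Python SDK exposes the same JumpStart catalog, so the whole deployment can be a few lines of code. A minimal sketch; the model_id string is an assumption based on JumpStart's naming convention, so copy the exact ID from the model card:

```python
from sagemaker.jumpstart.model import JumpStartModel

# Assumed JumpStart model ID; verify it on the model details page.
model = JumpStartModel(model_id="meta-textgeneration-llama-3-2-1b-instruct")

# Llama models are gated: accept_eula=True acknowledges Meta's license terms.
predictor = model.deploy(
    initial_instance_count=1,       # keep costs low for experiments
    instance_type="ml.g5.xlarge",   # or a cheaper instance if on a budget
    accept_eula=True,
)
print(predictor.endpoint_name)
```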


You should be on the Model Details page from the previous step:



Step 3: Open the Deployment Notebook

1 - Click "Open notebook in Studio":


Select a Domain

You're now creating a SageMaker Studio user profile, which is required before you can use SageMaker Studio. Here's what to do on this screen, step by step:


Fill Out the Profile Information

1 - Name:

  • Enter a descriptive name for the profile.
  • For example: Rany-SageMaker-1 (you can leave it as is).

2 - Execution Role:

  • Select the AmazonSageMaker-ExecutionRole from the dropdown.


  • Ensure this role has sufficient permissions: it should have AmazonSageMakerFullAccess attached, and it also needs access to S3 if you'll upload or download files.

3 - Tags (Optional):

  • Add tags if your organization requires them; otherwise, leave this field empty.

4 - Click Next.


Check the Role’s Policies

1 - Go to the AWS IAM Console:


2 - Find the Role:

  • In the left-hand menu, click on Roles.

  • Search for the role you selected earlier (e.g., AmazonSageMaker-ExecutionRole-20241204T075783).


3 - View Attached Policies:

  • Click on the role to open its details.
  • Scroll down to the Permissions tab to see the policies attached to the role.


Add the Necessary Permissions

A. AmazonSageMakerFullAccess

  1. If AmazonSageMakerFullAccess is not listed:

  • Click Attach Policies.
  • Search for AmazonSageMakerFullAccess.
  • Select the checkbox next to it and click Attach Policy.

B. S3 Permissions

  1. Ensure there’s a policy granting access to S3. For example:

  • AmazonS3FullAccess (provides unrestricted access to all S3 buckets; fine for experiments, though a more narrowly scoped policy is safer in production).

If you'd rather script these changes than click through the IAM console, see the sketch below.
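
A minimal boto3 sketch that attaches both managed policies; the role name is the example from this guide, so substitute your own:

```python
import boto3

iam = boto3.client("iam")
role_name = "AmazonSageMaker-ExecutionRole-20241204T075783"  # example name

# Attach the two AWS-managed policies referenced above.
for policy_arn in (
    "arn:aws:iam::aws:policy/AmazonSageMakerFullAccess",
    "arn:aws:iam::aws:policy/AmazonS3FullAccess",
):
    iam.attach_role_policy(RoleName=role_name, PolicyArn=policy_arn)
```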






Now that your SageMaker execution role has the necessary permissions, let’s proceed to the next step: setting up SageMaker Studio.


Step 4: Complete the SageMaker Studio User Profile Setup

1 - In the SageMaker Console:

  • If you haven’t already clicked Next, proceed to the next screen in the user profile creation wizard.

2 - Configure Applications:

  • Leave the default options as is unless you have specific application configurations in mind.
  • Click Next.



3 - Customize Studio UI:

  • Optional: Customize the SageMaker Studio interface if desired (e.g., enable/disable features).
  • If unsure, leave the default settings and click Next.


4 - Data and Storage Settings:

  • Review the storage settings.
  • AutoMountHomeEFS: Leave the Inherit settings from domain checkbox selected. This ensures that SageMaker Studio will automatically handle EFS storage for your user profile.
  • CustomPosixUserConfig: Leave this unchecked, unless you have a specific need to set custom POSIX configurations (most use cases don’t require this).
  • Click "Next":



5 - Review and Create:

  • Review the configuration and ensure everything looks correct.
  • Click Submit to finalize the setup.




Step 5: Launch SageMaker Studio

1 - Open SageMaker Studio:

  • Go to the SageMaker Console.
  • Navigate to SageMaker Studio in the left-hand menu under the Control Panel section.
  • Find your user profile (e.g., Rany-SageMaker-1) and click Open Studio > Studio.


2 - Wait for Studio to Launch:

  • SageMaker Studio will open in a new tab. This might take a minute or two if it's the first time launching.
  • You should now see the SageMaker Studio interface, which includes options for notebooks, data processing, and model management.



Step 6: Deploy the Llama 3.2 1B Instruct Model in SageMaker Studio

Step 6.1: Open SageMaker JumpStart

1 - In the SageMaker Studio interface:

  • Look for the JumpStart option in the left-hand navigation pane (near the bottom, after Models).


  • Click JumpStart to open the available foundation models and pre-built solutions.


2 - Search for the Meta Llama 3.2 1B Instruct model:


  • Use the search bar at the top of JumpStart and type "Llama 3.2".
  • Locate Llama 3.2 1B Instruct in the search results and click on it.

Note: If you don't already have access to the model, you may need to request it:




Step 6.2: Configure the Model Deployment

1 - On the model details page:

  • Confirm that the model is self-deployable.
  • Look for the Deploy button (if available) or the Open Notebook in Studio option.


2 - Select the deployment method:

If there’s a Deploy button:

  • Click Deploy, select an appropriate instance type (e.g., ml.g5.xlarge or a smaller instance if on a budget), and proceed.





The SageMaker endpoint for Llama 3.2 1B Instruct is now being created. This is the final part of the deployment process; we'll move on once the status changes from Creating to InService.
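
If you prefer to watch the status from code instead of refreshing the console, here is a minimal sketch (the endpoint name is an example; use the one shown in your console):

```python
import boto3

sm = boto3.client("sagemaker")
endpoint_name = "jumpstart-dft-llama-3-2-1b-instruct"  # example name

# One-off status check: "Creating" while provisioning, then "InService".
print(sm.describe_endpoint(EndpointName=endpoint_name)["EndpointStatus"])

# Or block until the endpoint is ready (raises if creation fails).
sm.get_waiter("endpoint_in_service").wait(EndpointName=endpoint_name)
```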





We successfully tested the Llama 3.2 1B Instruct endpoint with the test input "Tell me a joke?". The model generated a humorous reply, which confirms that the endpoint is working as expected.
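
You can run the same test programmatically with the SageMaker runtime client. A minimal sketch; the endpoint name is an example, and the payload schema follows the common JumpStart text-generation format:

```python
import json

import boto3

runtime = boto3.client("sagemaker-runtime")

response = runtime.invoke_endpoint(
    EndpointName="jumpstart-dft-llama-3-2-1b-instruct",  # example name
    ContentType="application/json",
    Body=json.dumps({
        "inputs": "Tell me a joke?",
        "parameters": {"max_new_tokens": 128, "temperature": 0.6},
    }),
)
print(json.loads(response["Body"].read()))
```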



You can shut down the SageMaker endpoint to save costs and restart it later when needed. SageMaker endpoints incur costs as long as they are running, so stopping them is a great way to save money when they're not in use.

Here’s how to manage your endpoint:


Stopping the SageMaker Endpoint

1 - Go to the SageMaker Console:

  • In the left-hand menu, navigate to Inference > Endpoints.


2 - Find Your Endpoint:

  • Locate the endpoint you created (e.g., jumpstart-dft-llama-3-2-1b-instruct...).
  • Check the Status column to confirm it is currently InService.

3 - Delete the Endpoint:

  • Select the endpoint and click Delete.
  • Confirm the deletion. This will stop the instance(s) running behind the endpoint.

Note: Deleting the endpoint does not delete the model or configuration, so you can redeploy it later without having to set up everything from scratch.
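
The same deletion can be scripted. A minimal sketch with example names; note that only the endpoint (the billed resource) is removed, while the model and endpoint configuration remain:

```python
import boto3

sm = boto3.client("sagemaker")

# Stops the instances behind the endpoint and ends the hourly charges.
sm.delete_endpoint(EndpointName="jumpstart-dft-llama-3-2-1b-instruct")

# Only if you're completely done, clean up the rest as well:
# sm.delete_endpoint_config(EndpointConfigName="jumpstart-dft-llama-3-2-1b-instruct-config")
# sm.delete_model(ModelName="jumpstart-dft-llama-3-2-1b-instruct-model")
```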






Restarting the Endpoint Later

Recreate the Endpoint:

  1. Go to SageMaker Studio or the SageMaker Console.
  2. Use the same model and endpoint configuration to redeploy the endpoint.

This process is quicker than the initial deployment since the model artifacts are already stored in S3.
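
Because the model and endpoint configuration were kept, redeployment is a single call. A minimal sketch with example names; they must match the resources you created originally:

```python
import boto3

sm = boto3.client("sagemaker")

# Recreates the endpoint from the saved configuration.
sm.create_endpoint(
    EndpointName="jumpstart-dft-llama-3-2-1b-instruct",
    EndpointConfigName="jumpstart-dft-llama-3-2-1b-instruct-config",
)
```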


Alternative: Use SageMaker Model Registry (Optional)

  • If you plan to frequently stop and start the endpoint, you can register the model in the SageMaker Model Registry.
  • This allows you to quickly deploy and manage versions of your model without reconfiguring each time.

Finally, check the cost in the AWS Billing console to confirm that no endpoints are still accruing charges:


