Deploying an LLM Using Amazon SageMaker JumpStart: A Step-by-Step Guide

Deploying a large language model (LLM) on AWS SageMaker can be an intimidating task for beginners, but with Amazon SageMaker JumpStart, the process becomes much easier. In this guide, we'll walk through the steps to set up SageMaker, select a foundation model, and prepare it for deployment.


Step 1: Accessing SageMaker JumpStart

SageMaker JumpStart provides a curated collection of pre-built machine learning models and solutions, including cutting-edge foundation models such as Llama 3.2. Follow these steps to get started.

1.1: Open SageMaker in AWS Console

  1. Log in to your AWS Management Console.
  2. In the Services menu, search for SageMaker and click on it. You’ll be taken to the SageMaker landing page, where all its services and configurations are accessible.


1.2: Navigate to JumpStart

  1. In the left-hand menu of the SageMaker console, locate the JumpStart section.
  2. Under JumpStart, select Foundation Models to explore pre-trained models. This page displays a list of pre-built foundation models, including text generation models, computer vision models, and more.

1.3: Understand Foundation Models

Before proceeding, it’s essential to understand what foundation models are:

  • Foundation models are pre-trained on massive datasets, enabling them to perform a wide range of tasks such as text generation, summarization, and question answering.
  • SageMaker JumpStart provides ready-to-deploy versions of these models, saving you the hassle of training them from scratch.

1.4: Find Meta’s Llama 3.2 Models

On the Foundation Models page:

  1. Scroll down or use the search bar to locate models provided by Meta.
  2. You’ll see two options for the Meta Llama 3.2 1B model:

  • Meta Llama 3.2 1B: The base version of the model, designed for generic text generation tasks.
  • Meta Llama 3.2 1B Instruct: An instruction-tuned version of the model, optimized for following human-like instructions.

Note: The instruction-tuned version is better suited for tasks like question answering or chatbot applications, while the base version is ideal for open-ended text generation.

1.5: Choose a Model

  1. Click the View Model button under the version you wish to explore. For this guide, we recommend selecting Meta Llama 3.2 1B Instruct to simplify experimentation.
  2. This will take you to a detailed page about the model, including information about its architecture, supported tasks, and deployment configurations.


By completing these steps, you’ve set up the foundation to deploy a pre-trained LLM using SageMaker JumpStart. In the next step, we’ll walk through configuring and deploying the model to create a real-time endpoint for experimentation.

The Differences Between Llama 3.2 1B and Llama 3.2 1B Instruct


1. Llama 3.2 1B (Base Model)

  • Type: General-purpose large language model.
  • Training: Trained on large-scale datasets to generate coherent, meaningful text from prompts.
  • Use Case: Suitable for generic text generation tasks (e.g., creative writing, summarization) and applications where the input isn't task-specific or instruction-based.
  • Limitations: May not follow detailed instructions well, and needs more carefully crafted prompts to achieve the desired results.


2. Llama 3.2 1B Instruct

  • Type: Instruction-tuned large language model.
  • Training: Built on top of the base Llama 3.2 1B model and fine-tuned on datasets containing human-like instructions and corresponding outputs.
  • Use Case: Designed for answering questions, following specific instructions accurately, and use cases that require structured responses, such as chatbot development or task completion.
  • Advantage: Better at understanding and executing specific instructions or conversational queries, making it more suitable for instruction-response scenarios such as customer support, code generation, or educational tools.



Which Should You Choose?

Choose Llama 3.2 1B if:

  • You’re exploring generic capabilities of LLMs.
  • You want flexibility to experiment with creative or open-ended text generation.

Choose Llama 3.2 1B Instruct if:

  • You need the model to follow specific instructions or act as a chatbot.
  • You’re developing an interactive application that requires concise, structured answers.

Since we're going to use this model for instruction-following tasks, I'll select the Instruct version.
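
To make the difference concrete, here is a sketch of how the two variants are typically prompted. The payload fields ("inputs", "parameters") follow the common JumpStart text-generation schema, and the special tokens are Llama 3's documented chat-template markers; treat both as assumptions and check the model details page for the exact format your endpoint expects:

```python
# Base model: open-ended continuation of a raw prompt.
base_payload = {
    "inputs": "The three most popular cloud providers are",
    "parameters": {"max_new_tokens": 64, "temperature": 0.7},
}

# Instruct model: the prompt is wrapped in the chat template so the
# model treats the text as an instruction to follow.
instruct_prompt = (
    "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
    "Summarize the benefits of SageMaker JumpStart in two sentences."
    "<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
)
instruct_payload = {
    "inputs": instruct_prompt,
    "parameters": {"max_new_tokens": 128, "temperature": 0.6},
}
```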



Step 2: Deploying the Selected Model on SageMaker

Now that you’ve selected your desired foundation model (e.g., Meta Llama 3.2 1B Instruct), it’s time to configure and deploy the model to create an endpoint. This endpoint allows you to interact with the model programmatically through API calls.


Step 2.1: Understand the Deployment Process

Before deploying, here’s an overview of what happens during this step; the code sketch after the list maps each phase to its API call:

  1. SageMaker provisions compute resources (e.g., CPU or GPU instances) to host the model.
  2. The selected model is loaded onto these resources.
  3. An endpoint is created, which serves as an API for interacting with the model in real time.
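
For readers who want to see what the console automates, the three phases above map one-to-one onto three SageMaker API calls. Here is a minimal boto3 sketch; the account ID, role, container image URI, S3 path, and resource names are placeholders, not values from this guide:

```python
import boto3

sm = boto3.client("sagemaker")

# 1. Register the model: a container image plus weights in S3.
sm.create_model(
    ModelName="llama-3-2-1b-instruct-demo",
    ExecutionRoleArn="arn:aws:iam::123456789012:role/AmazonSageMaker-ExecutionRole",
    PrimaryContainer={
        "Image": "<inference-container-image-uri>",
        "ModelDataUrl": "s3://<bucket>/<model-artifacts>.tar.gz",
    },
)

# 2. Describe the compute resources that will host the model.
sm.create_endpoint_config(
    EndpointConfigName="llama-demo-config",
    ProductionVariants=[{
        "VariantName": "AllTraffic",
        "ModelName": "llama-3-2-1b-instruct-demo",
        "InstanceType": "ml.g5.xlarge",
        "InitialInstanceCount": 1,
    }],
)

# 3. Provision those resources and expose a real-time API.
sm.create_endpoint(
    EndpointName="llama-demo-endpoint",
    EndpointConfigName="llama-demo-config",
)
```

When you deploy through JumpStart, the correct container image and model artifacts are filled in for you.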


Step 2.2: Configure the Deployment

1 - Instance Type:

  • Choose a compute instance based on your workload and budget.
  • For a lightweight experiment, you can use ml.t2.medium (the cheapest option; CPU-only, limited performance) or ml.m5.large (more memory, but still CPU-only).
  • For better performance (GPU-accelerated), consider ml.g4dn.xlarge (an affordable GPU option for LLMs) or ml.g5.xlarge (a more powerful GPU instance for faster inference).

2 - Number of Instances:

  • For experiments, set the instance count to 1 to keep costs low.

3 - Default Configuration:

  • Use the default configurations provided by SageMaker JumpStart if you're unsure about customization. These include memory allocation, environment variables, and other model-specific settings.

If you prefer to script the deployment rather than click through the console, see the SDK sketch below.
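
The SageMaker Python SDK exposes the same JumpStart catalog, so the whole deployment can be a few lines of code. A minimal sketch; the model_id string is an assumption based on JumpStart's naming convention, so copy the exact ID from the model card:

```python
from sagemaker.jumpstart.model import JumpStartModel

# Assumed JumpStart model ID; verify it on the model details page.
model = JumpStartModel(model_id="meta-textgeneration-llama-3-2-1b-instruct")

# Llama models are gated: accept_eula=True acknowledges Meta's license terms.
predictor = model.deploy(
    initial_instance_count=1,       # keep costs low for experiments
    instance_type="ml.g5.xlarge",   # or a cheaper instance if on a budget
    accept_eula=True,
)
print(predictor.endpoint_name)
```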


You should be on the Model Details page from the previous step:



Step 3: Open the Deployment Notebook

1 - Click "Open notebook in Studio":


Select a Domain

You're now creating a SageMaker Studio user profile, which is required before you can use SageMaker Studio. Here's what to do on this screen, step by step:


Fill Out the Profile Information

1 - Name:

  • Enter a descriptive name for the profile.
  • For example: Rany-SageMaker-1 (you can leave it as is).

2 - Execution Role:

  • Select the AmazonSageMaker-ExecutionRole from the dropdown.


  • Ensure this role has sufficient permissions: it should have AmazonSageMakerFullAccess attached, and it also needs access to S3 if you'll upload or download files.

3 - Tags (Optional):

  • Add tags if your organization requires them; otherwise, leave this field empty.

4 - Click Next.


Check the Role’s Policies

1 - Go to the AWS IAM Console:


2 - Find the Role:

  • In the left-hand menu, click on Roles.

  • Search for the role you selected earlier (e.g., AmazonSageMaker-ExecutionRole-20241204T075783).


3 - View Attached Policies:

  • Click on the role to open its details.
  • Scroll down to the Permissions tab to see the policies attached to the role.


Add the Necessary Permissions

A. AmazonSageMakerFullAccess

  1. If AmazonSageMakerFullAccess is not listed:

  • Click Attach Policies.
  • Search for AmazonSageMakerFullAccess.
  • Select the checkbox next to it and click Attach Policy.

B. S3 Permissions

  1. Ensure there’s a policy granting access to S3. For example:

  • AmazonS3FullAccess (provides unrestricted access to all S3 buckets; fine for experiments, though a more narrowly scoped policy is safer in production).

If you'd rather script these changes than click through the IAM console, see the sketch below.
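
A minimal boto3 sketch that attaches both managed policies; the role name is the example from this guide, so substitute your own:

```python
import boto3

iam = boto3.client("iam")
role_name = "AmazonSageMaker-ExecutionRole-20241204T075783"  # example name

# Attach the two AWS-managed policies referenced above.
for policy_arn in (
    "arn:aws:iam::aws:policy/AmazonSageMakerFullAccess",
    "arn:aws:iam::aws:policy/AmazonS3FullAccess",
):
    iam.attach_role_policy(RoleName=role_name, PolicyArn=policy_arn)
```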






Now that your SageMaker execution role has the necessary permissions, let’s proceed to the next step: setting up SageMaker Studio.


Step 4: Complete the SageMaker Studio User Profile Setup

1 - In the SageMaker Console:

  • If you haven’t already clicked Next, proceed to the next screen in the user profile creation wizard.

2 - Configure Applications:

  • Leave the default options as is unless you have specific application configurations in mind.
  • Click Next.



3 - Customize Studio UI:

  • Optional: Customize the SageMaker Studio interface if desired (e.g., enable/disable features).
  • If unsure, leave the default settings and click Next.


4 - Data and Storage Settings:

  • Review the storage settings.
  • AutoMountHomeEFS: Leave the Inherit settings from domain checkbox selected. This ensures that SageMaker Studio will automatically handle EFS storage for your user profile.
  • CustomPosixUserConfig: Leave this unchecked, unless you have a specific need to set custom POSIX configurations (most use cases don’t require this).
  • Click "Next":



5 - Review and Create:

  • Review the configuration and ensure everything looks correct.
  • Click Submit to finalize the setup.




Step 5: Launch SageMaker Studio

1 - Open SageMaker Studio:

  • Go to the SageMaker Console.
  • Navigate to SageMaker Studio in the left-hand menu under the Control Panel section.
  • Find your user profile (e.g., Rany-SageMaker-1) and click Open Studio > Studio.


2 - Wait for Studio to Launch:

  • SageMaker Studio will open in a new tab. This might take a minute or two if it's the first time launching.
  • You should now see the SageMaker Studio interface, which includes options for notebooks, data processing, and model management.



Step 6: Deploy the Llama 3.2 1B Instruct Model in SageMaker Studio

Step 6.1: Open SageMaker JumpStart

1 - In the SageMaker Studio interface:

  • Look for the JumpStart option in the left-hand navigation pane (near the bottom, after Models).


  • Click JumpStart to open the available foundation models and pre-built solutions.


2 - Search for the Meta Llama 3.2 1B Instruct model:


  • Use the search bar at the top of JumpStart and type "Llama 3.2".
  • Locate Llama 3.2 1B Instruct in the search results and click on it.

Note: If you don't already have access to the model, you may need to request it:




Step 6.2: Configure the Model Deployment

1 - On the model details page:

  • Confirm that the model is self-deployable.
  • Look for the Deploy button (if available) or the Open Notebook in Studio option.


2 - Select the deployment method:

If there’s a Deploy button:

  • Click Deploy, select an appropriate instance type (e.g., ml.g5.xlarge or a smaller instance if on a budget), and proceed.





The SageMaker endpoint for Llama 3.2 1B Instruct is now being created. This is the final part of the deployment process; we'll move on once the status changes from Creating to InService.
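
If you prefer to watch the status from code instead of refreshing the console, here is a minimal sketch (the endpoint name is an example; use the one shown in your console):

```python
import boto3

sm = boto3.client("sagemaker")
endpoint_name = "jumpstart-dft-llama-3-2-1b-instruct"  # example name

# One-off status check: "Creating" while provisioning, then "InService".
print(sm.describe_endpoint(EndpointName=endpoint_name)["EndpointStatus"])

# Or block until the endpoint is ready (raises if creation fails).
sm.get_waiter("endpoint_in_service").wait(EndpointName=endpoint_name)
```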





We successfully tested the Llama 3.2 1B Instruct endpoint with the test input "Tell me a joke?". The model generated a humorous reply, which confirms that the endpoint is working as expected.
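
You can run the same test programmatically with the SageMaker runtime client. A minimal sketch; the endpoint name is an example, and the payload schema follows the common JumpStart text-generation format:

```python
import json

import boto3

runtime = boto3.client("sagemaker-runtime")

response = runtime.invoke_endpoint(
    EndpointName="jumpstart-dft-llama-3-2-1b-instruct",  # example name
    ContentType="application/json",
    Body=json.dumps({
        "inputs": "Tell me a joke?",
        "parameters": {"max_new_tokens": 128, "temperature": 0.6},
    }),
)
print(json.loads(response["Body"].read()))
```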



You can shut down the SageMaker endpoint to save costs and restart it later when needed. SageMaker endpoints incur costs as long as they are running, so stopping them is a great way to save money when they're not in use.

Here’s how to manage your endpoint:


Stopping the SageMaker Endpoint

1 - Go to the SageMaker Console:

  • In the left-hand menu, navigate to Inference > Endpoints.


2 - Find Your Endpoint:

  • Locate the endpoint you created (e.g., jumpstart-dft-llama-3-2-1b-instruct...).
  • Check the Status column to confirm it is currently InService.

3 - Delete the Endpoint:

  • Select the endpoint and click Delete.
  • Confirm the deletion. This will stop the instance(s) running behind the endpoint.

Note: Deleting the endpoint does not delete the model or configuration, so you can redeploy it later without having to set up everything from scratch.
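
The same deletion can be scripted. A minimal sketch with example names; note that only the endpoint (the billed resource) is removed, while the model and endpoint configuration remain:

```python
import boto3

sm = boto3.client("sagemaker")

# Stops the instances behind the endpoint and ends the hourly charges.
sm.delete_endpoint(EndpointName="jumpstart-dft-llama-3-2-1b-instruct")

# Only if you're completely done, clean up the rest as well:
# sm.delete_endpoint_config(EndpointConfigName="jumpstart-dft-llama-3-2-1b-instruct-config")
# sm.delete_model(ModelName="jumpstart-dft-llama-3-2-1b-instruct-model")
```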






Restarting the Endpoint Later

Recreate the Endpoint:

  1. Go to SageMaker Studio or the SageMaker Console.
  2. Use the same model and endpoint configuration to redeploy the endpoint.

This process is quicker than the initial deployment since the model artifacts are already stored in S3.
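
Because the model and endpoint configuration were kept, redeployment is a single call. A minimal sketch with example names; they must match the resources you created originally:

```python
import boto3

sm = boto3.client("sagemaker")

# Recreates the endpoint from the saved configuration.
sm.create_endpoint(
    EndpointName="jumpstart-dft-llama-3-2-1b-instruct",
    EndpointConfigName="jumpstart-dft-llama-3-2-1b-instruct-config",
)
```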


Alternative: Use SageMaker Model Registry (Optional)

  • If you plan to frequently stop and start the endpoint, you can register the model in the SageMaker Model Registry.
  • This allows you to quickly deploy and manage versions of your model without reconfiguring each time.

Finally, check the cost in the AWS Billing console to confirm that no endpoints are still accruing charges:


