Explore the Power of Task-Specific Transformer Models with Amazon SageMaker and Hugging Face
Gary Stafford
Principal Solutions Architect @AWS | Data Analytics and Generative AI Specialist | Experienced Technology Leader, Consultant, CTO, COO, President | 10x AWS Certified
Explore the efficiencies of specialized transformer models available from Hugging Face to accomplish specific NLP tasks in Amazon SageMaker using real-time and batch inference
In this post, we’ll dive into the capabilities of specialized, task-specific transformer models and how they can tackle common natural language processing (NLP) and computer vision (CV) challenges. Using the power of Amazon SageMaker, we’ll perform real-time and batch inference with transformer models deployed from Hugging Face. Moreover, we’ll examine the advantages of hosting task-specific transformer models on Amazon SageMaker, and compare them to alternative solutions, including fully managed AI and Generative AI services offered on AWS.
For this post, the term task-specific describes specialized transformer models that are optimized to accomplish specific tasks, such as audio classification, speech emotion recognition (SER), monocular depth estimation, machine language translation, Segment Anything Model (SAM) for image masking, programming code completion, or one of my favorites, wine quality tabular classification.
Source Code
The source code used in this post’s demonstration is open-sourced and available on GitHub. I suggest starting with the project’s Jupyter Notebook, which contains all the examples in the post.
Transformers
According to NVIDIA, “A transformer model is a neural network that learns context and thus meaning by tracking relationships in sequential data like the words in this sentence. Transformer models apply an evolving set of mathematical techniques, called attention or self-attention, to detect subtle ways even distant data elements in a series influence and depend on each other.”
According to Wikipedia, transformer models have had great success with natural language processing (NLP) and computer vision (CV) tasks.
Transformer models are often categorized by the modality and tasks they are designed to handle, such as text, vision, audio, video, and multimodal.
Getting Started with Hugging Face and SageMaker
For this demonstration, we will use an open-source transformer model available on Hugging Face, the well-known “platform where the machine learning community collaborates on models, datasets, and applications.”
We will host the Hugging Face transformer model for inference on Amazon SageMaker, which allows users to “build, train, and deploy machine learning models for any use case with fully managed infrastructure, tools, and workflows.” Hugging Face has direct API integrations with Amazon SageMaker.
Machine Translations
For this demo, we will use the Helsinki-NLP/opus-mt-en-zh model. The Language Technology Research Group at the University of Helsinki developed the transformer model to perform English-to-Chinese machine translation. The research group has published over 1,440 models and eight datasets on Hugging Face. The opus-mt-en-zh model had over 85k downloads in February 2024, while its counterpart, the opus-mt-zh-en Chinese-to-English model, had over 2.7M downloads in the same month!
Model Size
Based on the size of the pytorch_model.bin file, the opus-mt-en-zh model is 312 MB. Comparatively, typical large language models (LLMs) can range from tens to hundreds of GBs. For example, the 175-billion-parameter GPT-3 requires 350 GB of disk space at 2 bytes/parameter.
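As a quick back-of-the-envelope check on those figures, the on-disk size of a dense model is roughly its parameter count multiplied by the bytes per parameter. The short sketch below reproduces the numbers above; the ~78M parameter count for opus-mt-en-zh is my own approximation, back-solved from the 312 MB file size, not an official figure.
# rough on-disk size estimate: parameter count x bytes per parameter
def model_size_gb(parameters: float, bytes_per_param: int) -> float:
    return parameters * bytes_per_param / 1e9  # decimal GB

# GPT-3: 175 billion parameters at 2 bytes/parameter ~= 350 GB
print(f"GPT-3 (2 bytes/param):         ~{model_size_gb(175e9, 2):,.0f} GB")

# opus-mt-en-zh: ~78M parameters at 4 bytes/parameter ~= 0.31 GB (312 MB)
print(f"opus-mt-en-zh (4 bytes/param): ~{model_size_gb(78e6, 4):.2f} GB")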
Alternative to Open Source Task-Specific Models
Leading model builders like AI21 Labs offer task-specific models for distinct use cases as an alternative to open-source models. AI21’s models specialize in paraphrasing, grammatical error correction (GEC), text improvement, summarization, text segmentation, contextual answers, semantic search, and embeddings. According to AI21 Labs, “AI21 Studio’s Task-Specific Models offer a range of powerful tools. These models have been specifically designed for their respective tasks and provide high-quality results while optimizing efficiency…As specialized models, each was optimized for a dedicated purpose, making it significantly more efficient than building it from scratch and much more cost-effective.”
Fully-managed Alternatives on AWS
Amazon Translate
Alternatively, you may choose fully-managed AI services, such as Amazon Translate, instead of deploying the transformer model on Amazon SageMaker for machine translation. AWS describes Amazon Translate as “a neural machine translation service that delivers fast, high-quality, affordable, and customizable language translation.” Amazon Translate provides both real-time and batch translation capabilities.
Amazon Bedrock
Another option is using a general-purpose text-based Generative AI foundation model with Amazon Bedrock, described as the “easiest way to build and scale generative AI applications with foundation models.” Amazon Bedrock offers real-time inference and, most recently, batch inference, allowing you to run multiple inference requests asynchronously.
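For reference, the same kind of request can be made programmatically through the Bedrock Runtime API rather than the console. Below is a minimal sketch using the Anthropic Claude 3 Haiku Messages format; the model ID and prompt are illustrative assumptions, and the model must already be enabled in your account and Region.
import json

import boto3

# assumes Claude 3 Haiku access is enabled in this account/Region
MODEL_ID = "anthropic.claude-3-haiku-20240307-v1:0"

client_bedrock = boto3.client("bedrock-runtime")

prompt = (
    "Translate the following English text to Chinese: "
    "A heart filled with anger has no room for love."
)

response = client_bedrock.invoke_model(
    modelId=MODEL_ID,
    body=json.dumps(
        {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 256,
            "messages": [{"role": "user", "content": prompt}],
        }
    ),
    contentType="application/json",
    accept="application/json",
)

# the translated text is returned in the first content block
response_body = json.loads(response["body"].read())
print(response_body["content"][0]["text"])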
The challenge is finding a general-purpose generative text model that can consistently and accurately accomplish your specific NLP or CV task, such as translating English to Chinese. Below is an example of using the latest Anthropic Claude 3 Haiku model in the Amazon Bedrock Text Playground. The translation was successful, but the quality of the results was mixed.
Below is another example using Mistral AI’s Mixtral 8x7B Instruct model. The machine translation results were less accurate than those of other foundation models tested. Further, note the extraneous translations into additional languages that were not requested, including German, Italian, Japanese, Korean, Russian, and Spanish; this adds cost and time.
A second attempt with Mistral AI’s Mixtral 8x7B Instruct model at a lower temperature also gave bizarre results, translating my request but also providing several more translations of other English texts. Again, this inconsistency and extra output add cost and time.
These examples demonstrate the trade-offs of relying on a general-purpose foundation model for specialized tasks. The pros and cons of one method or model over another come down to a few important considerations.
Fully managed AI services often excel at ease of use and lower cost for tasks with lower volumes (smaller datasets). However, you may find that with very high volumes (large datasets), deploying task-specific models is more cost-efficient and provides more flexibility at scale than fully managed services. Select the right tool for the job.
Service Quotas
Before starting the demonstration, based on your budget, ensure you have 1–2 instances available for real-time inference and batch transforms. In this post, I have arbitrarily used a mix of ml.p3.2xlarge, ml.g5.12xlarge, ml.g4dn.2xlarge, and ml.g4dn.8xlarge GPU-based instances for inference. You can use Service Quotas in the AWS Management Console to check your available instance types and request additional instances if necessary.
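If you prefer to check quotas from the notebook rather than the console, the Service Quotas API can be queried with boto3. The sketch below lists SageMaker quotas whose names mention the instance types used in this post; matching quota names by substring is an assumption about how the quotas are labeled, so verify the results against the console.
import boto3

client_sq = boto3.client("service-quotas")

# instance types used for inference in this post
instance_types = ["ml.g5.12xlarge", "ml.g4dn.2xlarge", "ml.g4dn.8xlarge"]

# page through all SageMaker quotas and print the ones that mention those types
paginator = client_sq.get_paginator("list_service_quotas")
for page in paginator.paginate(ServiceCode="sagemaker"):
    for quota in page["Quotas"]:
        if any(it in quota["QuotaName"] for it in instance_types):
            print(f'{quota["QuotaName"]}: {quota["Value"]:.0f}')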
Dataset
We will use the Quotes - 500k dataset, available on Kaggle, for the demonstration. The dataset contains 500,000 quotes from well-known authors. There is no need to download the entire dataset. Due to its poor quality, the data required considerable cleansing to ensure we could perform inference without issues. I have included a clean set of 10,000 quotes for the demonstration in the GitHub project.
Amazon SageMaker Studio
All code used for this demo is contained in a Jupyter Notebook, built and managed in Amazon SageMaker Studio, the latest web-based experience for running ML workflows on AWS. According to AWS, Studio offers a suite of integrated development environments (IDEs), including Code Editor (based on Code-OSS, Visual Studio Code - Open Source), a new JupyterLab application, RStudio, and Amazon SageMaker Studio Classic.
Getting Started
Using the supplied Jupyter Notebook, first install or update the required Python packages for your environment.
%%sh
python3 -m pip install sagemaker boto3 botocore jsonlines -Uq
Next, deploy the Hugging Face-based transformer model as an Amazon SageMaker real-time inference endpoint. AWS states, “Real-time inference is ideal for inference workloads with real-time, interactive, low-latency requirements. You can deploy your model to SageMaker hosting services and get an endpoint that can be used for inference.”
Given the close integration of Hugging Face with SageMaker, we can pull a copy of the model artifacts and deploy them to a real-time inference endpoint with only a few lines of boilerplate code using the HuggingFaceModel class’s deploy() method. Below is an example of deploying the model to a single ml.g5.12xlarge instance for real-time inference.
import sagemaker
import boto3
from sagemaker.huggingface import HuggingFaceModel
# hugging face model
HF_MODEL_ID = "Helsinki-NLP/opus-mt-en-zh"
try:
    role = sagemaker.get_execution_role()
except ValueError:
    client_iam = boto3.client("iam")
    role = client_iam.get_role(RoleName="sagemaker_execution_role")["Role"]["Arn"]

# hub model configuration. https://huggingface.co/models
hub = {
    "HF_MODEL_ID": HF_MODEL_ID,
    "HF_TASK": "translation",
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    transformers_version="4.37.0",
    pytorch_version="2.1.0",
    py_version="py310",
    env=hub,
    role=role,
)

# deploy the model to a single instance
predictor = huggingface_model.deploy(
    initial_instance_count=1,  # number of instances
    instance_type="ml.g5.12xlarge",  # ec2 instance type
)
Once complete, the model’s real-time inference endpoint appears in the SageMaker Studio Deployments > Endpoints tab. IMPORTANT: this endpoint will persist, and you will continue to pay for it until you delete it.
You will need the name of the model endpoint to perform inference. The name, for example, huggingface-pytorch-inference-2024-03-18-00-22-55-664, can be obtained in the Endpoints > Details tab (shown above) or from within the notebook by running the following command:
# output contains endpoint name
predictor.endpoint_context()
Using the predict() method, we can test the deployed model’s real-time inference endpoint:
predictor.predict(
    {
        "inputs": "A heart filled with anger has no room for love.",
    }
)
You should see results similar to the following:
[{'translation_text': '充满愤怒的心没有爱的空间'}]
You can validate the accuracy of the translation results using several methods, including Google Translate. It is important to note that this post is focused on how to use the models, not on the choice of models or their performance. Lacking proficiency in the Chinese language, I cannot recommend this model over other similar models for machine translation.
To reuse the existing real-time endpoint for inference, we can create a HuggingFacePredictor from the endpoint name and call its predict() method. You should get the same results as the previous inference method.
from sagemaker.huggingface.model import HuggingFacePredictor
SAGEMAKER_ENDPOINT = "<your_endpoint_name>"
session = sagemaker.session.Session()
predictor = HuggingFacePredictor(
    endpoint_name=SAGEMAKER_ENDPOINT, sagemaker_session=session
)

predictor.predict(
    {
        "inputs": "A heart filled with anger has no room for love",
        "parameters": {"max_length": 1024, "min_length": 1},
    }
)
Using SageMaker Runtime API for Real-time Inference
As an alternative to the HuggingFacePredictor class, we can use the Amazon SageMaker Runtime API, calling the SageMakerRuntime.Client class’s invoke_endpoint() method to perform real-time inference.
import boto3
client_smr = boto3.client("sagemaker-runtime")
# reference: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker-runtime/client/invoke_endpoint.html
response = client_smr.invoke_endpoint(
    EndpointName=SAGEMAKER_ENDPOINT,
    Body=bytes(
        '{"inputs": "A heart filled with anger has no room for love."}',
        "utf-8",
    ),
    ContentType="application/json",
)
# decodes and prints the response body:
print(response["Body"].read().decode("utf-8"))
You should get the same results as the previous two inference methods.
[{"translation_text":"充满愤怒的心没有爱的空间"}]
Bulk Inference with Real-time Inference Endpoint
A real-time inference endpoint is suitable for one-off translations or for exposing translation as part of a customer-facing application. Although not optimal, it can also be used for low-volume bulk translations. To demonstrate this, I imported the cleansed dataset from the project’s CSV file and stored subsets of quotes in a series of Python lists.
import csv

# load the cleansed quotes from the project's CSV file
with open("./_prelims/quotes_10k_clean.csv", "r") as file:
    data = list(csv.reader(file, delimiter=","))

quotes = []
for row in data:
    # skip longer quotes ("model_max_length": 512 tokens)
    if len(row[0]) > 1024:
        continue
    quotes.append(row)

# create lists of varying lengths of quotes for testing
quotes_10 = [column[0] for column in quotes[1:11]]
quotes_100 = [column[0] for column in quotes[1:101]]
quotes_1k = [column[0] for column in quotes[1:1001]]
quotes_10k = [column[0] for column in quotes[1:10001]]
You can then translate each quote by iterating over the list, calling the real-time endpoint, and writing the results back to an in-memory Python list of dictionaries, which could later be written to Amazon S3. Below, we see an example of iterating over a list of 1,000 quotes.
import json

import boto3

client_smr = boto3.client("sagemaker-runtime")

translations = []  # holds the translations

for idx, quote in enumerate(quotes_1k):
    try:
        # serialize the request body (handles quoting/escaping safely)
        payload = json.dumps({"inputs": quote})
        response = client_smr.invoke_endpoint(
            EndpointName=SAGEMAKER_ENDPOINT,
            Body=bytes(payload, "utf-8"),
            ContentType="application/json",
        )
        response_str = response["Body"].read().decode("utf-8")
        # parse the JSON response rather than eval()'ing it
        response_dict = json.loads(response_str)
        translation_text = response_dict[0]["translation_text"]
        translations.append({"input": quote, "output": translation_text})
    except client_smr.exceptions.ModelError as e:
        print(e)
    print(f"Translating quote: {idx}/1000", end="\r")
CPU times: user 3.04 s, sys: 222 ms, total: 3.26 s
Wall time: 6min 20s
The output will look similar to the following:
[
{
"input": "A friend is someone who knows all about you and still loves you.",
"output": "朋友是了解你的一切 仍然爱你的人"
},
{
"input": "It is better to be hated for what you are than to be loved for what you are not.",
"output": "更好地被憎恨你是什么 比被爱 不是你是什么。"
},
{
"input": "Love all, trust a few, do wrong to none.",
"output": "爱,信任少数人,对任何人做错事"
},
{
"input": "You love me. Real or not real? I tell him, Real.",
"output": "你爱我,真的还是不是真的?"
},
{
"input": "Love is like the wind, you can't see it but you can feel it.",
"output": "爱就像风,你看不到它,但你能感觉到它。"
}
]
Inference Results
In my tests, using (1) ml.g5.2xlarge instance with (1) NVIDIA A10G GPU and 24 GiB of GPU memory, 1,000 translations took an average of 6min 20s, or about 2.63 transactions/second (0.38 s/t). Using (1) ml.g5.12xlarge instance with (4) NVIDIA A10G GPUs and 96 GiB of GPU memory, 1,000 translations took an average of 4min 25s, or about 3.77 transactions/second (0.265 s/t). Not bad for a series of sequential inference endpoint invocations with no parallelization. Further evaluation could be done to optimize the model’s performance while maintaining or reducing the inference costs. This does not include I/O time for Amazon S3 to store translation results.
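One straightforward optimization, which I did not apply in these tests, is issuing the endpoint invocations concurrently instead of sequentially. Below is a minimal sketch using Python's ThreadPoolExecutor against the same endpoint and quote list; the worker count of 8 is an arbitrary assumption you would want to tune against the endpoint's capacity.
import json
from concurrent.futures import ThreadPoolExecutor

import boto3

client_smr = boto3.client("sagemaker-runtime")

def translate(quote: str) -> dict:
    # invoke the real-time endpoint for a single quote
    response = client_smr.invoke_endpoint(
        EndpointName=SAGEMAKER_ENDPOINT,
        Body=json.dumps({"inputs": quote}).encode("utf-8"),
        ContentType="application/json",
    )
    result = json.loads(response["Body"].read().decode("utf-8"))
    return {"input": quote, "output": result[0]["translation_text"]}

# issue up to 8 invocations at a time (arbitrary worker count)
with ThreadPoolExecutor(max_workers=8) as executor:
    translations = list(executor.map(translate, quotes_1k))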
Amazon SageMaker Batch Transform
In contrast to real-time inference, batch transform lets us get inferences from large datasets without the need for a persistent endpoint. With batch transform, SageMaker handles initializing compute instances and distributing the inference workload between them.
I have written the quotes to JSON Lines format files in order to prepare the data for batch transform. I found JSON Lines easier to work with than CSV for batch transform jobs using the quotes.
{"input": "A friend is someone who knows all about you and still loves you."}
{"input": "It is better to be hated for what you are than to be loved for what you are not."}
{"inputs": "Love all, trust a few, do wrong to none."}
{"inputs": "You love me. Real or not real? I tell him, Real."}
{"inputs": "Love is like the wind, you can't see it but you can feel it."}
Since SageMaker’s batch transform can distribute the inference workloads across compute instances, I broke the list of 10,000 quotes into four JSON Lines files, each containing 2,500 quotes. You don’t need to create the files yourself (as demonstrated below); they are part of the GitHub project.
import jsonlines

filename = "./10k_quotes/quotes_10k_1.jsonl"

items = []
for quote in quotes_10k[0:2500]:
    items.append({"inputs": quote})

with jsonlines.open(filename, "w") as writer:
    writer.write_all(items)
To prepare for batch transforms, copy the JSON Lines files from your local copy of the GitHub project into your Amazon S3 bucket.
from sagemaker.s3 import S3Uploader, s3_path_join
files = [
"quotes/quotes_10.jsonl",
"quotes/quotes_100.jsonl",
"quotes/quotes_1k.jsonl",
"quotes/quotes_10k.jsonl",
]
for file in files:
    input_s3_path = s3_path_join("s3://", S3_BUCKET, "input_batch", "quotes")
    s3_file_uri = S3Uploader.upload(file, input_s3_path)
    print(f"{file} uploaded to {s3_file_uri}")
files = [
"10k_quotes/quotes_10k_1.jsonl",
"10k_quotes/quotes_10k_2.jsonl",
"10k_quotes/quotes_10k_3.jsonl",
"10k_quotes/quotes_10k_4.jsonl",
]
for file in files:
    input_s3_path = s3_path_join("s3://", S3_BUCKET, "input_batch", "10k_quotes")
    s3_file_uri = S3Uploader.upload(file, input_s3_path)
    print(f"{file} uploaded to {s3_file_uri}")
Next, start a batch transform job. Hugging Face has documentation on using this method for batch transform jobs. The following batch transform job will use (2) ml.g4dn.8xlarge instances to process the (4) JSON Lines files, each containing 2,500 quotes.
%%time
import sagemaker
import boto3
from sagemaker.huggingface import HuggingFaceModel
try:
    role = sagemaker.get_execution_role()
except ValueError:
    client_iam = boto3.client("iam")
    role = client_iam.get_role(RoleName="sagemaker_execution_role")["Role"]["Arn"]

# hub model configuration
hub = {
    "HF_MODEL_ID": HF_MODEL_ID,
    "HF_TASK": "translation",
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    transformers_version="4.37.0",
    pytorch_version="2.1.0",
    py_version="py310",
    env=hub,
    role=role,
)

output_s3_path = f"s3://{S3_BUCKET}/output_batch"
s3_data_input = f"s3://{S3_BUCKET}/input_batch/10k_quotes/"

# starts batch transform job and uses S3 data as input
batch_job = huggingface_model.transformer(
    accept="application/json",
    assemble_with="Line",
    instance_count=2,
    instance_type="ml.g4dn.8xlarge",
    output_path=output_s3_path,
    strategy="SingleRecord",
)

batch_job.transform(
    content_type="application/json",
    data=s3_data_input,
    split_type="Line",
    logs=False,
)
Batch Transform Results
Using (2) ml.g4dn.8xlarge instances, costing $2.72/hr., to process (4) JSON Lines files, each containing 2,500 quotes, for a total of 10k quotes, with a SingleRecord strategy, the ~34-minute job achieved 5.55 translations/second. This instance type uses (1) NVIDIA T4 GPU with 16 GB of GPU memory.
Using (4) smaller ml.g4dn.2xlarge instances, costing just $0.94/hr., to process (8) JSON Lines files, each containing 2,500 quotes, for a total of 20k quotes, the ~32-minute job achieved 10.41 translations/second. This instance type also uses (1) NVIDIA T4 GPU with 16 GB of GPU memory but with one-quarter of the vCPUs and memory of the ml.g4dn.8xlarge instance.
Lastly, again using (4) smaller ml.g4dn.2xlarge instances to process (20) JSON Lines files, each containing 2,500 quotes, for a total of 50k quotes, the ~73-minute job achieved nearly identical results of 10.07 translations/second.
The batch transform jobs write the JSON Lines output to the same Amazon S3 bucket. One JSON Lines output file will be created for each input file.
The translation results should look as follows:
[{"translation_text":"朋友是了解你的一切 仍然爱你的人"}]
[{"translation_text":"我们接受我们认为我们应得的爱。"}]
[{"translation_text":"爱,信任少数人,对任何人做错事"}]
[{"translation_text":"被某人深爱 给了你力量 而爱一个人深爱 给了你勇气"}]
[{"translation_text":"爱就像风,你看不到它,但你能感觉到它。"}]
As an alternative to using the HuggingFaceModel class, we can use the lower-level Amazon SageMaker API, calling the SageMaker.Client class’s create_transform_job() method to perform a batch transform. The parameters are nearly identical between the two methods.
import sagemaker
import boto3
import time
session = sagemaker.session.Session()
client_sm = boto3.client("sagemaker")
try:
    role = sagemaker.get_execution_role()
except ValueError:
    client_iam = boto3.client("iam")
    role = client_iam.get_role(RoleName="sagemaker_execution_role")["Role"]["Arn"]

output_s3_path = f"s3://{S3_BUCKET}/output_batch"
s3_data_input = f"s3://{S3_BUCKET}/input_batch/10k_quotes/"

model_name = "<your_deployed_model_name>"
batch_job_name = f"quotes-batch-{int(time.time())}-10k"

# launch batch transform job
response = client_sm.create_transform_job(
    TransformJobName=batch_job_name,
    ModelName=model_name,
    BatchStrategy="SingleRecord",
    TransformInput={
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": s3_data_input,
            }
        },
        "ContentType": "application/json",
        "SplitType": "Line",
    },
    TransformOutput={
        "S3OutputPath": output_s3_path,
        "AssembleWith": "Line",
        "Accept": "application/json",
    },
    TransformResources={
        "InstanceType": "ml.g4dn.8xlarge",
        "InstanceCount": 2,
    },
)

print(response["TransformJobArn"])
The model_name variable is the name of the model that will be used for the batch transform job. You can use the model you deployed to the real-time inference endpoint earlier in the demonstration. Models can be found in the SageMaker console under the Inference > Models tab.
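If you would rather not open the console, a quick way to find the model name from the notebook is the SageMaker API's list_models() call; a minimal sketch:
import boto3

client_sm = boto3.client("sagemaker")

# list the most recently created SageMaker models
for model in client_sm.list_models(SortBy="CreationTime", SortOrder="Descending")["Models"]:
    print(model["ModelName"])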
To track the progress of your batch transform job, you can use this helpful polling routine, adapted from the pytorch_flores_batch_transform example in the Amazon SageMaker samples on GitHub:
%%time
while True:
    response = client_sm.describe_transform_job(
        TransformJobName=batch_job_name
    )
    status = response["TransformJobStatus"]
    if status == "Completed":
        print(f"Transform job ended with status: {status}")
        break
    if status == "Failed":
        message = response["FailureReason"]
        print("Transform failed with the following error: {}".format(message))
        raise Exception("Transform job failed")
    print(f"Transform job is still in status: {status}...", end="\r")
    time.sleep(30)
Below, using Amazon CloudWatch, you can observe the GPU and GPU memory utilization for the (2) ml.g4dn.8xlarge instances across the run time of the 10,000-quote batch transform job.
Here are similar GPU metrics for a batch transform job using (4) ml.g4dn.2xlarge instances across the run time of the 50,000-quote batch transform job. These metrics can be used to tune the batch size, file count, and instance type and count for both performance and cost.
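The same numbers can also be pulled programmatically if you want to compare runs side by side. Below is a minimal sketch that queries average GPUUtilization for a transform job from the /aws/sagemaker/TransformJobs namespace; the Host dimension value follows a <job-name>/algo-<n> pattern, which is an assumption you should confirm against the dimensions shown in your CloudWatch console.
from datetime import datetime, timedelta, timezone

import boto3

client_cw = boto3.client("cloudwatch")

# assumes the Host dimension is "<transform-job-name>/algo-1"
response = client_cw.get_metric_statistics(
    Namespace="/aws/sagemaker/TransformJobs",
    MetricName="GPUUtilization",
    Dimensions=[{"Name": "Host", "Value": f"{batch_job_name}/algo-1"}],
    StartTime=datetime.now(timezone.utc) - timedelta(hours=2),
    EndTime=datetime.now(timezone.utc),
    Period=300,  # 5-minute buckets
    Statistics=["Average"],
)

for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], f'{point["Average"]:.1f}%')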
Delete Real-time Inference Endpoint
Important: Don't forget to delete your real-time inference endpoint(s) or you will continue to be charged hourly for each instance. Optionally, you can delete the associated endpoint configuration(s) and the model(s).
import boto3
sagemaker_client = boto3.client("sagemaker")
# delete endpoint
sagemaker_client.delete_endpoint(EndpointName=SAGEMAKER_ENDPOINT)
# delete endpoint configuration
endpoint_config_name = "<your_endpoint_config_name>"
sagemaker_client.delete_endpoint_config(EndpointConfigName=endpoint_config_name)
# delete model
model_name = "<model_name>"
sagemaker_client.delete_model(ModelName=model_name)
Conclusion
In this post, we explored how to use specialized, task-specific transformer models to accomplish common NLP tasks. We also learned how to use Amazon SageMaker to deploy Hugging Face transformer models for real-time inference and batch transforms (batch inference). Lastly, we compared this approach with fully managed AI and Generative AI services on AWS.
This blog represents my viewpoints and not those of my employer, Amazon Web Services (AWS). All product names, images, logos, and brands are the property of their respective owners.