Masking Sensitive Data Locally with Hugging Face's Meta-Llama-3.1-8B-Instruct Model
Introduction
This demo explores the use of Hugging Face's Meta-Llama-3.1-8B-Instruct model for a text processing task focused on desensitizing data. The goal is to create a Python job that runs the model locally to replace sensitive information, such as AWS EC2 instance IDs, with 'XXX' while keeping the rest of the content intact.
Steps
1. Requesting Access to the Model
Request access to the model on Hugging Face's platform.
2. Creating an Access Token
Once access is granted, create an access token from your Hugging Face profile settings. This token is essential for authenticating API calls to Hugging Face services.
3. Setting Up the Python Environment
Locally (in my case, on a Mac), set up a Python virtual environment using the following command:
python -m venv .env
and activate it:
source .env/bin/activate
4. Installing the Necessary Libraries
pip install --upgrade transformers accelerate huggingface_hub
5. Logging into Hugging Face
Use the huggingface_hub library to log into Hugging Face:
from huggingface_hub import login
login()
Run this script; it will prompt for the access token created in step 2.
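If you prefer a non-interactive login, login() also accepts the token directly; reading it from an environment variable keeps it out of the source code (HF_TOKEN below is whatever variable name you choose to set yourself):

import os
from huggingface_hub import login

# Pass the token directly instead of typing it at the interactive prompt.
login(token=os.environ["HF_TOKEN"])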
6. Defining the Prompt
Craft a system message and a user message to instruct the model on the desensitization task. The messages are then tokenized, and an attention mask is created so the input is handled properly (a sketch of this step follows the message definition below).
messages = [
{"role": "system", "content": (
"You are a natural language processor. "
"Please replace the AWS EC2 instance id with X, "
"and output the rest of the information as it is. "
"For example, change [ec2-285dct67i5 is in our cloud] to [ec2-XXX is in our cloud]"
)},
{"role": "user", "content": (
"The down AWS EC2 instance id is ec2-01845dct67i, please page the on-call engineer."
)},
]
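As a minimal sketch of the tokenization step (this assumes the tokenizer loaded in step 7 below), apply_chat_template formats the messages with Llama 3.1's chat template and returns the input IDs; since a single prompt has no padding, the attention mask is simply all ones:

import torch

# Format the chat messages and tokenize them into a [1, seq_len] tensor of IDs.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant header so the model starts its reply
    return_tensors="pt",
)
attention_mask = torch.ones_like(inputs)  # every input token is real, no padding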
7. Loading the Model and Tokenizer
Load the tokenizer and model using the transformers library by running the Python job. On the first run, the model weights are downloaded and set up with the appropriate configuration.
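A minimal sketch of this step, using the model's repository ID on Hugging Face (the dtype and device placement below are reasonable defaults rather than requirements):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # roughly halves memory versus float32
    device_map="auto",           # let accelerate place the model on GPU/MPS/CPU
)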
8. Generating the Output
Generate the output using the model's generate function, setting parameters like max_new_tokens, temperature, and top_p to control the generation process.
inputs = inputs.to(model.device)
attention_mask = attention_mask.to(model.device)
# Llama 3.1 marks the end of an assistant turn with <|eot_id|>; fall back to it
# if the tokenizer does not define eos_token_id.
eos_token_id = tokenizer.eos_token_id
if eos_token_id is None:
    eos_token_id = tokenizer.convert_tokens_to_ids("<|eot_id|>")
outputs = model.generate(
    inputs,
    attention_mask=attention_mask,
    max_new_tokens=256,        # upper bound on the number of generated tokens
    eos_token_id=eos_token_id,
    do_sample=True,            # sample rather than greedy-decode
    temperature=0.5,           # low temperature keeps the rewrite close to the input
    top_p=0.9,                 # nucleus sampling cutoff
)
9. Extracting and Decoding the Response
Extract the generated response by slicing off the prompt tokens, then decode it to present the final output, showcasing the model's ability to desensitize specific data in text.
response_ids = outputs[0][inputs.shape[-1]:]  # keep only the newly generated tokens
response_text = tokenizer.decode(response_ids, skip_special_tokens=True)
print(response_text)
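With the example prompt above, a successful run should print something along these lines (exact wording can vary because sampling is enabled):

The down AWS EC2 instance id is ec2-XXX, please page the on-call engineer.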
Conclusion
This demo demonstrated the capabilities of the Meta-Llama-3.1-8B-Instruct model in text processing tasks such as data desensitization. The process involved setting up a development environment, gaining access to the model, and implementing a Python script to generate the desired output. The success of this small project paves the way for further exploration and integration of local models in various applications.
Source code can be found at https://github.com/GuilinDev/AIPOC/tree/main/Desensitization_llama3.1_8b.