DS Fortune Cookies: System Prompts

"Lucky numbers: 0, 1. Lucky words: Your system prompt."

One thing to understand about language models is that they work on plain text. I found this confusing when fine-tuning, because most APIs now use a chat completion format. Under the hood, though, a chat model's tokenizer uses a chat template to flatten the messages and their roles (e.g. system, user, assistant) into a single plain-text string. It then converts that string into token IDs, and the model predicts the next token from that sequence. So once again, language models are just next-token prediction on plain text; we dress them up with tools, chats, etc.

Here is how you would do this conversion with the Hugging Face transformers library.

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")

chat = [
  {"role": "user", "content": "Hello, how are you?"},
  {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
  {"role": "user", "content": "I'd like to show off how chat templating works!"},
]

tokenizer.apply_chat_template(chat, tokenize=False)

>>>

"<s>[INST] Hello, how are you? [/INST]I'm doing great. How can I help you today?</s> [INST] I'd like to show off how chat templating works! [/INST]"        

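To show that nothing magical is happening, here is a plain-Python sketch that reproduces the string above. It is my own approximation, inferred from that output. It is not the Jinja template shipped with the tokenizer, which you can inspect via tokenizer.chat_template. The function name render_mistral_v01 is hypothetical.

```python
# Hand-written approximation of the Mistral-7B-Instruct-v0.1 chat format,
# inferred from the apply_chat_template output above (not the official template).
BOS, EOS = "<s>", "</s>"


def render_mistral_v01(messages: list[dict[str, str]]) -> str:
    """Flatten user/assistant messages into the plain-text prompt the model sees."""
    parts = [BOS]
    for message in messages:
        if message["role"] == "user":
            # User turns are wrapped in [INST] ... [/INST]
            parts.append(f"[INST] {message['content']} [/INST]")
        elif message["role"] == "assistant":
            # Assistant turns end with the end-of-sequence token plus a space
            parts.append(f"{message['content']}{EOS} ")
        else:
            # This sketch only handles the two roles shown in the example
            raise ValueError(f"unsupported role: {message['role']}")
    return "".join(parts)


chat = [
    {"role": "user", "content": "Hello, how are you?"},
    {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
    {"role": "user", "content": "I'd like to show off how chat templating works!"},
]
print(render_mistral_v01(chat))
```

The printed string matches the tokenizer output above character for character. This is the whole point: the "chat" is just string formatting.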
Every model family has its own special tokens (e.g. [INST]) and its own way of marking the system prompt (e.g. Llama 2's <<SYS>> delimiters). This fortune cookie reviews these conventions for three popular model families and how you can use them.
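For reference, here is how the <<SYS>> delimiters were used in Llama 2 prompts. The system prompt is folded into the first [INST] block rather than getting its own turn. This is a minimal single-turn sketch based on the delimiter constants in Meta's reference llama code. The function name is my own, and multi-turn conversations are not handled.

```python
# Llama 2 chat delimiters, matching the constants in Meta's reference llama code
B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"


def render_llama2_single_turn(system: str, user: str) -> str:
    """Build a one-turn Llama 2 chat prompt that includes a system prompt."""
    # The system prompt is prepended to the first user message, not sent as its own turn
    content = f"{B_SYS}{system}{E_SYS}{user}"
    # "<s>" stands in for the BOS token, which Meta's code adds as a token ID
    return f"<s>{B_INST} {content.strip()} {E_INST}"


print(render_llama2_single_turn("You are a helpful assistant.", "What is the capital of France?"))
```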

GPT (OpenAI)

OpenAI uses the chat completion interface extensively and has a ‘system’ role for injecting system prompts; the rest of the conversation uses the user and assistant roles. GPT is proprietary and, to my knowledge, OpenAI hasn’t published its special tokens (correct me!), but some digging suggests it uses tokens like <|bos>, <|eos>, <|unk>, <|pad>, <|sep>, <|cls>, and <|mask>. What matters more in practice is that you can prompt GPT models through the system role in the chat interface.

{"messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello, who are you?"},
    {"role": "assistant", "content": "Hello! I'm an AI assistant created by OpenAI. How can I help you today?"}
]}
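The {"messages": [...]} object above is the shape OpenAI's fine-tuning endpoint expects. Each training example is one such object on a single line of a JSONL file, unlike the pretty-printed version above. Below is a minimal sketch for producing that file content from Python. The to_jsonl helper and its three-role check are my own. OpenAI also accepts other roles (such as tool messages) that this sketch rejects.

```python
import json

# Roles discussed in this article; the real fine-tuning format allows more
ALLOWED_ROLES = {"system", "user", "assistant"}


def to_jsonl(examples: list[dict]) -> str:
    """Serialize chat examples to JSONL: one {"messages": [...]} object per line."""
    lines = []
    for example in examples:
        for message in example["messages"]:
            if message["role"] not in ALLOWED_ROLES:
                raise ValueError(f"unexpected role: {message['role']}")
        # json.dumps without indent keeps each example on a single line
        lines.append(json.dumps(example))
    return "\n".join(lines) + "\n"


examples = [
    {"messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello, who are you?"},
        {"role": "assistant", "content": "Hello! I'm an AI assistant created by OpenAI. How can I help you today?"},
    ]},
]
print(to_jsonl(examples), end="")
```

In practice you would write this string to a file (e.g. a hypothetical train.jsonl) before uploading it for fine-tuning.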

Llama (Meta)

Llama 3 models are trained on four roles: system, user, assistant, and ipython (as of 3.1). Each message is introduced with <|start_header_id|>ROLE<|end_header_id|> special tokens and closed with <|eot_id|>. The tokenizer's chat template takes care of this, e.g.

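from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")  # gated: accept Meta's license on Hugging Face first

# apply_chat_template expects a list of message dicts
chat = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
]

# add_generation_prompt=True appends the empty assistant header shown at the end of the output
tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)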

>>>

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 23 July 2024

You are a helpful assistant.<|eot_id|><|start_header_id|>user<|end_header_id|>

What is the capital of France?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
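The same structure can be written out by hand. Below is a simplified sketch of the Llama 3 layout. The function is my own, not Meta's template. It omits the Cutting Knowledge Date / Today Date lines that the 3.1 template inserts into the system block, and it ignores tools and the ipython role.

```python
def render_llama3(messages: list[dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Approximate the Llama 3 chat format (without 3.1's date lines or tool handling)."""
    text = "<|begin_of_text|>"
    for message in messages:
        # Each turn: role header, blank line, stripped content, end-of-turn token
        text += (
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content'].strip()}<|eot_id|>"
        )
    if add_generation_prompt:
        # Open an assistant header so the model's next tokens are the reply
        text += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return text


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
]
print(render_llama3(messages))
```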

You can list the special tokens for any open source model with this transformers snippet, or by reading the special_tokens_map.json file in the model's repository on Hugging Face.

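from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")
special_tokens = tokenizer.all_special_tokens  # special token strings, e.g. BOS/EOS/UNK
print(special_tokens)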
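If you have already downloaded special_tokens_map.json, you can read it without loading a tokenizer. In that file, values are either plain strings or objects with a "content" key, and additional_special_tokens holds a list. The sketch below assumes that layout. The helper names and the example map are hypothetical and not copied from any particular model.

```python
import json
from pathlib import Path


def list_special_tokens(token_map: dict) -> list[str]:
    """Collect token strings from a special_tokens_map.json-style dict."""
    tokens = []
    for value in token_map.values():
        # A value may be a single entry or a list of entries
        items = value if isinstance(value, list) else [value]
        for item in items:
            # An entry is either a bare string or a dict with a "content" key
            tokens.append(item["content"] if isinstance(item, dict) else item)
    return tokens


def load_special_tokens(path: str) -> list[str]:
    """Read a local special_tokens_map.json and list its tokens."""
    return list_special_tokens(json.loads(Path(path).read_text(encoding="utf-8")))


# Hypothetical example map, for illustration only
example_map = {
    "bos_token": "<s>",
    "eos_token": {"content": "</s>", "lstrip": False, "rstrip": False},
    "additional_special_tokens": ["<|example_a|>", "<|example_b|>"],
}
print(list_special_tokens(example_map))  # ['<s>', '</s>', '<|example_a|>', '<|example_b|>']
```

Some models register their chat tokens only in tokenizer_config.json (under added_tokens_decoder) rather than in special_tokens_map.json. That file is worth checking too.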

Claude (Anthropic)

Anthropic uses a chat interface similar to OpenAI's, with one difference: in the Messages API the system prompt is passed as a separate top-level system parameter, not as a message with a system role. While Anthropic models are also proprietary, Anthropic is more open about special tokens and prompts. Claude responds well to XML tags like <function></function>. Its system prompt may use a <claude_info> tag, but I haven’t been able to find anything definitive here. Anthropic also publishes a nice prompt library.
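Because of that difference, OpenAI-style message lists need a small conversion before they go to Anthropic. Below is a minimal sketch under my own assumptions. The helper names are hypothetical, "MODEL_NAME" is a placeholder, and max_tokens=1024 is an arbitrary default (the Messages API requires some value). It also shows wrapping input in XML tags.

```python
def to_anthropic_request(messages: list[dict[str, str]], model: str, max_tokens: int = 1024) -> dict:
    """Convert OpenAI-style messages into an Anthropic Messages API request body."""
    # Anthropic takes the system prompt as a top-level field, not as a message role
    system = "\n\n".join(m["content"] for m in messages if m["role"] == "system")
    turns = [m for m in messages if m["role"] != "system"]
    request = {"model": model, "max_tokens": max_tokens, "messages": turns}
    if system:
        request["system"] = system
    return request


def xml_wrap(tag: str, text: str) -> str:
    """Wrap text in an XML tag so the prompt structure is explicit."""
    return f"<{tag}>\n{text}\n</{tag}>"


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": xml_wrap("document", "Paris is the capital of France.") + "\nSummarize the document."},
]
request = to_anthropic_request(messages, model="MODEL_NAME")  # placeholder model name
print(request["system"])
```

With the official anthropic Python SDK, a body like this could be passed as client.messages.create(**request). The remaining messages must use only the user and assistant roles.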

Scott McKean

  • Databricks Logging and Debugging

    Databricks Logging and Debugging

    Let’s talk about logging on Databricks, specifically in Notebooks, Spark, and Ray. Effective logging is critical for…

    4 条评论
  • DS Fortune Cookies: FTI Architecture

    DS Fortune Cookies: FTI Architecture

    Three sisters dancing in endless flow, feature, train, and infer they go! I read the LLM Engineer's Handbook over the…

  • Azure Databricks CI/CD

    Azure Databricks CI/CD

    This is an opinionated article on continuous integration and continuous delivery (CI/CD). These are specific practices…

    5 条评论
  • DS Fortune Cookies: LangChain, Agents, and Authentication

    DS Fortune Cookies: LangChain, Agents, and Authentication

    “Embrace LangChain's evolution and your spirit will be unbreakable, unlike your code.” This fortune cookie clarifies…

    2 条评论
  • An Opinionated Primer on Fine-Tuning

    An Opinionated Primer on Fine-Tuning

    Databricks Week 18 I'll admit that when I first heard about 'small language models', I thought it was a ridiculous fad.…

    4 条评论
  • Text Similarity

    Text Similarity

    Databricks Week 16 This week I had the pleasure of speaking with a couple of customers that want to compare two bits of…

    1 条评论
  • 100 Days at Databricks

    100 Days at Databricks

    As I hit the 100-day mark at Databricks, I want to review the journey so far with some of the bigger themes that stood…

    6 条评论
  • Anomaly Detection

    Anomaly Detection

    Databricks Week 12/13 I was asked to help a customer out with anomaly detection. I brushed off some of the thoughts I…

    4 条评论
  • Forecasting Deep Dive

    Forecasting Deep Dive

    Databricks Week 10/11 Today is the day - I’m going to really let myself talk nerd. Let’s dive into time series…

    2 条评论
  • DS Fortune Cookies: Liquid AI

    DS Fortune Cookies: Liquid AI

    "When time is of the essence, closed-form solutions make all the difference." Liquid AI introduced a novel class of…

    1 条评论

社区洞察

其他会员也浏览了