Leveraging JSON Mode for Enhanced LLM Output
For some time now, llama.cpp has allowed users to constrain its output with a grammar, and JSON is one of the supported formats. Meanwhile, OpenAI has introduced JSON mode for its chat completion API, providing users with a similarly versatile output option.
Implementing JSON formatting is straightforward. For llama.cpp, pass the contents of the json.gbnf file, which contains the JSON grammar, as the grammar field of the API call. Here is an example in JavaScript.
import fs from 'fs';

// LLAMA_API_URL points to the llama.cpp server completion endpoint,
// e.g. http://127.0.0.1:8080/completion.
const method = 'POST';
const headers = {
  'Content-Type': 'application/json'
};

// Load the JSON grammar and send it along with the prompt.
const grammar = fs.readFileSync('json.gbnf', 'utf-8');
const body = JSON.stringify({ prompt, grammar });

const request = { method, headers, body };
const response = await fetch(LLAMA_API_URL, request);
const data = await response.json();
Similarly, for OpenAI, specify the response_format as { type: 'json_object' } when calling the API directly, such as:
// url is the chat completion endpoint, https://api.openai.com/v1/chat/completions,
// and model must be one that supports JSON mode, e.g. gpt-3.5-turbo-1106.
const response_format = { type: 'json_object' };
const response = await fetch(url, {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': `Bearer ${OPENAI_API_KEY}`
  },
  body: JSON.stringify({ messages, model, response_format })
});
const data = await response.json();
Note that, as of this writing, only certain OpenAI models, gpt-4-1106-preview and gpt-3.5-turbo-1106, support JSON output. Additionally, for OpenAI, the system prompt in the messages array must explicitly mention JSON, otherwise the API call fails.
For a zero-shot prompt, here's an example of a system message:
const SYSTEM_PROMPT = `You are a helpful assistant.
You answer the question from the user, politely and concisely in 13 words or less.
Always output a valid JSON as follows:
{
"answer": a string with the concise answer
}`;
Since the response is now JSON, parse it first and then destructure the result to pull out the answer.
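For example, here is a minimal sketch of that step, assuming the OpenAI response shape from the call above (with llama.cpp, the generated text is in data.content instead):

const content = data.choices[0].message.content;
const { answer } = JSON.parse(content);
console.log(answer);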
Chain of Thought
The previous example might seem to underutilize JSON's capabilities, considering the resulting object has only one key, answer.
Though true, this is a foundational setup. As the prompt and the expected output grow more complex, JSON starts to pay off. Consider the Chain of Thought pattern, which guides the LLM to break down a problem systematically. The prompt might look like this:
const SYSTEM_PROMPT = `You are a research assistant with access to Google search.
Given the conversation history and the inquiry from the user, your task is to use Google to search for the answer.
Think step by step. Always output a valid JSON as follows:
{
"thought": describe your thoughts about the inquiry,
"tool": the search engine to use (must be Google),
"input": a string with the important key phrases to search for,
"observation": a string with the concise result of the search
}`;
For instance, for a user asking about the population of Jakarta, the internal thought process involves:
{
"thought": "This is a question about population, I will use Google",
"tool": "Google",
"input": "population of Jakarta",
"observation": "The population of Jakarta is approximately 10.6 million people."
}
Once the output is obtained, destructure it and extract the observation part, the actual answer. The other fields remain useful if properly logged, especially in cases of failures or user-provided negative feedback.
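As a rough sketch, assuming output holds the raw JSON text returned by the model (output and reply are hypothetical names):

const { thought, tool, input, observation } = JSON.parse(output);
console.log({ thought, tool, input }); // worth logging for failures or negative feedback
const reply = observation; // the part presented to the user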
The same technique applies to Reason-Act, useful for invoking tools or functions, or assisting with the subsequent stage of RAG (Retrieval-Augmented Generation). If this is your first encounter with Reason-Act, refer to my previous article on LLM-based Chatbot Demo, where the complexity of string parsing is eliminated by using JSON format.
Question Answering
Speaking of RAG, once the relevant passages are retrieved, the LLM's task is to use them as a reference document to answer the question. To see this in action, refer to my previous article on Semantic Search for RAG (and inspect the relevant code if necessary).
A significant advantage of the JSON output format is the flexibility to tweak prompts during development or troubleshooting, enabling the dumping of additional information.
Consider a scenario where the question is "When was the solar system formed?" Using vector similarity, the retrieval process fetches three relevant paragraphs from the document archive:
The Solar System developed 4.6 billion years ago when a dense region of a molecular cloud collapsed, forming the Sun and a protoplanetary disc. All four terrestrial planets belong to the inner Solar System and have solid surfaces. Inversely, all four giant planets belong to the outer Solar System and do not have a definite surface, as they are mainly composed of gases and liquids.
The closest star to the Solar System, Proxima Centauri, is 4.25 ly away. The Solar System orbits the Galactic Center of the Milky Way galaxy, as part of its Orion Spur, at a distance of 26,000 ly.
The Solar System formed 4.568 billion years ago from the gravitational collapse of a region within a large molecular cloud. This initial cloud was likely several light-years across and probably birthed several stars. As is typical of molecular clouds, this one consisted mostly of hydrogen, with some helium, and small amounts of heavier elements fused by previous generations of stars.
Given these passages and the right prompt, the LLM should be able to answer the question. A common prompt may look like this:
const SYSTEM_PROMPT = `You are an expert in retrieving information.
You are given a question from a human and you have to answer it concisely in 23 words or less.
Use only the following reference document to evaluate the question and provide the answer.
Avoid using any external information or recalling from memory.
Reference Document:
{{all the relevant passages}}`
But, wait! With JSON mode in mind, tweak the prompt to include this extra instruction:
Always output a valid JSON formatted as follows:
{
"answer": a string representing the concise answer,
"citation": a string referring to sentence for the source of the answer
}
Now, instead of only the answer, the LLM will also produce the citation:
{
"answer": "4.568 billion years ago",
"citation": "The Solar System formed 4.568 billion years ago from the gravitational collapse of a region within a large molecular cloud."
}
Once again, destructure the response and present the answer to the user. Meanwhile, the citation can be used for evaluation metrics, surfaced to the user if they wish to see the source for themselves, or perhaps both!
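A minimal sketch of that, assuming content holds the raw JSON text returned by the model:

const { answer, citation } = JSON.parse(content);
console.log('Source:', citation); // keep it around for evaluation or on-demand display
// present answer to the user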
How about Streaming Response?
When it comes to streaming responses, almost all prominent chat interfaces, from ChatGPT to Bard to Copilot, offer responses "on the fly," with each word appearing progressively, creating a responsive and engaging user experience.
However, with JSON mode for the output, this seamless streaming won't work out of the box. Partial JSON is not a valid JSON structure; if fed into a JSON parser, the incomplete form will rightfully explode.
The most straightforward workaround is to accept that partial JSON is invalid and attempt to complete it manually. For instance, if the initial parsing fails, another attempt can be made after appending a double quote and a closing bracket to the response. Since the objects in the previous examples aren't deep, this works well for those schemas, except in the rare moment when the stream stops right at a key-value boundary. If parsing still fails, wait until more chunks are streamed by the LLM. Rinse and repeat!
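Here is a rough sketch of that idea, assuming a flat, single-level object like the schemas above; parseStreaming is a hypothetical helper that receives the text accumulated so far:

const parseStreaming = (partial) => {
  try {
    return JSON.parse(partial);
  } catch {
    try {
      // close the dangling string and the object, then try again
      return JSON.parse(partial + '"}');
    } catch {
      return null; // not enough chunks yet, wait for more
    }
  }
};

Call it again every time a new chunk arrives; whenever it returns a non-null object, the answer field collected so far can already be rendered to the user.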
Debugging Hitchhike
The advantage of object destructuring is that the consumer can simply ignore everything else. In the previous Chain of Thought case, where only the observation field is crucial, additional key-value pairs can be tucked into the object without breaking compatibility, since the existing shape stays intact.
As an illustration, an extra value recording the processing time is tucked into the response below. While mostly ignored by subsequent stages, it proves immensely valuable for troubleshooting.
const respond = async (history) => {
  const start = Date.now();
  const init = { role: 'system', content: SYSTEM_PROMPT };
  const messages = [].concat(init).concat(history);
  // llm() is assumed to wrap the chat completion call and return the raw JSON text
  const response = JSON.parse(await llm(messages));
  const time = Date.now() - start;
  return { ...response, time }; // hitchhike the elapsed time onto the response
}
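On the consuming side, a minimal sketch (assuming the Chain of Thought schema from earlier):

const { observation, time } = await respond(history);
console.log(`LLM round trip: ${time} ms`); // handy while troubleshooting
// only observation is presented to the user; the extra field is simply ignored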
I hope this short article convinces you to work with the JSON format extensively when using LLMs. Feedback is warmly welcomed, and I look forward to hearing about your LLM adventures!