Structured Outputs from LLMs: LangChain Output Parsers

LLMs are good at generating human-like text (hence the name generative AI), but when it comes to integrating with real-world applications, free-form natural text isn’t very useful. Enterprise applications require structured responses, such as JSON, lists, key-value pairs, or well-defined objects. This is where output parsers come in: they let developers define the expected data structure of an LLM's response, so that outputs conform to popular formats like JSON. This structured approach simplifies downstream tasks like data integration, processing, and visualization.

Let’s understand with an example. Instead of manually parsing unstructured text, a parser can enforce a response format where each entry pairs a specific time with an activity.

[Prompt] - Can you create a one-day travel itinerary for Paris?

[LLM’s Unstructured Response]

Here’s a suggested itinerary: Morning: Visit the Eiffel Tower.

Afternoon: Walk through the Louvre Museum.

Evening: Enjoy dinner at a Seine riverside café.

[JSON-enforced output]

[ 

  {"time": "Morning", "activity": "Visit the Eiffel Tower"}, 

  {"time": "Afternoon", "activity": "Walk through the Louvre Museum"}, 

  {"time": "Evening", "activity": "Enjoy dinner at a Seine riverside café"} 

]         

The LangChain framework provides a flexible set of output parsers. However, LangChain isn’t the only way to structure outputs; many LLMs offer native built-in support for structured responses. OpenAI’s function calling, Anthropic Claude’s JSON mode, and Google Gemini Pro’s structured output capabilities allow direct JSON or object-based responses without requiring an external parser. Beyond LangChain and native LLM features, other AI frameworks also provide structured parsing mechanisms: LlamaIndex, Haystack, and Microsoft’s Semantic Kernel all enable developers to extract structured data from LLMs, offering alternatives for different use cases.

In this article, we will focus mainly on LangChain’s output parsers and walk through how they work. LangChain offers a range of parsers beyond the ones covered here, and the list is updated frequently.

First, let’s set up the LLM for generating responses. For the purposes of this article, we will use the Gemini 1.5 Pro model.

[1] Get Gemini API credentials from the GCP console.

[2] Initialize a Gemini 1.5 Pro model instance using ChatGoogleGenerativeAI. Temperature is set to 0.0 to get deterministic responses. Use llm_pro to invoke model chat responses.
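The two setup steps above might look like the following sketch. It assumes the langchain-google-genai package is installed and that a GOOGLE_API_KEY environment variable holds the credentials from step [1]; the model name string is an assumption based on the article's choice of Gemini 1.5 Pro.

```python
# Sketch of the setup steps above. Assumes `pip install langchain-google-genai`
# and that GOOGLE_API_KEY is set in the environment.
from langchain_google_genai import ChatGoogleGenerativeAI

llm_pro = ChatGoogleGenerativeAI(
    model="gemini-1.5-pro",
    temperature=0.0,  # 0.0 keeps responses as deterministic as possible
)

# Invoke a chat response:
# reply = llm_pro.invoke("Can you create a one-day travel itinerary for Paris?")
# print(reply.content)
```

The invocation is commented out because it requires live API credentials; everything below reuses this llm_pro instance.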

[Pydantic Parser]

The PydanticOutputParser uses Python's Pydantic library to define and validate structured outputs. It is useful when you need the model's responses in a well-defined JSON format. By creating a custom Pydantic schema, you can specify the structure, data types, and constraints. Let’s see an example:

[1] Add required imports to use PydanticOutputParser

[2] Define a Pydantic model

[3] Initialize the output Parser

[4] Define the prompt template with format instructions

[5] Create/run the chain and pass the input movie name

[6] Check the response from the parser

setup='Why did Ranbir Kapoor bring a ladder to the Animal movie premiere?'

punchline='He heard it was going to be wild and wanted a higher vantage point!'

The code demonstrates how PydanticOutputParser can enforce structured outputs. A Joke schema is defined using Pydantic's BaseModel with two fields, setup and punchline. A PromptTemplate is then defined to ask the LLM to "tell a joke about a Bollywood movie," with format instructions extracted from the schema. Using a chain that combines the prompt, LLM, and parser, the topic "Animal Movie" is provided, and the LLM generates a joke with a setup and a punchline. The response is schema-compliant, which is what applications requiring consistent outputs need.

[CSV Parser]

The CommaSeparatedListOutputParser is designed to parse language model outputs into a well-structured, comma-separated list. It is particularly useful when you need a simple, consistent list of items.

[1] Add imports, initialize the output parser, and define the prompt template

[2] Create and run the chain with {"topic": "Bollywood Movies"}, and analyze the response.

['Sholay', 'Dilwale Dulhania Le Jayenge', '3 Idiots', 'Lagaan', 'Kabhi Khushi Kabhie Gham']

This demonstrates the CommaSeparatedListOutputParser, which outputs items in comma-separated list format. The parser is initialized and integrated into a chain along with a prompt template that instructs the model to "List five {topic}", with format instructions automatically included. This ensures the output is directly usable in applications requiring a simple list format.

[Date Time Parser]

The DatetimeOutputParser ensures that the language model's responses are parsed and validated as well-structured datetime objects. It is particularly useful when you need outputs in a consistent, machine-readable datetime format.

[1] Parser and input prompt template

[2] Response

2017-01-20 17:15:00

The DatetimeOutputParser parses the model's response into a structured datetime object. The parser is initialized and integrated into a chain with a prompt template that asks for the "date and time of Trump’s first presidential speech," including format instructions automatically added by the parser. The output is returned as a valid, machine-readable datetime object, making it usable in time-sensitive applications.

Summary

We focused on how LangChain's output parsers help convert unstructured responses from language models into well-defined formats. We explored practical examples using the PydanticOutputParser, CommaSeparatedListOutputParser, and DatetimeOutputParser, showing how these tools make outputs more predictable and easier to integrate into applications.

While we covered some key parsers, there are more in the library worth exploring. For example, RegexDictParser is good for extracting data using patterns, OutputFixingParser is useful for handling incomplete or inconsistent outputs, and YamlOutputParser works with YAML data formats. Consider experimenting with the parsers not covered here. For advanced use cases, you might investigate chaining parsers together or integrating LangChain parsers with native LLM features like OpenAI's function calling. These tools can make working with LLMs easier and more effective.
