Structured Outputs from LLMs: LangChain Output Parsers

LLMs are good at generating human-like text (hence the name generative AI), but when it comes to integrating with real-world applications, free-form natural text isn’t very useful. Enterprise applications require structured responses, such as JSON, lists, key-value pairs, or well-defined objects. This is where output parsers come in: they let developers define the expected data structure of an LLM's response, so that outputs conform to popular formats like JSON. This structured approach simplifies downstream tasks like data integration, processing, and visualization.

Let’s understand with an example. Instead of manually parsing unstructured text, a parser can enforce a response format where each entry pairs a specific time with an activity.

[Prompt] - Can you create a one-day travel itinerary for Paris?

[LLM’s Unstructured Response]

Here’s a suggested itinerary: Morning: Visit the Eiffel Tower.

Afternoon: Walk through the Louvre Museum.

Evening: Enjoy dinner at a Seine riverside café.

[JSON-enforced output]

[ 

  {"time": "Morning", "activity": "Visit the Eiffel Tower"}, 

  {"time": "Afternoon", "activity": "Walk through the Louvre Museum"}, 

  {"time": "Evening", "activity": "Enjoy dinner at a Seine riverside café"} 

]         

The LangChain framework provides a flexible set of output parsers. However, LangChain isn’t the only way to structure outputs; many LLMs offer native built-in support for structured responses. OpenAI’s function calling, Anthropic Claude’s JSON mode, and Google Gemini Pro’s structured output capabilities allow direct JSON or object-based responses without requiring an external parser. Beyond LangChain and native LLM features, other AI frameworks also provide structured parsing mechanisms: LlamaIndex, Haystack, and Microsoft’s Semantic Kernel all enable developers to extract structured data from LLMs, offering alternatives for different use cases.

In this article, we will focus mainly on LangChain’s output parsers and walk through how they work. LangChain offers a range of parsers beyond the ones covered here, and the list is updated frequently.

First, let’s set up the LLM for generating responses. For the purposes of this article, we will use the Gemini 1.5 Pro model.

[1] Get Gemini API credentials from the GCP console.

[2] Initialize a Gemini 1.5 Pro model instance using ChatGoogleGenerativeAI. Temperature is set to 0.0 to get deterministic responses. Use llm_pro to invoke model chat responses.
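The two setup steps above might look like the following sketch. It assumes the langchain-google-genai package is installed and that a GOOGLE_API_KEY environment variable holds the credentials from step [1]; the model name string is an assumption based on the article's choice of Gemini 1.5 Pro.

```python
# Sketch of the setup steps above. Assumes `pip install langchain-google-genai`
# and that GOOGLE_API_KEY is set in the environment.
from langchain_google_genai import ChatGoogleGenerativeAI

llm_pro = ChatGoogleGenerativeAI(
    model="gemini-1.5-pro",
    temperature=0.0,  # 0.0 keeps responses as deterministic as possible
)

# Invoke a chat response:
# reply = llm_pro.invoke("Can you create a one-day travel itinerary for Paris?")
# print(reply.content)
```

The invocation is commented out because it requires live API credentials; everything below reuses this llm_pro instance.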

[Pydantic Parser]

The PydanticOutputParser uses Python's Pydantic library to define and validate structured outputs. It is useful when you need the model's responses in a well-defined JSON format. By creating a custom Pydantic schema, you can specify the structure, data types, and constraints. Let’s see an example:

[1] Add required imports to use PydanticOutputParser

[2] Define a Pydantic model

[3] Initialize the output Parser

[4] Define the prompt template with format instructions

[5] Create/run the chain and pass the input movie name

[6] Check the response from the parser

setup='Why did Ranbir Kapoor bring a ladder to the Animal movie premiere?'

punchline='He heard it was going to be wild and wanted a higher vantage point!'

The code demonstrates how PydanticOutputParser can enforce structured outputs. A Joke schema is defined using Pydantic's BaseModel with two fields, setup and punchline. A PromptTemplate is then defined to ask the LLM to "tell a joke about a Bollywood movie," with format instructions extracted from the schema. Using a chain that combines the prompt, LLM, and parser, the topic "Animal Movie" is provided, and the LLM generates a joke with a setup and a punchline. The response is schema-compliant, which is what applications requiring consistent outputs need.

[CSV Parser]

The CommaSeparatedListOutputParser is designed to parse language model outputs into a well-structured, comma-separated list. It is particularly useful when you need a simple, consistent list of items.

[1] Add imports, initialize the output parser, and define the prompt template

[2] Create and run the chain with {"topic": "Bollywood Movies"}, and analyze the response.

['Sholay', 'Dilwale Dulhania Le Jayenge', '3 Idiots', 'Lagaan', 'Kabhi Khushi Kabhie Gham']

This demonstrates the CommaSeparatedListOutputParser, which outputs items in comma-separated list format. The parser is initialized and integrated into a chain along with a prompt template that instructs the model to "List five {topic}", with format instructions automatically included. This ensures the output is directly usable in applications requiring a simple list format.

[Date Time Parser]

The DatetimeOutputParser ensures that the language model's responses are parsed and validated as well-structured datetime objects. It is particularly useful when you need outputs in a consistent, machine-readable datetime format.

[1] Parser and input prompt template

[2] Response

2017-01-20 17:15:00

The DatetimeOutputParser parses the model's response into a structured datetime object. The parser is initialized and integrated into a chain with a prompt template that asks for the "date and time of Trump’s first presidential speech," including format instructions automatically added by the parser. The output is returned as a valid, machine-readable datetime object, making it usable in time-sensitive applications.

Summary

We focused on how LangChain's output parsers help convert unstructured responses from language models into well-defined formats. We explored practical examples using the PydanticOutputParser, CommaSeparatedListOutputParser, and DatetimeOutputParser, showing how these tools make outputs more predictable and easier to integrate into applications.

While we covered some key parsers, there are more in the library worth exploring. For example, RegexDictParser is good for extracting data using patterns, OutputFixingParser is useful for handling incomplete or inconsistent outputs, and YamlOutputParser works with YAML data formats. Consider experimenting with the parsers not covered here. For advanced use cases, you might investigate chaining parsers together or integrating LangChain parsers with native LLM features like OpenAI's function calling. These tools can make working with LLMs easier and more effective.
