Introduction to LangChain
Jaya Padmanabhan
Chief Data Officer & Co-Founder at Revisto | AI, Machine Learning, Data Science
Introduction:
LangChain is a framework that allows developers to create applications powered by language models. It enables data-aware and agentic applications by connecting language models to other data sources and allowing them to interact with their environment. LangChain’s main value propositions are its components, which are abstractions for working with language models, and its off-the-shelf chains, which are structured assemblies of components for accomplishing specific tasks. These features make it easy to get started with LangChain, while also allowing for customization and the development of more complex applications.
Installation:
LangChain can be installed using pip or conda. Running pip install langchain installs the minimum requirements for LangChain, but not the dependencies needed for integrating with model providers, datastores, etc. To install the modules needed for common LLM providers, run pip install langchain[llms]; to install the modules needed for all integrations, run pip install langchain[all]. Note that if you are using zsh, you'll need to quote square brackets when passing them as an argument to a command, for example:
pip install 'langchain[all]'
Environment setup:
To use LangChain with Hugging Face's BART model, you'll need to install the Transformers library by running pip install transformers. Unlike OpenAI's models, BART does not require an API key. Instead, you can load a pre-trained BART model directly from Hugging Face's model hub, or fine-tune it on your own data. You can instantiate the BART class from the appropriate module in the Transformers library and use it with LangChain. For example:
from transformers import BartForConditionalGeneration
llm = BartForConditionalGeneration.from_pretrained('facebook/bart-large')
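As a quick sanity check before wiring the model into LangChain, you can run it directly through the standard Transformers API (a minimal sketch; the input sentence is just an example):
from transformers import BartForConditionalGeneration, BartTokenizer
tokenizer = BartTokenizer.from_pretrained('facebook/bart-large')
model = BartForConditionalGeneration.from_pretrained('facebook/bart-large')
# Tokenize an example sentence and generate from it
inputs = tokenizer("LangChain helps developers build applications powered by language models.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))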
Building an application:
As noted above, LangChain provides many modules that can be used on their own or combined to build more complex applications. The main building block of LangChain applications is the LLMChain, which combines three things: the language model, prompt templates, and output parsers. The language model is the core reasoning engine, prompt templates provide instructions to the language model, and output parsers translate the raw response into a more usable format. Understanding these concepts will help you use and customize LangChain applications. Most LangChain applications allow you to configure the language model and prompts used, so knowing how to do this will be very helpful.
LLMs:
In LangChain, there are two types of language models: LLMs and ChatModels. LLMs take a string as input and return a string, while ChatModels take a list of messages as input and return a single message. The input for ChatModels is a list of ChatMessages, which have two required components: the content of the message and the role of the entity sending the message. LangChain provides several objects to distinguish between different roles, including HumanMessage, AIMessage, SystemMessage, and FunctionMessage. You can also specify the role manually using the ChatMessage class. LangChain exposes a standard interface for both types of language models with two methods: predict, which takes in a string and returns a string, and predict_messages, which takes in a list of messages and returns a single message.
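For example, the message objects can be constructed directly; a minimal sketch using the langchain.schema classes (the message contents here are just placeholders):
from langchain.schema import AIMessage, ChatMessage, HumanMessage, SystemMessage
# Each message pairs the content with the role of the entity sending it
system = SystemMessage(content="You are a helpful assistant.")
human = HumanMessage(content="hi!")
ai = AIMessage(content="Hi")
# ChatMessage lets you specify an arbitrary role yourself
custom = ChatMessage(role="critic", content="That name is too generic.")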
Let's see how to work with these different types of models and these different types of inputs. First, let's import an LLM and a ChatModel. (The Hugging Face wrapper class names below are illustrative. The exact classes and import paths vary across LangChain versions; for instance, the LLM-side integrations have shipped under names such as HuggingFacePipeline and HuggingFaceHub, so check the documentation for your version.)
from langchain.llms import HuggingFace
from langchain.chat_models import ChatHuggingFace
llm = HuggingFace(model_name='facebook/bart-large')
chat_model = ChatHuggingFace(model_name='facebook/bart-large')
llm.predict("hi!")
>>> "Hi"
chat_model.predict("hi!")
>>> "Hi"
In this example, we import the HuggingFace and ChatHuggingFace classes from the langchain.llms and langchain.chat_models modules, respectively. We then instantiate these classes, specifying the name of the BART model we want to use ('facebook/bart-large'). Finally, we use the predict method of both the llm and chat_model instances to generate responses to the input "hi!". Both models return the response "Hi".
Next, let's use the predict method to run over a string input.
The HuggingFace(model_name='facebook/bart-large') and ChatHuggingFace(model_name='facebook/bart-large') objects are essentially configuration objects. You can initialize them with parameters such as temperature and others, and pass them around. These objects allow you to use the BART model from Hugging Face with LangChain.
from langchain.llms import HuggingFace
from langchain.chat_models import ChatHuggingFace
llm = HuggingFace(model_name='facebook/bart-large')
chat_model = ChatHuggingFace(model_name='facebook/bart-large')
text = "What would be a good company name for a company that makes colorful socks?"
llm.predict(text)
# >> Feetful of Fun
chat_model.predict(text)
# >> Socks O'Color
In this example, we import the HuggingFace and ChatHuggingFace classes from the langchain.llms and langchain.chat_models modules, respectively. We then instantiate these classes, specifying the name of the BART model we want to use ('facebook/bart-large'). Finally, we use the predict method of both the llm and chat_model instances to generate responses to the input text. The llm model returns the response "Feetful of Fun", while the chat_model returns "Socks O'Color".
Finally, let's use the predict_messages method to run over a list of messages.
from langchain.llms import HuggingFace
from langchain.chat_models import ChatHuggingFace
from langchain.schema import HumanMessage
llm = HuggingFace(model_name='facebook/bart-large')
chat_model = ChatHuggingFace(model_name='facebook/bart-large')
text = "What would be a good company name for a company that makes colorful socks?"
messages = [HumanMessage(content=text)]
llm.predict_messages(messages)
# >> Feetful of Fun
chat_model.predict_messages(messages)
# >> Socks O'Color
In this example, we import the HuggingFace, ChatHuggingFace, and HumanMessage classes from the langchain.llms, langchain.chat_models, and langchain.schema modules, respectively. We then instantiate the HuggingFace and ChatHuggingFace classes, specifying the name of the BART model we want to use ('facebook/bart-large'). We create a list of messages containing a single HumanMessage with the content "What would be a good company name for a company that makes colorful socks?". Finally, we use the predict_messages method of both the llm and chat_model instances to generate responses to the input messages. The llm model returns the response "Feetful of Fun", while the chat_model returns "Socks O'Color".
In LangChain, you can use the predict and predict_messages methods of the HuggingFace and ChatHuggingFace classes to generate responses using the BART model from Hugging Face. You can pass in additional parameters as keyword arguments to these methods to adjust their behavior. For example, you could pass in temperature=0 to adjust the temperature used by the model. The temperature parameter controls the randomness or creativity of the text generated by a language model. A lower temperature will result in more conservative and predictable predictions, while a higher temperature will encourage more diverse and creative outputs. Any values passed in at runtime will override the values configured when the object was initiated. This allows you to fine-tune the behavior of the model on a per-prediction basis.
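For example (a sketch, assuming the illustrative wrapper classes above forward generation parameters the way LangChain's hosted-model wrappers do):
# Configure a default temperature when the object is created...
llm = HuggingFace(model_name='facebook/bart-large', temperature=0.9)
# ...and override it for a single call at runtime
llm.predict(text, temperature=0)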
Prompt templates:
In most LLM applications, user input is not passed directly to the language model. Instead, it is added to a larger piece of text called a prompt template, which provides additional context for the task at hand. For example, in the previous example, the text passed to the model contained instructions to generate a company name. With a prompt template, the user would only need to provide a description of the company or product, without having to worry about giving the model instructions. Prompt templates help with this by bundling up all the logic for going from user input to a fully formatted prompt.
This can start off very simple - for example, a prompt to produce the above string would just be:
from langchain.prompts import PromptTemplate
prompt = PromptTemplate.from_template("What is a good name for a company that makes {product}?")
prompt.format(product="colorful socks")
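# >> "What is a good name for a company that makes colorful socks?"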
In this example, we import the PromptTemplate class from the langchain.prompts module. We then create an instance of this class using the from_template method, passing in a string template that contains a placeholder for the product. Finally, we use the format method of the prompt instance to fill in the placeholder with the value "colorful socks". This produces a fully formatted prompt that can be passed to a language model for generating a response.
Using PromptTemplate in LangChain has several advantages over raw string formatting. For example, you can partially fill in variables, formatting only some of the variables at a time (see the sketch after the chat example below). You can also easily combine different templates into a single prompt. PromptTemplate can also be used to produce a list of messages, where the prompt contains information about the content, role, and position of each message in the list. A ChatPromptTemplate is often a list of ChatMessageTemplates, each containing instructions for formatting a ChatMessage, including its role and content.
from langchain.prompts.chat import (
    ChatPromptTemplate,
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate,
)
template = "You are a helpful assistant that translates {input_language} to {output_language}."
system_message_prompt = SystemMessagePromptTemplate.from_template(template)
human_template = "{text}"
human_message_prompt = HumanMessagePromptTemplate.from_template(human_template)
chat_prompt = ChatPromptTemplate.from_messages([system_message_prompt, human_message_prompt])
chat_prompt.format_messages(input_language="English", output_language="French", text="I love programming.")
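# >> [SystemMessage(content='You are a helpful assistant that translates English to French.'),
# >>  HumanMessage(content='I love programming.')]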
In this example, we import the ChatPromptTemplate, SystemMessagePromptTemplate, and HumanMessagePromptTemplate classes from the langchain.prompts.chat module. We then create instances of the SystemMessagePromptTemplate and HumanMessagePromptTemplate classes using the from_template method, passing in string templates that contain placeholders for the input language, output language, and text. We create a ChatPromptTemplate instance using the from_messages method, passing in a list containing the system_message_prompt and human_message_prompt instances. Finally, we use the format_messages method of the chat_prompt instance to fill in the placeholders with the values "English", "French", and "I love programming.". This produces a fully formatted chat prompt that can be passed to a language model for generating a response.
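As noted above, PromptTemplate also supports partial formatting, where you fill in some of the variables ahead of time and the rest later. A minimal sketch (the joke template is just an illustration):
from langchain.prompts import PromptTemplate
prompt = PromptTemplate.from_template("Tell me a {adjective} joke about {content}.")
# Fill in one variable now...
partial_prompt = prompt.partial(adjective="funny")
# ...and supply the remaining one later
partial_prompt.format(content="chickens")
# >> "Tell me a funny joke about chickens."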
Output parsers:
OutputParsers in LangChain convert the raw output of a language model into a format that can be used downstream. There are several main types of OutputParsers, including those that convert text from a language model into structured information (such as JSON), those that convert a ChatMessage into just a string, and those that convert the extra information returned from a call (such as a function invocation) into a string. For example, if you were using the BART-backed LLM or ChatModel wrappers shown above, you could use an OutputParser to convert the raw text output of the model into a more usable format for your application.
Let's write our own output parser: one that converts a comma-separated string into a list.
from langchain.schema import BaseOutputParser
class CommaSeparatedListOutputParser(BaseOutputParser):
????"""Parse the output of an LLM call to a comma-separated list."""
????def parse(self, text: str):
????????"""Parse the output of an LLM call."""
????????return text.strip().split(", ")
CommaSeparatedListOutputParser().parse("hi, bye")
# >> ['hi', 'bye']
In this example, we import the BaseOutputParser class from the langchain.schema module and create a custom CommaSeparatedListOutputParser class that inherits from it. We override the parse method of the BaseOutputParser class to implement our custom parsing logic, which splits the input text on commas and returns a list of strings. We then create an instance of our custom CommaSeparatedListOutputParser class and use its parse method to parse the input text "hi, bye" into a list of strings ['hi', 'bye'].
LLMChain:
In LangChain, you can combine all the components we’ve discussed into one chain. This chain takes input variables, passes them to a prompt template to create a prompt, passes the prompt to a language model, and then passes the output through an optional output parser. This is a convenient way to bundle up a modular piece of logic and use it in your application. It allows you to easily generate responses from a language model using a structured and reusable process.
from langchain.chat_models import ChatHuggingFace
from langchain.prompts.chat import (
    ChatPromptTemplate,
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate,
)
from langchain.chains import LLMChain
from langchain.schema import BaseOutputParser
class CommaSeparatedListOutputParser(BaseOutputParser):
????"""Parse the output of an LLM call to a comma-separated list."""
????def parse(self, text: str):
????????"""Parse the output of an LLM call."""
????????return text.strip().split(", ")
template = """You are a helpful assistant who generates comma separated lists.
A user will pass in a category, and you should generate 5 objects in that category in a comma separated list.
ONLY return a comma separated list, and nothing more."""
system_message_prompt = SystemMessagePromptTemplate.from_template(template)
human_template = "{text}"
human_message_prompt = HumanMessagePromptTemplate.from_template(human_template)
chat_prompt = ChatPromptTemplate.from_messages([system_message_prompt, human_message_prompt])
chain = LLMChain(
    llm=ChatHuggingFace(model_name='facebook/bart-large'),
    prompt=chat_prompt,
    output_parser=CommaSeparatedListOutputParser()
)
chain.run("colors")
# >> ['red', 'blue', 'green', 'yellow', 'orange']
In this example, we import the ChatHuggingFace, ChatPromptTemplate, SystemMessagePromptTemplate, HumanMessagePromptTemplate, LLMChain, and BaseOutputParser classes from the langchain.chat_models, langchain.prompts.chat, langchain.chains, and langchain.schema modules. We create a custom CommaSeparatedListOutputParser class that inherits from BaseOutputParser and overrides its parse method with our custom parsing logic. We create SystemMessagePromptTemplate and HumanMessagePromptTemplate instances using the from_template method, passing in string templates, and combine them into a ChatPromptTemplate using the from_messages method. We then create an LLMChain, passing in a ChatHuggingFace instance, the chat_prompt, and a CommaSeparatedListOutputParser instance. Finally, we use the run method of the chain to generate a response to the input "colors". The chain returns a list of strings representing colors: ['red', 'blue', 'green', 'yellow', 'orange'].
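Because the chain bundles the prompt, model, and parser together, it can be reused as-is for any category (the output shown here is illustrative; actual results depend on the model):
chain.run("animals")
# >> e.g. ['dog', 'cat', 'horse', 'fish', 'bird'] (model-dependent)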
Conclusion:
In summary, LangChain provides a flexible and modular framework for building applications with language models. Its key components of LLMs, prompt templates, and output parsers can be assembled into chains that enable reusable workflows. LangChain simplifies working with powerful models like BART through abstractions for LLMs and ChatModels. Prompt templates allow instructing models without writing all the prompt text yourself. Output parsers extract usable information from raw model responses. Together, these components and patterns make it easy to integrate language models into data-aware, conversational applications. With off-the-shelf chains and customizability, LangChain enables quickly spinning up prototypes or developing complex solutions. Its structured approach helps focus on the functionality to build rather than language model intricacies. Overall, LangChain reduces the complexity of leveraging language models, enabling developers to create agentic applications backed by the capabilities of models like BART.