DS Fortune Cookies: LangChain, Agents, and Authentication
“Embrace LangChain's evolution and your spirit will be unbreakable, unlike your code.”
This fortune cookie clarifies some things around LangChain, agents, and authentication. LangChain is an open-source framework for building applications on top of large language models. Databricks still leverages LangChain heavily, and the ecosystem is evolving quickly. This post sorts out the Databricks LangChain packages and explains how authentication works.
Databricks & LangChain
LangChain is amazing - they have a small team and have revolutionized the use of language models. This evolution comes with a price though, and if you are using LangChain with Databricks, there are three key things to remember:
The langchain-community package is a catch-all package for third-party providers and components. It used to house a ChatDatabricks interface, along with other provider integrations. But as you can imagine, managing pull requests from hundreds of providers isn't tenable, so the interfaces in langchain-community (e.g. from langchain_community.embeddings import DatabricksEmbeddings) are now deprecated, along with the langchain-databricks package (from langchain_databricks import ChatDatabricks). The pattern to use moving forward is the databricks-langchain package, which bundles the key Databricks components, including ChatDatabricks and DatabricksEmbeddings.
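As a sketch, the import migration looks like this (assuming the databricks-langchain package is installed; the guard just keeps the snippet importable elsewhere):

```python
# Deprecated patterns (these now emit deprecation warnings):
#   from langchain_community.embeddings import DatabricksEmbeddings
#   from langchain_databricks import ChatDatabricks
#
# Current pattern: import the same components from databricks-langchain.
try:
    from databricks_langchain import ChatDatabricks, DatabricksEmbeddings
except ImportError:
    # databricks-langchain is not installed in this environment.
    ChatDatabricks = DatabricksEmbeddings = None

# Once available, usage is unchanged, e.g. (endpoint name is a placeholder):
#   llm = ChatDatabricks(endpoint="my-chat-endpoint")
```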
Compatibility manifests in several ways: telling MLflow what type of response to expect, enabling tool use within LangChain agents, and defining agent signatures.
It is worth noting here that most LLMs expose either a completions or a chat interface, but the ecosystem has largely settled on chat completions with user, system, and assistant roles, so it is worth building around that convention, even for a simple query-response framework.
Authentication
Several customers have had issues authenticating to a vector store or to tools. Let's break this down. First, there are three ways to authenticate in Databricks: OAuth machine-to-machine (M2M), OAuth user-to-machine (U2M), and personal access tokens (PAT).
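A rough sketch of the three with the Databricks Python SDK (host and credential values are placeholders, and only the PAT client is actually constructed):

```python
try:
    from databricks.sdk import WorkspaceClient

    # PAT: pass a personal access token directly (values are placeholders).
    w = WorkspaceClient(
        host="https://example.cloud.databricks.com",
        token="dapi-example-token",
    )

    # OAuth M2M: pass a service principal's OAuth client id and secret
    # instead; the SDK exchanges them for short-lived access tokens:
    #   WorkspaceClient(host=..., client_id=..., client_secret=...)
    #
    # OAuth U2M: with no explicit credentials, the SDK can walk the
    # interactive (browser-based) OAuth flow on behalf of a user.
except ImportError:
    # databricks-sdk is not installed in this environment.
    WorkspaceClient = None
```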
When talking about model serving, there are two ways to authenticate from a Databricks-created serving endpoint to dependent resources (e.g. a vector store): automatic passthrough and manual.
Let's start with manual authentication, which leverages secrets-based environment variables and can use either PAT or M2M credentials. To use it, you pass a service principal (SP) identity and token into the objects making the calls (LangChain or the Databricks SDK). Both the SP identity and token should be stored as secrets and passed programmatically. This can be a pain, so Databricks recently shipped automatic authentication passthrough for dependent resources. This is beautiful: under the hood it generates short-lived M2M credentials that 'just work'. I'd recommend trying it for all your model serving resource needs!
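A minimal sketch of the manual pattern (scope, key, and host values are illustrative placeholders): the serving endpoint injects secrets as environment variables, and the model code reads them at call time instead of hard-coding credentials.

```python
import os

# In the endpoint config, secrets are mapped to environment variables
# with the {{secrets/<scope>/<key>}} syntax, e.g.:
#   "environment_vars": {
#       "DATABRICKS_HOST":  "{{secrets/my_scope/host}}",
#       "DATABRICKS_TOKEN": "{{secrets/my_scope/sp_token}}"
#   }
# (my_scope, host, and sp_token are placeholder names.)

# Stand-ins so the sketch runs outside a serving endpoint:
os.environ.setdefault("DATABRICKS_HOST", "https://example.cloud.databricks.com")
os.environ.setdefault("DATABRICKS_TOKEN", "example-sp-token")

# Inside the model, read the injected values rather than hard-coding them:
host = os.environ["DATABRICKS_HOST"]
headers = {"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"}
```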
Other Links
Databricks Platform Security Recommendations (TLDR: Use OAuth Tokens)