Databricks AI Playground | How to bring your own model

After a few months in public preview, Databricks AI Playground has garnered great feedback from the community. But if you’ve been living under a rock (or, shall we say, a brick) and have no idea what it’s all about, check out this short video by Holly Smith: https://www.youtube.com/shorts/pNA-YYLBJH4

In essence, this playground is a chat-like environment where you can test, prompt, and compare LLMs. And because a picture is worth a thousand words, here’s a little snapshot:

Comparing LLMs in the Databricks AI Playground

It looks like DBRX thinks I’m a bigger deal than Llama. Maybe I should have a word with Meta?

The complete step-by-step guide is provided here: https://www.youtube.com/watch?v=bxacSRGnn0I

Source code for the models is available here: https://github.com/okube-ai/okube-samples/blob/main/003-chatabricks/build_chatabricks.py

BYOM (Bring Your Own Model)

There are a few foundational models available by default, like Llama 3.1, DBRX, and Mixtral. You can even create new serving endpoints from services like OpenAI, PaLM, and AI21 Labs.

Default foundational models

But what if I want to bring my own secret flavor of one of these foundational models into the playground? Or maybe use a snazzy RAG-based agent? Well, Databricks documentation goes mysteriously silent on that topic—or at least it's not exactly shouting it from the rooftops. And let’s be honest, a quick Google search didn’t exactly save the day either. So... is it even possible?

When faced with doubt and mystery, we do what we always do: we roll up our sleeves and give it a shot!

Build a Chain

First things first, let’s build a chain (LLM + prompt instructions) using LangChain, a framework that helps developers create applications by integrating language models with external tools, data, and workflows. LangChain allows for customizable chains and prompt management, making it perfect for this task.

For simplicity, we defined the LLM using the ChatDatabricks class, which lets us select any model already deployed within Databricks. Of course, OpenAI or any other API could also have been used. In this example, the chain simply feeds a custom prompt into this LLM. The prompt instructs the LLM to act as a Databricks certification assistant. While this is a very basic use case, you could easily create more complex, agent-based chains integrating multiple tools.
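Here’s a minimal sketch of such a chain. The endpoint name, prompt wording, and variable names are illustrative assumptions rather than the exact code from the repo:

```python
from langchain_community.chat_models import ChatDatabricks
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

# Prompt instructing the LLM to act as a Databricks certification assistant
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are an assistant helping users prepare for Databricks certification exams."),
    ("human", "{question}"),
])

# Select any model already deployed in the workspace by its endpoint name
llm = ChatDatabricks(endpoint="databricks-dbrx-instruct", temperature=0.1)

# Chain: prompt -> LLM -> plain-string output
chain = prompt | llm | StrOutputParser()
```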

Once the chain is set up, you can query it using the invoke method.
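A usage sketch (the question is an invented example):

```python
response = chain.invoke({"question": "How many questions are on the Data Engineer Associate exam?"})
print(response)
```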


Register a Model

Now that we have our chain, the next step is to build a model using MLflow and register it with Unity Catalog. The model definition is pretty straightforward—just subclass the mlflow.pyfunc.PythonModel class, with the predict method returning the output from our chain. We also print the type of model_input to highlight some of the default behavior.
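A minimal sketch of that wrapper, assuming the chain built above is in scope (the class name is illustrative):

```python
import mlflow


class Chatabricks(mlflow.pyfunc.PythonModel):
    def predict(self, context, model_input):
        # Print the input type to highlight MLflow's default behavior:
        # the raw input is converted to a pandas DataFrame before it reaches us.
        print(type(model_input))
        question = model_input["question"].iloc[0]
        return chain.invoke({"question": question})
```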

Next, we configure MLflow to log the model into Unity Catalog.
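That boils down to a single call:

```python
import mlflow

# Point the MLflow registry at Unity Catalog instead of the workspace registry
mlflow.set_registry_uri("databricks-uc")
```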

And now, we log and create the actual model:
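A sketch of the logging step; the catalog and schema in the registered model name are placeholders:

```python
from mlflow.models import infer_signature

with mlflow.start_run():
    mlflow.pyfunc.log_model(
        artifact_path="model",
        python_model=Chatabricks(),
        signature=infer_signature(
            model_input={"question": "What is DBRX?"},
            model_output="DBRX is a mixture-of-experts LLM built by Databricks.",
        ),
        registered_model_name="dev.models.chatabricks1",
    )
```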

Notice that python_model is an instance of the model class we defined earlier, and we need to explicitly define a model signature. At this point, our model is registered and ready to go! We can confirm by peeking at the registered models page:

Using the model is now as easy as loading it from its URI and calling the predict method with some input:
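For instance (the model version is illustrative):

```python
import mlflow

model = mlflow.pyfunc.load_model("models:/dev.models.chatabricks1/1")
result = model.predict({"question": "How long do I have to complete the exam?"})
print(result)
```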

Which yields the result:

One thing to note here—see how the predict method received a DataFrame-type object as input? This can be a bit confusing, especially for those just starting with MLflow. When you call model.predict, it doesn’t directly trigger the predict method you’ve defined. Instead, MLflow does some internal formatting on the model_input argument, and the context is automatically built for you. The DataFrame type is perfect when you want to run multiple predictions across multiple inputs. For chat models, it's not particularly well suited, but we can work around it by transforming the inputs to match our needs.

Serving Endpoint (twice)

Alright! So, we have a registered model. That means we should be able to call it from the Playground, right? Well… not quite yet! First, we need to head over to the Serving panel and create an endpoint for our model. This allows us to call the model from outside Databricks through a REST API. When selecting the served entity, make sure to pick the chatabricks1 model that we just registered in Unity Catalog.
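For reference, the same endpoint can also be created programmatically; here’s a sketch using the Databricks SDK, with names and sizing as assumptions:

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.serving import EndpointCoreConfigInput, ServedEntityInput

w = WorkspaceClient()
w.serving_endpoints.create(
    name="chatabricks1",
    config=EndpointCoreConfigInput(
        served_entities=[
            ServedEntityInput(
                entity_name="dev.models.chatabricks1",  # the Unity Catalog model
                entity_version="1",
                workload_size="Small",
                scale_to_zero_enabled=True,
            )
        ]
    ),
)
```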

So far, so good. We now have our model served just like any other foundational model provided by Databricks.

You might think we’re ready to go, but hold your horses! As you may have noticed, the Task of our model is not set to "Chat," which means it won’t be selectable from the Playground.

Here comes the counterintuitive part: we need to create a second serving endpoint for the same model. Yes, you read that right. This time, we’ll select External Model with Databricks Model Serving as the provider. Then, set the workspace URL to match the current workspace we’re using and provide a valid token in the API secret field. Most importantly, this time we’ll be able to select "Chat" as the Task.
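Based on my reading of the external models documentation, the equivalent REST payload looks roughly like this; the endpoint name, secret reference, and URLs are placeholders, and the field names should be double-checked against the docs:

```python
import requests

payload = {
    "name": "chatabricks1-chat",
    "config": {
        "served_entities": [
            {
                "external_model": {
                    "name": "chatabricks1",  # the first serving endpoint
                    "provider": "databricks-model-serving",
                    "task": "llm/v1/chat",
                    "databricks_model_serving_config": {
                        "databricks_workspace_url": "https://<workspace-url>",
                        "databricks_api_token": "{{secrets/<scope>/<key>}}",
                    },
                }
            }
        ]
    },
}

requests.post(
    "https://<workspace-url>/api/2.0/serving-endpoints",
    headers={"Authorization": "Bearer <personal-access-token>"},
    json=payload,
)
```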

In other words, we created the first serving endpoint to make our chatabricks1 model queryable via REST API, and the second one explicitly as a chat model that wraps the REST call of the first. I think I get it. According to the documentation, models serving a chat task need to handle scoring requests at POST /serving-endpoints/{name}/invocations, but honestly, I was expecting a simpler solution from Databricks, or at least clearer instructions in the docs.

Anyway, we finally have access to chatabricks1 in the Playground!

Surprise, surprise: it still doesn’t work. From the raised error, it’s hard to pinpoint the issue, but here’s what I discovered: a chat task expects a very strict format for both the model’s input and output. Unfortunately, chatabricks1 doesn’t meet those requirements.
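For reference, a chat task expects OpenAI-style payloads along these lines (an abbreviated sketch, not the complete schema):

```python
# Expected input: a list of role/content messages
request = {"messages": [{"role": "user", "content": "What's on the exam?"}]}

# Expected output: a `choices` list wrapping an assistant message
response = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "The exam covers..."},
            "finish_reason": "stop",
        }
    ]
}
```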

Looks like we’ll need to go back and rebuild the model with the correct input/output structure.

Rebuild Model

Given that we have full control over the model signature when logging a model, we should be able to tailor it to fit the requirements of a chat model. However, there’s a simpler option. Instead of deriving from the mlflow.pyfunc.PythonModel class, we’ll use the mlflow.pyfunc.ChatModel class.
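A sketch of what that looks like, reusing the chain from earlier (the class name and message handling are illustrative):

```python
import mlflow
from mlflow.types.llm import ChatChoice, ChatMessage, ChatParams, ChatResponse


class Chatabricks2(mlflow.pyfunc.ChatModel):
    def predict(self, context, messages: list[ChatMessage], params: ChatParams) -> ChatResponse:
        # The last message holds the user's question
        question = messages[-1].content
        answer = chain.invoke({"question": question})
        # Wrap the answer in the OpenAI-style response structure expected by a chat task
        return ChatResponse(
            choices=[
                ChatChoice(
                    index=0,
                    message=ChatMessage(role="assistant", content=answer),
                    finish_reason="stop",
                )
            ]
        )
```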

With this class, the signature is implicit, matches the OpenAI standard, and—here’s the kicker—can’t be overwritten. Of course, we had to slightly tweak our predict method to properly handle the input format and send the expected output format. This new model has been registered as chatabricks2 in Unity Catalog. With this new model, we follow the exact same process: create two serving endpoints, and then head back to the Playground!

Success! Well… partial success. It works, but there’s a quirk: you can’t set the temperature to 0 because it’s interpreted as an integer by the UI, which conflicts with the expected float type. The workaround? Set the temperature to a very small non-zero value, or just deactivate it completely.

Summary

We’ve successfully created a custom chat model using MLflow and integrated it into the Databricks AI Playground. Once you know the exact recipe, the process is fairly straightforward and can be completed in under an hour. However, the need to create two serving endpoints is far from intuitive, and the strict formatting requirements are woefully under-documented. I really hope Databricks improves on this in the future, as the AI Playground is an incredibly useful tool for testing LLMs and sharing them with teams.

What about you? Have you used Databricks AI Playground yet? Have you discovered a simpler solution for integrating your own model?

And don't forget to follow me for more great content about data and ML engineering.


Olivier Soucy

Founder @ okube.ai | Fractional Data Platform Engineer | Open-source Developer | Databricks Partner
