Databricks AI Playground | How to bring your own model

After a few months in public preview, Databricks AI Playground has garnered great feedback from the community. But if you’ve been living under a rock (or, shall we say, a brick) and have no idea what it’s all about, check out this short video by Holly Smith: https://www.youtube.com/shorts/pNA-YYLBJH4

In essence, this playground is a chat-like environment where you can test, prompt, and compare LLMs. And because a picture is worth a thousand words, here’s a little snapshot:

Comparing LLMs in the Databricks AI Playground

It looks like DBRX thinks I’m a bigger deal than Llama. Maybe I should have a word with Meta?

The complete step-by-step guide is provided here: https://www.youtube.com/watch?v=bxacSRGnn0I

Source code for the models is available here: https://github.com/okube-ai/okube-samples/blob/main/003-chatabricks/build_chatabricks.py

BYOM (Bring Your Own Model)

There are a few foundational models available by default, like Llama 3.1, DBRX, and Mixtral. You can even create new serving endpoints from services like OpenAI, PaLM, and AI21 Labs.

Default foundational models

But what if I want to bring my own secret flavor of one of these foundational models into the playground? Or maybe use a snazzy RAG-based agent? Well, Databricks documentation goes mysteriously silent on that topic—or at least it's not exactly shouting it from the rooftops. And let’s be honest, a quick Google search didn’t exactly save the day either. So... is it even possible?

When faced with doubt and mystery, we do what we always do: we roll up our sleeves and give it a shot!

Build a Chain

First things first, let’s build a chain (LLM + prompt instructions) using LangChain, a framework that helps developers create applications by integrating language models with external tools, data, and workflows. LangChain allows for customizable chains and prompt management, making it perfect for this task.

For simplicity, we defined the LLM using the ChatDatabricks class, which lets us select any model already deployed within Databricks. Of course, OpenAI or any other API could also have been used. In this example, the chain simply feeds a custom prompt into this LLM. The prompt instructs the LLM to act as a Databricks certification assistant. While this is a very basic use case, you could easily create more complex, agent-based chains integrating multiple tools.
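Here’s a minimal sketch of such a chain. The endpoint name, prompt wording, and variable names are illustrative assumptions rather than the exact code from the repo:

```python
from langchain_community.chat_models import ChatDatabricks
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

# Prompt instructing the LLM to act as a Databricks certification assistant
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are an assistant helping users prepare for Databricks certification exams."),
    ("human", "{question}"),
])

# Select any model already deployed in the workspace by its endpoint name
llm = ChatDatabricks(endpoint="databricks-dbrx-instruct", temperature=0.1)

# Chain: prompt -> LLM -> plain-string output
chain = prompt | llm | StrOutputParser()
```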

Once the chain is set up, you can query it using the invoke method.
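A usage sketch (the question is an invented example):

```python
response = chain.invoke({"question": "How many questions are on the Data Engineer Associate exam?"})
print(response)
```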


Register a Model

Now that we have our chain, the next step is to build a model using MLflow and register it with Unity Catalog. The model definition is pretty straightforward—just subclass the mlflow.pyfunc.PythonModel class, with the predict method returning the output from our chain. We also print the type of model_input to highlight some of the default behavior.
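A minimal sketch of that wrapper, assuming the chain built above is in scope (the class name is illustrative):

```python
import mlflow


class Chatabricks(mlflow.pyfunc.PythonModel):
    def predict(self, context, model_input):
        # Print the input type to highlight MLflow's default behavior:
        # the raw input is converted to a pandas DataFrame before it reaches us.
        print(type(model_input))
        question = model_input["question"].iloc[0]
        return chain.invoke({"question": question})
```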

Next, we configure MLflow to log the model into Unity Catalog.
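That boils down to a single call:

```python
import mlflow

# Point the MLflow registry at Unity Catalog instead of the workspace registry
mlflow.set_registry_uri("databricks-uc")
```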

And now, we log and create the actual model:
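A sketch of the logging step; the catalog and schema in the registered model name are placeholders:

```python
from mlflow.models import infer_signature

with mlflow.start_run():
    mlflow.pyfunc.log_model(
        artifact_path="model",
        python_model=Chatabricks(),
        signature=infer_signature(
            model_input={"question": "What is DBRX?"},
            model_output="DBRX is a mixture-of-experts LLM built by Databricks.",
        ),
        registered_model_name="dev.models.chatabricks1",
    )
```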

Notice that python_model is an instance of the model class we defined earlier, and we need to explicitly define a model signature. At this point, our model is registered and ready to go! We can confirm by peeking at the registered models page:

Using the model is now as easy as loading it from its URI and calling the predict method with some input:
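For instance (the model version is illustrative):

```python
import mlflow

model = mlflow.pyfunc.load_model("models:/dev.models.chatabricks1/1")
result = model.predict({"question": "How long do I have to complete the exam?"})
print(result)
```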

Which yields the result:

One thing to note here—see how the predict method received a DataFrame-type object as input? This can be a bit confusing, especially for those just starting with MLflow. When you call model.predict, it doesn’t directly trigger the predict method you’ve defined. Instead, MLflow does some internal formatting on the model_input argument, and the context is automatically built for you. The DataFrame type is perfect when you want to run multiple predictions across multiple inputs. For chat models, it's not particularly well suited, but we can work around it by transforming the inputs to match our needs.

Serving Endpoint (twice)

Alright! So, we have a registered model. That means we should be able to call it from the Playground, right? Well… not quite yet! First, we need to head over to the Serving panel and create an endpoint for our model. This allows us to call the model from outside Databricks through a REST API. When selecting the served entity, make sure to pick the chatabricks1 model that we just registered in Unity Catalog.
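For reference, the same endpoint can also be created programmatically; here’s a sketch using the Databricks SDK, with names and sizing as assumptions:

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.serving import EndpointCoreConfigInput, ServedEntityInput

w = WorkspaceClient()
w.serving_endpoints.create(
    name="chatabricks1",
    config=EndpointCoreConfigInput(
        served_entities=[
            ServedEntityInput(
                entity_name="dev.models.chatabricks1",  # the Unity Catalog model
                entity_version="1",
                workload_size="Small",
                scale_to_zero_enabled=True,
            )
        ]
    ),
)
```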

So far, so good. We now have our model served just like any other foundational model provided by Databricks.

You might think we’re ready to go, but hold your horses! As you may have noticed, the Task of our model is not set to "Chat," which means it won’t be selectable from the Playground.

Here comes the counterintuitive part: we need to create a second serving endpoint for the same model. Yes, you read that right. This time, we’ll select External Model with Databricks Model Serving as the provider. Then, set the workspace URL to match the current workspace we’re using and provide a valid token in the API secret field. Most importantly, this time we’ll be able to select "Chat" as the Task.
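Based on my reading of the external models documentation, the equivalent REST payload looks roughly like this; the endpoint name, secret reference, and URLs are placeholders, and the field names should be double-checked against the docs:

```python
import requests

payload = {
    "name": "chatabricks1-chat",
    "config": {
        "served_entities": [
            {
                "external_model": {
                    "name": "chatabricks1",  # the first serving endpoint
                    "provider": "databricks-model-serving",
                    "task": "llm/v1/chat",
                    "databricks_model_serving_config": {
                        "databricks_workspace_url": "https://<workspace-url>",
                        "databricks_api_token": "{{secrets/<scope>/<key>}}",
                    },
                }
            }
        ]
    },
}

requests.post(
    "https://<workspace-url>/api/2.0/serving-endpoints",
    headers={"Authorization": "Bearer <personal-access-token>"},
    json=payload,
)
```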

In other words, we created the first serving endpoint to make our chatabricks1 model queryable via REST API, and the second one explicitly as a chat model that wraps the REST call of the first. I think I get it. According to the documentation, models serving a chat task need to handle scoring requests at POST /serving-endpoints/{name}/invocations, but honestly, I was expecting a simpler solution from Databricks, or at least clearer instructions in the docs.

Anyway, we finally have access to chatabricks1 in the Playground!

Surprise, surprise: it still doesn’t work. From the raised error, it’s hard to pinpoint the issue, but here’s what I discovered: a chat task expects a very strict format for both the model’s input and output. Unfortunately, chatabricks1 doesn’t meet those requirements.
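For reference, a chat task expects OpenAI-style payloads along these lines (an abbreviated sketch, not the complete schema):

```python
# Expected input: a list of role/content messages
request = {"messages": [{"role": "user", "content": "What's on the exam?"}]}

# Expected output: a `choices` list wrapping an assistant message
response = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "The exam covers..."},
            "finish_reason": "stop",
        }
    ]
}
```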

Looks like we’ll need to go back and rebuild the model with the correct input/output structure.

Rebuild Model

Given that we have full control over the model signature when logging a model, we should be able to tailor it to fit the requirements of a chat model. However, there’s a simpler option. Instead of deriving from the mlflow.pyfunc.PythonModel class, we’ll use the mlflow.pyfunc.ChatModel class.
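A sketch of what that looks like, reusing the chain from earlier (the class name and message handling are illustrative):

```python
import mlflow
from mlflow.types.llm import ChatChoice, ChatMessage, ChatParams, ChatResponse


class Chatabricks2(mlflow.pyfunc.ChatModel):
    def predict(self, context, messages: list[ChatMessage], params: ChatParams) -> ChatResponse:
        # The last message holds the user's question
        question = messages[-1].content
        answer = chain.invoke({"question": question})
        # Wrap the answer in the OpenAI-style response structure expected by a chat task
        return ChatResponse(
            choices=[
                ChatChoice(
                    index=0,
                    message=ChatMessage(role="assistant", content=answer),
                    finish_reason="stop",
                )
            ]
        )
```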

With this class, the signature is implicit, matches the OpenAI standard, and—here’s the kicker—can’t be overwritten. Of course, we had to slightly tweak our predict method to properly handle the input format and send the expected output format. This new model has been registered as chatabricks2 in Unity Catalog. With this new model, we follow the exact same process: create two serving endpoints, and then head back to the Playground!

Success! Well… partial success. It works, but there’s a quirk: you can’t set the temperature to 0 because it’s interpreted as an integer by the UI, which conflicts with the expected float type. The workaround? Set the temperature to a very small non-zero value, or just deactivate it completely.

Summary

We’ve successfully created a custom chat model using MLflow and integrated it into the Databricks AI Playground. Once you know the exact recipe, the process is fairly straightforward and can be completed in under an hour. However, the need to create two serving endpoints is far from intuitive, and the strict formatting requirements are woefully under-documented. I really hope Databricks improves on this in the future, as the AI Playground is an incredibly useful tool for testing LLMs and sharing them with teams.

What about you? Have you used Databricks AI Playground yet? Have you discovered a simpler solution for integrating your own model?

And don't forget to follow me for more great content about data and ML engineering.


Olivier Soucy

Founder @ okube.ai | Fractional Data Platform Engineer | Open-source Developer | Databricks Partner
