Week of September 16th
Stefan Krawczyk
CEO @ DAGWorks Inc. | Co-creator of Hamilton & Burr | Pipelines & Agents: Data, Data Science, Machine Learning, & LLMs
TL;DR:
Social Proof: Don't take my word for it, take theirs.
It's been a fun week collecting quotes from users about their experience using Hamilton and Burr. If you're a lurker and haven't yet tried what we're building, there's never been a better time to start than now... Don't take my word for it - take theirs! Here are some teasers:
[...] I felt trapped in LangChain's ecosystem [...] Moving from LangChain to Burr was a game-changer.
It took me just a few hours to get started with Burr, compared to the days and weeks I spent trying to navigate LangChain.
With Burr, I could finally have a cleaner, more sophisticated, and stable implementation. No more wrestling with complex codebases.
I pitched Burr to my teammates, and we pivoted our entire codebase to it. It's been a smooth ride ever since.
Hamilton is simplicity. Its declarative approach to defining pipelines (as well as the UI to visualize them) makes testing and modifying the code easy, and onboarding is quick and painless. Since using Hamilton, we have improved our efficiency of both developing new functionality and onboarding new developers to work on the code. We deliver solutions more quickly than before.
We're active users of Hamilton. We have found it very useful in standardizing feature engineering code in our production codebases. It's particularly useful in diagnosing data contamination issues, and it's used by all of our MLEs.
Of course, you can use it [LangChain], but whether it's really production-ready and improves the time from "code-to-prod" [...], we've been doing LLM apps for two years, and the answer is no [...] All these "all-in-one" libs suffer from this [...]? Honestly, take a look at Burr. Thank me later.
How (with good software practices) do you orchestrate a system of asynchronous LLM calls, but where some of them depend on others? How do you build such a system so that it’s modular and testable? At [REDACTED] we’ve selected Hamilton to help us solve these problems and others. And today our product, [REDACTED], an AI legal assistant that extracts information from estate planning documents, is running in production with Hamilton under the hood.
Hamilton Release Highlights:
Hamilton Framework == 1.77.0
Pydantic data validators
Pydantic is a commonly used library for describing data records, especially in web contexts. With this latest release, you can now validate any Pydantic model (or a dict of model contents) against a schema using the check_output decorator:
from pydantic import BaseModel
from hamilton.function_modifiers import check_output

class MyModel(BaseModel):
    name: str

@check_output(model=MyModel)
def foo() -> dict:
    return {"name": "hamilton"}

# or, via the pydantic plugin (note the typed return):
from hamilton.plugins import h_pydantic

@h_pydantic.check_output()
def foo() -> MyModel:
    return MyModel(name="hamilton")
For those unfamiliar with the check_output decorator, it is a lightweight way to incorporate data quality checks into Hamilton, and this Pydantic integration complements our Pandera integration. Thanks to Charles Schwartz for their second contribution to the project!
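To see it fire, here's a minimal sketch (the module name is illustrative) that runs the decorated foo() through a driver. By default check_output warns on a validation failure; pass importance="fail" to raise instead:

from hamilton import driver
import my_module  # hypothetical module containing the decorated foo()

dr = driver.Builder().with_modules(my_module).build()
print(dr.execute(["foo"]))  # the Pydantic validator runs when foo is computed

# To make a validation failure raise rather than warn:
# @check_output(model=MyModel, importance="fail")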
New function overrides
When you create multiple Hamilton modules and use them together, @config.when can get a little verbose if all you want is to replace logic depending on which modules you combine. Thanks to Jernej Frank, you now have another option (shout out to Yijun Tang for the idea).
Here's how it works - say you have two modules, where one function is redefined in the other:
# module_a.py
def foo() -> int:
    return 1

def bar() -> int:
    return 2

# module_b.py
def bar() -> int:
    return 3
Rather than using @config.when to choose the right one, we can just tell Hamilton (via .allow_module_overrides()) to take the last definition it comes across:
from hamilton import driver
import module_a, module_b

dr = (
    driver
    .Builder()
    .with_modules(module_a, module_b)  # order matters!
    .allow_module_overrides()  # <--- this is required for it to work
    .build()
)
print(dr.execute(["foo", "bar"]))
{
    "foo": 1,
    "bar": 3
}
Hamilton SDK == 0.7.2
Fix: We pushed a few Polars fixes so summary statistics work appropriately when logging to the Hamilton UI.
Hamilton UI == 0.0.15
Fix: If quantile values are None or empty, the UI now correctly handles them.
Burr Release Highlights:
Burr == 0.30.1
Typed State with Pydantic (docs reference here, example here)
Burr now has two approaches to specifying a schema for state: at the application level and at the action level. These can work together as long as they don't specify clashing state, and they enable a host of other extensions/capabilities.
While the current implementation only supports Pydantic, the typing system is intended to be pluggable, and we plan to add further integrations (dataclasses, typed dicts, etc…).
The TL;DR is that you can now do something like this at the application level:
First, define a Pydantic model for your application:
from typing import List, Literal, Optional

import pydantic

class ApplicationState(pydantic.BaseModel):
    chat_history: List[dict[str, str]] = pydantic.Field(default_factory=list)
    prompt: Optional[str] = None
    mode: Optional[Literal["text", "image"]] = None
    response: Optional[dict[str, str]] = None
Then, we can use this model to type our application:
from burr import ApplicationBuilder
from burr.core.typing import PydanticTypingSystem

app = (
    ApplicationBuilder()
    .with_actions(...)
    .with_entrypoint(...)
    .with_transitions(...)
    .with_typing(PydanticTypingSystem(ApplicationState))
    .with_state(ApplicationState())
    .build()
)
Your application is now typed with that Pydantic model. If you're using an appropriate typing integration in your IDE (e.g. Pylance), it will know that the state of your application is of type ApplicationState.
When you have this you’ll be able to run:
action_ran, result, state = app.run(inputs=...)
state.data # of type ApplicationState -- do what you want with this!
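For example (illustrative usage, assuming the ApplicationState model above):

# state.data is an ApplicationState instance, so you get typed attribute access:
print(state.data.prompt)
print(len(state.data.chat_history))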
For action-level state, you can also define typed computations:
from burr.core import action

@action.pydantic(reads=["prompt", "chat_history"], writes=["response"])
def image_response(state: ApplicationState,
                   model: str = "dall-e-2") -> ApplicationState:
    # _get_openai_client() and MODES come from the full example (linked below)
    client = _get_openai_client()
    result = client.images.generate(
        model=model, prompt=state.prompt,
        size="1024x1024", quality="standard", n=1
    )
    response = result.data[0].url
    state.response = {"content": response,
                      "type": MODES[state.mode],
                      "role": "assistant"}
    return state
Note three interesting choices here:
1. This is a different action API – it effectively subsets the (global) state on input, gives you that Pydantic object, then subsets the state on output, and merges it back.
2. Thus, if you try to refer to a state variable that you didn't specify in the reads/writes, it will give an error (see the sketch below).
3. Mutating in place is OK, as this produces a new object for each execution run. For now, you will want to be careful about lists/list pointers – we are working on that.
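To make the second point concrete, here's a minimal sketch of ours (not from the release notes), assuming the ApplicationState model above:

@action.pydantic(reads=["prompt"], writes=["response"])
def echo(state: ApplicationState) -> ApplicationState:
    # "prompt" and "response" were declared, so these accesses are fine:
    state.response = {"content": f"you said: {state.prompt}",
                      "type": "text", "role": "assistant"}
    # Touching an undeclared field here, e.g. state.chat_history, is expected
    # to error, because it was subsetted out of this action's view of state.
    return state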
For more details see documentation here, and the example here.
New Blog
This post follows last week's OpenLineage meetup, where we presented Hamilton and its new integration with OpenLineage. The blog post covers what OpenLineage is, why you might want to use it, and how Hamilton now emits OpenLineage events. For example, to get more visibility into your Airflow jobs that run Python, consider using Hamilton to organize that code, and then use OpenLineage to provide data lineage for those tasks!
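As a rough sketch of what the wiring could look like (the adapter name and constructor arguments below are our assumptions; see the blog post and docs for the exact API):

from openlineage.client import OpenLineageClient

from hamilton import driver
from hamilton.plugins import h_openlineage  # assumed plugin module name

import my_pipeline  # hypothetical module of Hamilton functions

client = OpenLineageClient.from_environment()  # configured via OPENLINEAGE_* env vars
adapter = h_openlineage.OpenLineageAdapter(client, "my_namespace", "my_job")  # assumed signature

dr = (
    driver
    .Builder()
    .with_modules(my_pipeline)
    .with_adapters(adapter)  # emits OpenLineage events as the DAG executes
    .build()
)
dr.execute(["some_output"])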
Office Hours & Meetup
Hamilton Meetup: We'll have the next meetup in October; it's currently scheduled for October 15th. We're excited to have Sholto Armstrong talk about their use of Hamilton and the new library they built on top of it at Capitec. Join/sign-up here.
Hamilton Office Hours: They happen most Tuesdays, 9:30am PT - 10:30am PT. Join our Slack for the link.
Burr Office Hours: They happen most Wednesdays, 9:30am PT - 10:30am PT. Join our Discord for the weekly link.