Week of September 16th
Stefan Krawczyk
CEO @ DAGWorks Inc. | Co-creator of Hamilton & Burr | Pipelines & Agents: Data, Data Science, Machine Learning, & LLMs
TL;DR:
Social Proof: Don't take my word for it, take theirs.
It's been a fun week collecting quotes from users about their experience using Hamilton and Burr. If you're a lurker and haven't yet tried what we're building, there's never been a better time to start than now... Don't take my word for it - take theirs! Here are some teasers:
[...] I felt trapped in LangChain's ecosystem [...] Moving from LangChain to Burr was a game-changer.
It took me just a few hours to get started with Burr, compared to the days and weeks I spent trying to navigate LangChain.
With Burr, I could finally have a cleaner, more sophisticated, and stable implementation. No more wrestling with complex codebases.
I pitched Burr to my teammates, and we pivoted our entire codebase to it. It's been a smooth ride ever since.
Hamilton is simplicity. Its declarative approach to defining pipelines (as well as the UI to visualize them) makes testing and modifying the code easy, and onboarding is quick and painless. Since using Hamilton, we have improved our efficiency of both developing new functionality and onboarding new developers to work on the code. We deliver solutions more quickly than before.
We're active users of Hamilton. We have found it very useful in standardizing feature engineering code in our production codebases. It's particularly useful in diagnosing data contamination issues, and it's used by all of our MLEs.
Of course, you can use it [LangChain], but whether it's really production-ready and improves the time from "code-to-prod" [...], we've been doing LLM apps for two years, and the answer is no [...] All these "all-in-one" libs suffer from this [...]? Honestly, take a look at Burr. Thank me later.
How (with good software practices) do you orchestrate a system of asynchronous LLM calls, but where some of them depend on others? How do you build such a system so that it’s modular and testable? At [REDACTED] we’ve selected Hamilton to help us solve these problems and others. And today our product, [REDACTED], an AI legal assistant that extracts information from estate planning documents, is running in production with Hamilton under the hood.
Hamilton Release Highlights:
Hamilton Framework == 1.77.0
Pydantic data validators
Pydantic is a commonly used library for describing data records, especially in web contexts. With this latest release, you can now validate any Pydantic model (or a dict of model contents) against a schema using the check_output decorator:
from pydantic import BaseModel
from hamilton.function_modifiers import check_output

class MyModel(BaseModel):
    name: str

@check_output(model=MyModel)
def foo() -> dict:
    return {"name": "hamilton"}

# or, via the pydantic plugin (note the typed return):
from hamilton.plugins import h_pydantic

@h_pydantic.check_output()
def foo() -> MyModel:
    return MyModel(name="hamilton")
For those unfamiliar with the check_output decorator, it is a lightweight way to incorporate data quality checks into Hamilton, and this Pydantic integration complements our Pandera integration. Thanks to Charles Schwartz for their second contribution to the project!
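To see it fire, here's a minimal sketch (the module name is illustrative) that runs the decorated foo() through a driver. By default check_output warns on a validation failure; pass importance="fail" to raise instead:

from hamilton import driver
import my_module  # hypothetical module containing the decorated foo()

dr = driver.Builder().with_modules(my_module).build()
print(dr.execute(["foo"]))  # the Pydantic validator runs when foo is computed

# To make a validation failure raise rather than warn:
# @check_output(model=MyModel, importance="fail")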
New function overrides
When you create multiple Hamilton modules and use them together, @config.when can get a little verbose if all you want is to replace logic depending on which modules you combine. Thanks to Jernej Frank, you now have another option (shout out to Yijun Tang for the idea).
Here's how it works - say you have two modules, where one function is redefined in the other:
# module_a.py
def foo() -> int:
    return 1

def bar() -> int:
    return 2

# module_b.py
def bar() -> int:
    return 3
Rather than using @config.when to choose the right one, we can just tell Hamilton (via .allow_module_overrides()) to take the last definition it comes across:
from hamilton import driver
import module_a, module_b

dr = (
    driver
    .Builder()
    .with_modules(module_a, module_b)  # order matters!
    .allow_module_overrides()  # <--- this is required for it to work
    .build()
)
print(dr.execute(["foo", "bar"]))
{
    "foo": 1,
    "bar": 3
}
Hamilton SDK == 0.7.2
Fix: We pushed a few Polars fixes so summary statistics work appropriately when logging to the Hamilton UI.
Hamilton UI == 0.0.15
Fix: If quantile values are None or empty, the UI now correctly handles them.
Burr Release Highlights:
Burr == 0.30.1
Typed State with Pydantic (docs reference here, example here)
Burr now has two approaches to specifying a schema for state: at the application level and at the action level. These can work together as long as they don't specify clashing state, and they enable a host of other extensions/capabilities.
While the current implementation only supports Pydantic, the typing system is intended to be pluggable, and we plan to add further integrations (dataclasses, typed dicts, etc…).
The TL;DR is that you can now do something like this at the application level:
First, define a Pydantic model for your application:
from typing import List, Literal, Optional

import pydantic

class ApplicationState(pydantic.BaseModel):
    chat_history: List[dict[str, str]] = pydantic.Field(default_factory=list)
    prompt: Optional[str] = None
    mode: Optional[Literal["text", "image"]] = None
    response: Optional[dict[str, str]] = None
Then, we can use this model to type our application:
from burr import ApplicationBuilder
from burr.core.typing import PydanticTypingSystem

app = (
    ApplicationBuilder()
    .with_actions(...)
    .with_entrypoint(...)
    .with_transitions(...)
    .with_typing(PydanticTypingSystem(ApplicationState))
    .with_state(ApplicationState())
    .build()
)
Your application is now typed with that Pydantic model. If you're using an appropriate typing integration in your IDE (e.g. Pylance), it will know that the state of your application is of type ApplicationState.
When you have this you’ll be able to run:
action_ran, result, state = app.run(inputs=...)
state.data # of type ApplicationState -- do what you want with this!
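For example (illustrative usage, assuming the ApplicationState model above):

# state.data is an ApplicationState instance, so you get typed attribute access:
print(state.data.prompt)
print(len(state.data.chat_history))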
For action-level state, you can also define typed computations:
from burr.core import action

@action.pydantic(reads=["prompt", "chat_history"], writes=["response"])
def image_response(state: ApplicationState,
                   model: str = "dall-e-2") -> ApplicationState:
    # _get_openai_client() and MODES come from the full example (linked below)
    client = _get_openai_client()
    result = client.images.generate(
        model=model, prompt=state.prompt,
        size="1024x1024", quality="standard", n=1
    )
    response = result.data[0].url
    state.response = {"content": response,
                      "type": MODES[state.mode],
                      "role": "assistant"}
    return state
Note three interesting choices here:
1. This is a different action API – it effectively subsets the (global) state on input, gives you that Pydantic object, then subsets the state on output, and merges it back.
2. Thus, if you try to refer to a state variable that you didn't specify in the reads/writes, it will give an error (see the sketch below).
3. Mutating in place is OK, as this produces a new object for each execution run. For now, you will want to be careful about lists/list pointers – we are working on that.
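To make the second point concrete, here's a minimal sketch of ours (not from the release notes), assuming the ApplicationState model above:

@action.pydantic(reads=["prompt"], writes=["response"])
def echo(state: ApplicationState) -> ApplicationState:
    # "prompt" and "response" were declared, so these accesses are fine:
    state.response = {"content": f"you said: {state.prompt}",
                      "type": "text", "role": "assistant"}
    # Touching an undeclared field here, e.g. state.chat_history, is expected
    # to error, because it was subsetted out of this action's view of state.
    return state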
For more details see documentation here, and the example here.
New Blog
This post follows last week's OpenLineage meetup, where we presented Hamilton and its new integration with OpenLineage. The blog post covers what OpenLineage is, why you might want to use it, and how Hamilton now emits OpenLineage events. For example, to get more visibility into your Airflow jobs that run Python, consider using Hamilton to organize that code, and then use OpenLineage to provide data lineage for those tasks!
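As a rough sketch of what the wiring could look like (the adapter name and constructor arguments below are our assumptions; see the blog post and docs for the exact API):

from openlineage.client import OpenLineageClient

from hamilton import driver
from hamilton.plugins import h_openlineage  # assumed plugin module name

import my_pipeline  # hypothetical module of Hamilton functions

client = OpenLineageClient.from_environment()  # configured via OPENLINEAGE_* env vars
adapter = h_openlineage.OpenLineageAdapter(client, "my_namespace", "my_job")  # assumed signature

dr = (
    driver
    .Builder()
    .with_modules(my_pipeline)
    .with_adapters(adapter)  # emits OpenLineage events as the DAG executes
    .build()
)
dr.execute(["some_output"])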
Office Hours & Meetup
Hamilton Meetup: We'll have the next meetup in October; it's currently scheduled for October 15th. We're excited to have Sholto Armstrong talk about their use of Hamilton and the new library they built on top of it at Capitec. Join/sign-up here.
Hamilton Office Hours: They happen most Tuesdays, 9:30am PT - 10:30am PT. Join our Slack for the link.
Burr Office Hours: They happen most Wednesdays, 9:30am PT - 10:30am PT. Join our Discord for the weekly link.