Week of November 4th
Stefan Krawczyk
CEO @ DAGWorks Inc. | Co-creator of Hamilton & Burr | Pipelines & Agents: Data, Data Science, Machine Learning, & LLMs
TL;DR:
Hamilton Release Highlights:
Hamilton Framework == 1.83.0
@with_columns for Pandas
We're excited that Jernej Frank took this task on. If you're doing more traditional work with dataframes, you probably think more in dataframes than in columns. With the pandas @with_columns decorator (note: Hamilton's pyspark support already has @with_columns), you can now mix and match your granularity of thinking more ergonomically: use dataframes for the main chunks, then add column-level operations where it makes sense. This replaces prior combinations of @subdag to achieve the same result. See the code below: my_functions is a module that defines some Hamilton dataflows, and we pull functions from it to populate columns in a pandas dataframe.
import pandas as pd
from hamilton.plugins.h_pandas import with_columns
import my_functions

def initial_df() -> pd.DataFrame:
    return pd.DataFrame.from_dict(
        {
            "signups": pd.Series([1, 10, 50, 100, 200, 400]),
            "spend": pd.Series([10, 10, 20, 40, 40, 50]) * 1e6,
        }
    )

output_columns = [
    "spend",
    "signups",
    "avg_3wk_spend",
    "spend_per_signup",
    "spend_zero_mean_unit_variance",
]

# the with_columns call ---- this is new!
@with_columns(
    *[my_functions],  # the "DAG" to use
    columns_to_pass=["spend", "signups"],  # select cols from the dataframe
    select=output_columns,  # The columns to append to the dataframe
    # config_required = ["a"]
)
def final_df(initial_df: pd.DataFrame) -> pd.DataFrame:
    # initial_df will now have the columns appended to it
    return initial_df
This will then create a DAG that looks like the following:
Thanks, Jernej Frank! Async & Polars support are forthcoming.
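The my_functions module used in the with_columns example is not shown in this post. As a rough sketch of what it might contain: the three derived column names below come from output_columns, but the function bodies and the intermediate helpers (spend_mean, spend_zero_mean, spend_std_dev) are assumptions made for illustration, not the actual module.

```python
# my_functions.py -- a hypothetical sketch; bodies and helper names are assumptions.
import pandas as pd


def avg_3wk_spend(spend: pd.Series) -> pd.Series:
    """Rolling 3-period average of spend (first two rows are NaN)."""
    return spend.rolling(3).mean()


def spend_per_signup(spend: pd.Series, signups: pd.Series) -> pd.Series:
    """Spend divided by signups, row by row."""
    return spend / signups


def spend_mean(spend: pd.Series) -> float:
    """Hypothetical helper: mean of the spend column."""
    return spend.mean()


def spend_zero_mean(spend: pd.Series, spend_mean: float) -> pd.Series:
    """Hypothetical helper: spend shifted to have zero mean."""
    return spend - spend_mean


def spend_std_dev(spend: pd.Series) -> float:
    """Hypothetical helper: sample standard deviation of spend."""
    return spend.std()


def spend_zero_mean_unit_variance(
    spend_zero_mean: pd.Series, spend_std_dev: float
) -> pd.Series:
    """Spend standardized to zero mean and unit variance."""
    return spend_zero_mean / spend_std_dev
```

Each function name becomes a node, and each parameter name refers to another node or to a column passed in through columns_to_pass. That is why the example only needs to pass "spend" and "signups": everything else can be derived from those two.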
Async Driver now supports .allow_module_overrides()
Ryan's change lets you pass in two modules and have the latter module "override" node implementations from the first. This reduces the need for @config.when annotations in your code.
Thanks to Ryan Whitten for picking this up and ensuring async support has parity with regular python support.
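To picture the "latter module wins" rule, here is a plain-Python sketch using only the standard library. It is not Hamilton's implementation and does not call Hamilton. The helpers make_module, collect_nodes, and run_shout are hypothetical names invented for this illustration, and rejecting duplicate names when overrides are off is an assumption of the sketch.

```python
import asyncio
import inspect
import types


def make_module(name, **functions):
    """Hypothetical helper: build a throwaway module holding the given functions."""
    module = types.ModuleType(name)
    for fn_name, fn in functions.items():
        setattr(module, fn_name, fn)
    return module


def collect_nodes(*modules, allow_module_overrides=False):
    """Map node name -> function, scanning modules in the order given."""
    nodes = {}
    for module in modules:
        for name, fn in vars(module).items():
            if name.startswith("_") or not inspect.isfunction(fn):
                continue
            if name in nodes and not allow_module_overrides:
                # Assumption of this sketch: duplicates are an error by default.
                raise ValueError(f"node '{name}' is defined in more than one module")
            nodes[name] = fn  # a later module replaces an earlier definition
    return nodes


async def greeting_base() -> str:
    return "hello from base"


async def greeting_override() -> str:
    return "hello from override"


async def shout(greeting: str) -> str:
    return greeting.upper()


async def run_shout(nodes):
    """Evaluate the tiny two-node graph: greeting -> shout."""
    greeting = await nodes["greeting"]()
    return await nodes["shout"](greeting)


base = make_module("base", greeting=greeting_base, shout=shout)
override = make_module("override", greeting=greeting_override)

nodes = collect_nodes(base, override, allow_module_overrides=True)
result = asyncio.run(run_shout(nodes))
print(result)
```

Both modules define a node named greeting, and the second definition is the one that runs, with no @config.when branching needed. In Hamilton itself, the equivalent behavior is what .allow_module_overrides() turns on.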
Burr Release Highlights
No releases this week. Stay tuned.
Office Hours & Meetup
Hamilton Meet-up: Our next meet-up will be in December. Want to present? Reach out. Otherwise join/sign-up here.
Hamilton Office Hours: They happen most Tuesdays, 9:30am PT - 10:30am PT.
Join our slack for the link.
Burr Office Hours: They happen most Wednesdays, 9:30am PT - 10:30am PT.
Join our discord for the weekly link.
Running a Maven course on Building GenAI Applications
I'm excited to partner with Hugo Bowne-Anderson and Maven to build out a course to help ground the principles required to ship GenAI applications. You can enroll here.
But, we want to make sure it is relevant! So if you can spare 5-10 minutes, we'd love your feedback to help shape the course: https://maven.com/forms/b83ce4
In the Wild:
Burr at MLOps World & Generative AI World Summit
Hugo Bowne-Anderson and I have been running a workshop at MLOps World & Generative AI World Summit. We're excited to teach people to think from first principles and introduce a framework like Burr to help get to production.
Stop by the DAGWorks Inc. community stand if you're at the conference for a sticker.