Week of October 7th
Stefan Krawczyk
CEO @ DAGWorks Inc. | Co-creator of Hamilton & Burr | Pipelines & Agents: Data, Data Science, Machine Learning, & LLMs
TL;DR:
Hamilton Release Highlights:
Hamilton Framework == 1.80.0
New @mutate decorator!
With Hamilton, part of the UX is that you need to think about how you construct your dataflow, or DAG, and in particular how you name "assets", i.e. things you can compute. This is how Hamilton stitches computation together, and it is also a forcing function that helps you separate out logic where it makes sense. In general this is useful because it helps you clearly identify when logical changes occur (e.g. raw_data -> transformed_dataset), but in some circumstances it can lead to code that is less ergonomic. To help in the latter situation, Jernej Frank added @mutate.
@mutate lets you avoid renaming an "asset", i.e. a function/node in the dataflow. As the decorator name indicates, it mutates the asset instead. This means you can keep a single name, e.g. transformed_data, and append "mutations" to it. See the code snippet below:
from hamilton.function_modifiers import mutate
import pandas as pd


def transformed_data(raw_data: pd.DataFrame) -> pd.DataFrame:
    return ...  # do your regular stuff here


@mutate(transformed_data)
def _normalize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """normalizes data"""
    for column in df.columns:
        df[column] = (df[column] - df[column].min()) / (df[column].max() - df[column].min())
    return df


@mutate(transformed_data, outlier_threshold=10)
def _remove_outliers(df: pd.DataFrame, outlier_threshold: float) -> pd.DataFrame:
    """Removes outliers"""
    return df[df < outlier_threshold]
This allows you to develop quickly and easily without having to "rewire" your graph. For example, you can comment out a function and rerun the dataflow, or add a function, decorate it with @mutate, and rerun the dataflow, all without issue.
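To make the idea concrete, here is a minimal pandas-only sketch of what chaining mutations onto a single asset amounts to conceptually. It does not use Hamilton and is not how Hamilton implements @mutate: the apply_mutations helper, the sample data, and the 0.75 threshold are all hypothetical, and it assumes mutations are applied in the order they are declared.

```python
import pandas as pd


def transformed_data(raw_data: pd.DataFrame) -> pd.DataFrame:
    # Stand-in for the "regular stuff"; here we simply copy the input.
    return raw_data.copy()


def _normalize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Min-max scales every column to the range [0, 1]."""
    return (df - df.min()) / (df.max() - df.min())


def _remove_outliers(df: pd.DataFrame, outlier_threshold: float) -> pd.DataFrame:
    """Masks values at or above the threshold; masked cells become NaN."""
    return df[df < outlier_threshold]


def apply_mutations(df: pd.DataFrame, steps) -> pd.DataFrame:
    """Hypothetical helper (not a Hamilton API): runs (function, kwargs) steps in order."""
    for func, kwargs in steps:
        df = func(df, **kwargs)
    return df


# Illustrative data and threshold, chosen only for this sketch.
raw = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 30.0]})
steps = [
    (_normalize_columns, {}),
    (_remove_outliers, {"outlier_threshold": 0.75}),
]
result = apply_mutations(transformed_data(raw), steps)
print(result)
```

Removing or adding an entry in steps changes the result without renaming anything downstream, which is the ergonomic benefit described above. Note that indexing a DataFrame with a boolean DataFrame masks individual cells rather than dropping rows.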
This is how Hamilton UI could visualize it:
Updated Slack Notifier Formatting
Hamilton has a Slack Notifier plugin that sends errors to Slack. We upgraded the formatting, and now things look like this:
Burr Release Highlights
Burr == 0.31.0
Annotations Version 1.0
We're super excited to have added annotation capabilities to the UI!
The goal -- make it so you can label, annotate, and collect data for review/evaluation
The workflow:
See the following screenshots:
We're super excited about this first version, and we'd love feedback on how to make it work better for your particular workflow: anything from what feels clunky to what you would want to do next. Do you want to connect this with evaluations? Would something in the UI help? Or would more export functionality be better?
Office Hours & Meetup
Hamilton Meet-up: Our meet-up is next week. We're excited to have Sholto Armstrong talk about their use of Hamilton at Capitec; the talk title and abstract are below. Join/sign up here to watch it live.
Title: Building a Decisioning Engine for Data Scientists with Hamilton.
In this talk, we will share our experience of leveraging Hamilton to build a comprehensive decisioning engine. We will explore the approach we took to build custom components on top of Hamilton, including decision trees, credit risk scorecards, and decision tables. The talk will walk you through practical examples from our work in finance, showcasing how we used our Hamilton-based decisioning engine to tackle challenges like fraud detection and credit granting. Throughout the talk, we will highlight the challenges we faced and how Hamilton was able to solve these issues. We will also explore the practical insights we gained and the best practices we developed for building robust, scalable, and explainable decisioning engines with Hamilton.
--
At the meet-up, we'll also cover some new features that we've recently released.
Hamilton Office Hours: They happen most Tuesdays, 9:30am PT - 10:30am PT.
Join our Slack for the link.
Burr Office Hours: They happen most Wednesdays, 9:30am PT - 10:30am PT.
Join our Discord for the weekly link.
MLOps World & Generative AI World Summit 2024
This November is the annual MLOps & Generative AI World Summit. It's in Austin, Texas. I went last year and had a great series of conversations with practitioners. If you can make it, I'd recommend attending.
For those that don't know, the goal of the summit/conference, organized by the Toronto Machine Learning Society (TMLS), is to help companies put more machine learning and AI into production environments effectively, responsibly, and efficiently.
Whether you're working towards a live production deployment or are already running in production, this conference gathers like-minded individuals to share practical knowledge that can help you on your journey.
Some of the talk tracks this year:
GenAI for SWEs Workshop
Together with Hugo Bowne-Anderson I will be hosting a workshop for software engineers on some first principles for delivering GenAI applications. More details to follow.
I'll also be running a community table on Hamilton & Burr, and "reliable AI" best practices.
Discount for Passes
If you'd like to attend, you can use the code DAGWORKS150 to get $150 off all passes. If you're going, send me a note; I'd love to meet up.
Conference Details
When: 9AM ET on Thursday, November 7th to 5PM ET on Friday, November 8th 2024
Where: Renaissance Austin Hotel, 9721 Arboretum Boulevard, Austin, TX.
Need more convincing? Watch this video.
New Blog post:
Following up on last week's caching release, we've added a companion blog post that explains the feature in more detail. We invite you to try the feature. Feedback has been quite positive thus far, and we're looking for more so we can continue to improve the functionality.
In the Wild:
We saw a post from Yuki Kakegawa on LinkedIn this past week. We're always excited for people coming across Hamilton -- and seeing our adoption span data engineers, machine learning engineers, through to GenAI engineers. This speaks to the universality of our tooling -- which is not an easy thing to replicate!
There were some great comments too, like this one: