Week of October 7th
Image by author.

TL;DR:

  • #Hamilton release highlights: @mutate decorator & updated slack notifier plugin
  • #Burr release highlights: V1 of annotations in the Burr UI
  • Office Hours & Meet ups for Hamilton & Burr.
  • MLOps World & Generative AI World Summit 2024
  • Blog post: Hamilton Caching feature write up
  • In the wild: Hamilton on LinkedIn


Hamilton Release Highlights:

Hamilton Framework == 1.80.0

New @mutate decorator!

With Hamilton, part of the UX is that you need to think about how you construct your dataflow, or DAG, and in particular how you name "assets", i.e. things you can compute. This is how Hamilton stitches computation together, and it is also a forcing function that helps you separate out logic where it makes sense. In general this is useful because it helps you clearly identify where logical changes occur (e.g. raw_data -> transformed_dataset), but in some circumstances it can lead to code that is less ergonomic. To help in the latter situation, Jernej Frank added @mutate.

It lets you avoid renaming an "asset", i.e. a function/node in the dataflow. As the decorator's name indicates, it mutates the asset instead. This means you can keep a single name, e.g. transformed_data, and append "mutations" to it. See the code snippet below:

from hamilton.function_modifiers import mutate
import pandas as pd

def transformed_data(raw_data: pd.DataFrame) -> pd.DataFrame:
    return ... # do your regular stuff here

@mutate(transformed_data)
def _normalize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """normalizes data"""
    for column in df.columns:
        df[column] = (df[column]-df[column].min())/(df[column].max() - df[column].min())
    return df


@mutate(transformed_data, outlier_threshold=10)
def _remove_outliers(df: pd.DataFrame, outlier_threshold: float) -> pd.DataFrame:
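    """Removes outliers"""
    # Boolean-mask indexing keeps the frame's shape: values at or above
    # the threshold become NaN rather than dropping whole rows.
    return df[df < outlier_threshold]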

This allows you to develop quickly and easily without having to "rewire" your graph. For example, you can comment out a function and rerun the dataflow, or add a new function with @mutate and rerun the dataflow, without issue.

This is how Hamilton UI could visualize it:

Example Hamilton UI view. Graphviz view would be similar.


Updated Slack Notifier Formatting

Hamilton has a Slack Notifier plugin that sends errors to Slack. We upgraded the formatting, and notifications now look like this:

Slack Notifier output example

Burr Release Highlights

Burr == 0.31.0

Annotations Version 1.0

We're super excited to have added annotation capabilities to the UI!

The goal -- make it so you can label, annotate, and collect data for review/evaluation

The workflow:

  1. Run a Burr app (preferably with OpenTelemetry integration/attribute logging)
  2. Go to it in the UI, click on the annotations (pencil icon) column -- + will add an annotation, the edit button will edit one. Tag it with anything you want (e.g. hallucination, to-review, etc.)
  3. Add observations on the attributes -- these have / , free-form text, and the option to add a "ground truth". Add as many as you want
  4. View in the UI -- either attached to the action in the application view, or in a freestanding annotations view (linked from projects). Query by tags, action name, etc...
  5. Download the data -- click the download button in the annotations view to get a CSV with all relevant data

See the following screenshots:

Within an application trace (left hand side), you can tag & annotate state and OpenTelemetry properties (right hand side).
You can then find all annotations for a particular tag for a project and dig into their context -- and export via a download button.
You can also edit and add ground truth etc.

We're super excited for this first version. We'd love feedback on how to make it work better for your particular workflow. For example, what feels clunky, and what would you want to do next? Do you want to connect this with evaluations? Would something in the UI help? Or would more export functionality be better?


Office Hours & Meetup

Hamilton Meet up: Our meet-up is next week. We're excited to have Sholto Armstrong talk about their use of Hamilton at Capitec -- talk title & abstract below. Join/sign-up here to watch it live.

Title: Building a Decisioning Engine for Data Scientists with Hamilton.

In this talk, we will share our experience of leveraging Hamilton to build a comprehensive decisioning engine. We will explore the approach we took to build custom components on top of Hamilton, including decision trees, credit risk scorecards, and decision tables. The talk will walk you through practical examples from our work in finance, showcasing how we used our Hamilton-based decisioning engine to tackle challenges like fraud detection and credit granting. Throughout the talk, we will highlight the challenges we faced and how Hamilton was able to solve these issues. We will also explore the practical insights we gained and the best practices we developed for building robust, scalable, and explainable decisioning engines with Hamilton.

--

At the meet-up, we'll also cover some new features that we've recently released.


Hamilton Office Hours: They happen most Tuesdays 9:30am PT - 10:30am PT.

Join our Slack for the link.

Burr Office Hours: They happen most Wednesdays 9:30am PT - 10:30am PT.

Join our Discord for the weekly link.


MLOps World & Generative AI World Summit 2024

This November is the annual MLOps & Generative AI World Summit. It's in Austin, Texas. I went last year and had a great series of conversations with practitioners. If you can make it, I'd recommend attending.

For those who don't know, the goal of the summit/conference, organized by the Toronto Machine Learning Society (TMLS), is to help companies put more machine learning and AI into production environments effectively, responsibly, and efficiently.

Whether you're working towards a live production deployment or already running in production, this conference gathers like-minded individuals to share practical knowledge that will help you on your journey.

Some of the talk tracks this year:

  • Real World Case Studies
  • Business & Strategy
  • Technical & Research (levels 1-7)
  • Workshops (levels 1-7) <-- I'll be doing one!
  • In-person coding sessions

GenAI for SWEs Workshop

Together with Hugo Bowne-Anderson I will be hosting a workshop for software engineers on some first principles for delivering GenAI applications. More details to follow.

I'll also be running a community table on Hamilton & Burr, and "reliable AI" best practices.

Discount for Passes

If you'd like to attend, you can use the code DAGWORKS150 to get $150 off all passes. If you're going, send me a note; I'd love to meet up.

Conference Details

When: 9AM ET on Thursday, November 7th to 5PM ET on Friday, November 8th 2024
Where: Renaissance Austin Hotel, 9721 Arboretum Boulevard, Austin, TX. MAP.

Need more convincing? Watch this video.


New Blog post:

Following up on last week's caching release, we've added a companion blog post that explains the feature in more detail. We invite you to try the feature. Feedback has been quite positive thus far, and we're looking for more to keep improving the functionality.


In the Wild:

We saw a post from Yuki Kakegawa on LinkedIn this past week. We're always excited for people coming across Hamilton -- and seeing our adoption span data engineers, machine learning engineers, through to GenAI engineers. This speaks to the universality of our tooling -- which is not an easy thing to replicate!

There were some great comments too, like this one:


:)

