Week of October 14th
Image by author.

TL;DR:

  • Announcing Shreya Shankar as an advisor.
  • #Hamilton release highlights: tweaks to pipe_input, new @hamilton_exclude decorator, polars + pandera bug fix
  • #Burr release highlights: some annotation tweaks, and user-contributed Docker files.
  • Office Hours & Meet ups for Hamilton & Burr.
  • MLOps World & Generative AI World Summit 2024
  • In the wild: Hamilton at PyConZA; Hamilton Meet-up Recording; Burr at DataForAI meetup in SF this month


Announcing Shreya Shankar as an advisor


We're super excited that Shreya Shankar has joined us as an advisor at DAGWorks Inc. For those who don't know, Shreya is a PhD student at UC Berkeley whose research area overlaps a lot with what we're trying to achieve with Hamilton & Burr: developing & productionizing data, ML, and AI. She runs a great blog and is someone you should follow on Twitter/X.

We're excited that she's excited about what we're building, and we're looking forward to bringing knowledge and insights from her research into Hamilton & Burr.


Hamilton Release Highlights:

Hamilton Framework == 1.81.0

Changes

  • Adds on_input parameter for pipe_input, allowing you to specify which parameter to inject data into. Thanks to Jernej Frank!
  • Adds @hamilton_exclude decorator to decorate helper functions (replacement for _ in certain cases). Thanks to Jernej Frank!
  • Fixes the pandera validator + polars so they work together (thanks to Jonas Meyer-Ohle for flagging).

A few docs updates/notebooks!

@pipe_input improvements + refresher

Let me walk you through a quick example. @pipe_input allows you to process input data prior to running a function. In the following case, we have:

  • raw_features as the first node
  • Two intermediate nodes applied to that: _normalize_columns and _remove_outliers
  • Then process_features applied to the result of that!

The change is that pipe_input now takes an on_input parameter, which lets you specify which function parameter the pipe_input result gets applied to. If you leave it out, it will attempt to inject into the first parameter.

from hamilton.function_modifiers import pipe_input, step, value
import pandas as pd

def raw_features() -> pd.DataFrame:
    return load(...)  # placeholder: load your raw data here

def _normalize_columns(df: pd.DataFrame) -> pd.DataFrame:
    # min-max scale every column to the [0, 1] range
    for column in df.columns:
        df[column] = (df[column] - df[column].min()) / (df[column].max() - df[column].min())
    return df

def _remove_outliers(df: pd.DataFrame, outlier_threshold: float) -> pd.DataFrame:
    # keep values below the threshold; everything else becomes NaN
    return df[df < outlier_threshold]

@pipe_input(
    step(_normalize_columns),
    step(_remove_outliers, outlier_threshold=value(10)),
    on_input="raw_features",  # inject the processed result into the raw_features parameter
)
def process_features(index_subset: pd.Index, raw_features: pd.DataFrame) -> pd.DataFrame:
    return raw_features[index_subset]  # raw_features arrives already processed
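
To make the order of operations concrete, here is a rough plain-pandas sketch of what the chain above computes, written as ordinary function calls. This is only a mental model, not how Hamilton wires things up internally (Hamilton builds nodes in a DAG rather than calling the functions directly). The sample data is made up for illustration, and treating index_subset as a set of column labels is an assumption on my part.

```python
import pandas as pd


def _normalize_columns(df: pd.DataFrame) -> pd.DataFrame:
    # min-max scale every column to [0, 1]; copy so the caller's frame is untouched
    df = df.copy()
    for column in df.columns:
        col_min, col_max = df[column].min(), df[column].max()
        df[column] = (df[column] - col_min) / (col_max - col_min)
    return df


def _remove_outliers(df: pd.DataFrame, outlier_threshold: float) -> pd.DataFrame:
    # values at or above the threshold are masked to NaN
    return df[df < outlier_threshold]


def process_features(index_subset: pd.Index, raw_features: pd.DataFrame) -> pd.DataFrame:
    # assumption: index_subset holds column labels to keep
    return raw_features[index_subset]


# hypothetical sample data standing in for raw_features()
raw = pd.DataFrame({"a": [0.0, 5.0, 10.0], "b": [1.0, 2.0, 3.0]})

# the two steps run in order on raw_features, then the result is passed
# into the raw_features parameter (the one named by on_input)
processed = _remove_outliers(_normalize_columns(raw), outlier_threshold=10)
result = process_features(pd.Index(["a"]), processed)
print(result)
```

Note that index_subset is left alone: only the parameter named by on_input receives the piped data.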



Burr Release Highlights

Burr == 0.31.1

UI Fix:

  • Improves a few minor UI components for annotation editing/creating.

Docker Files:

  • As more people get to production, a common question cropping up this past week or two is around how to deploy the Burr UI. Thanks to open source, we now have two examples of how to containerize the Burr UI as well as shipping a Burr application that's run in a FastAPI web-service. Thanks to Aditya K. & Matthew Rideout for contributing here!


Office Hours & Meetup

Hamilton Meet-up: Our next meet-up will be in December. Want to present? Reach out. Otherwise join/sign-up here.

Hamilton Office Hours: They happen most Tuesdays 9:30am PT - 10:30am PT.

Join our slack for the link.

Burr Office Hours: They happen most Wednesdays 9:30am PT - 10:30am PT.

Join our discord for the weekly link.


MLOps World & Generative AI World Summit 2024

This November is the annual MLOps & Generative AI World summit. It's in Austin, Texas. I went last year and had a great series of conversations with practitioners. If you can make it, I'd recommend attending.

For those who don't know, the goal of the summit/conference, organized by the Toronto Machine Learning Society (TMLS), is to help companies put more machine learning and AI into production environments effectively, responsibly, and efficiently.

Whether you're working towards a live production deployment or already running in production, this conference is geared towards gathering like-minded individuals to share practical knowledge that helps you on your journey.

Some of the talk tracks this year:

  • Real World Case Studies
  • Business & Strategy
  • Technical & Research (levels 1-7)
  • Workshops (levels 1-7) <-- I'll be doing one!
  • In-person coding sessions

GenAI for SWEs Workshop

Together with Hugo Bowne-Anderson, I will be hosting a workshop for software engineers on some first principles for delivering GenAI applications. More details to follow.

I'll also be running a community table on Hamilton & Burr, and "reliable AI" best practices.

Discount for Passes

If you'd like to attend, you can use the code DAGWORKS150 to get $150 off all passes. If you're going, send me a note; I'd love to meet up.

Conference Details

When: 9AM ET on Thursday, November 7th to 5PM ET on Friday, November 8th 2024

Where: Renaissance Austin Hotel, 9721 Arboretum Boulevard, Austin, TX. MAP.

Need more convincing? Watch this video.


In the Wild:

Hamilton at PyConZA

Sholto Armstrong presented their additions on top of Hamilton at PyConZA. To our knowledge, this is the first user presentation on Hamilton at a conference that wasn't given by myself or Elijah ben Izzy!

Hamilton Meet-up Recording

We had the meet-up this past week. Excellent engagement and a lot of questions and discussion around the main presentation, as well as the new features that shipped -- in particular caching! More details in the notes section.

Burr at DataForAI meetup in SF this month

I'm excited to present Burr at this month's DataForAI meetup. If you're in SF you should swing by! More details and a link to the meet-up are in Lindsey Robertson's post below:

