
Week of October 14th

Stefan Krawczyk

CEO @ DAGWorks Inc. | Co-creator of Hamilton & Burr | Pipelines & Agents: Data, Data Science, Machine Learning, & LLMs

Published: October 17, 2024


TL;DR:

Announcing Shreya Shankar as an advisor.
#Hamilton release highlights: tweaks to pipe_input, new @hamilton_exclude decorator, polars + pandera bug fix
#Burr release highlights: some annotation tweaks, and user contributed docker files.
Office Hours & Meet ups for Hamilton & Burr.
MLOps World & Generative AI World Summit 2024
In the wild: Hamilton at PyConZA; Hamilton Meet-up Recording; Burr at DataForAI meetup in SF this month

Announcing Shreya Shankar as an advisor

We're super excited that Shreya Shankar has joined us as an advisor at DAGWorks Inc. For those who don't know, Shreya is a PhD student at UC Berkeley whose research overlaps a lot with what we're trying to achieve with Hamilton & Burr: developing & productionizing data, ML, and AI. She runs a great blog and is someone you should follow on Twitter/X.

We're excited that she's excited about what we're building, and we look forward to bringing knowledge and insights from her research into Hamilton & Burr.

Hamilton Release Highlights:

Hamilton Framework == 1.81.0

Changes

Adds an on_input parameter to pipe_input, allowing you to specify which parameter to inject data into. Thanks to Jernej Frank!
Adds the @hamilton_exclude decorator for helper functions (a replacement for the _ prefix in certain cases). Thanks to Jernej Frank!
Fixes the pandera validator so it works together with polars (thanks to Jonas Meyer-Ohle for flagging).
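To illustrate why an explicit opt-out is handy, here's a toy sketch of function discovery. Hamilton builds nodes from the functions in a module and skips names that start with an underscore; the sketch below mimics that idea in plain Python. Note that exclude_sketch, discover_nodes, and the _excluded_from_graph attribute are hypothetical names made up for this illustration. This is not Hamilton's implementation or API.

```python
import types


def exclude_sketch(fn):
    """Hypothetical marker decorator: flag a helper so discovery skips it."""
    fn._excluded_from_graph = True
    return fn


def discover_nodes(namespace):
    """Return the names of functions that would become nodes in this toy model."""
    return sorted(
        name
        for name, obj in namespace.items()
        if isinstance(obj, types.FunctionType)
        and not name.startswith("_")  # the underscore convention
        and not getattr(obj, "_excluded_from_graph", False)  # the explicit opt-out
    )


def raw_value() -> int:
    return 41


def _private_helper(x: int) -> int:
    return x + 1


@exclude_sketch
def public_helper(x: int) -> int:
    # No underscore needed, so the name stays importable and readable elsewhere.
    return x + 1


def answer(raw_value: int) -> int:
    return public_helper(raw_value)


# Stand-in for a module's namespace (a plain dict keeps the sketch self-contained).
example_module = {
    "raw_value": raw_value,
    "_private_helper": _private_helper,
    "public_helper": public_helper,
    "answer": answer,
}

print(discover_nodes(example_module))
```

In this toy model only raw_value and answer are picked up as nodes; both helpers stay out of the graph, one through the naming convention and one through the marker.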

A few docs updates/notebooks!

New notebook on data loaders - thanks Emmanuel Obeng Agyen!
Instructions to generate a PDF of the Hamilton docs (for anyone who has to deal with an airgap…); you can also download it at hamilton.dagworks.io!

@pipe_input improvements + refresher

Let me walk you through a quick example. @pipe_input allows you to process input data prior to running a function. In the following case, we have:

raw_features as the first node
Two intermediate nodes applied to that: _normalize_columns and _remove_outliers
Then process_features applied to the result of that!

The change is the new on_input parameter, which lets you specify which function parameter the pipe_input result is injected into. If you leave it out, Hamilton will attempt to inject into the first parameter.

from hamilton.function_modifiers import pipe_input, step, value
import pandas as pd

def raw_features() -> pd.DataFrame:
    return load(...)  # placeholder: load your raw data here

def _normalize_columns(df: pd.DataFrame) -> pd.DataFrame:
    # min-max scale every column to the [0, 1] range
    for column in df.columns:
        df[column] = (df[column] - df[column].min()) / (df[column].max() - df[column].min())
    return df

def _remove_outliers(df: pd.DataFrame, outlier_threshold: float) -> pd.DataFrame:
    # values at or above the threshold are masked to NaN
    return df[df < outlier_threshold]

@pipe_input(
    step(_normalize_columns),
    step(_remove_outliers, outlier_threshold=value(10)),
    on_input="raw_features",  # inject the processed result into the raw_features parameter
)
def process_features(index_subset: pd.Index, raw_features: pd.DataFrame) -> pd.DataFrame:
    return raw_features.loc[index_subset]  # raw_features is injected after being processed
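If you want to build intuition for what on_input does without installing anything, here's a minimal plain-Python sketch of the same idea. Everything in it (pipe_input_sketch, the (function, kwargs) tuples standing in for step, and the list-based helpers) is a hypothetical stand-in written for this illustration; it is not how Hamilton implements @pipe_input, which wires these steps up as nodes in the DAG rather than wrapping the function.

```python
import functools
import inspect


def pipe_input_sketch(*steps, on_input=None):
    """Toy stand-in for the idea behind @pipe_input (not Hamilton's code).

    Each step is a (function, extra_kwargs) pair, applied in order to one
    argument of the decorated function before that function runs.
    """

    def decorator(fn):
        signature = inspect.signature(fn)
        # Default to the first parameter when on_input is not given.
        target = on_input or next(iter(signature.parameters))
        if target not in signature.parameters:
            raise ValueError(f"{fn.__name__} has no parameter {target!r}")

        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            bound = signature.bind(*args, **kwargs)
            value = bound.arguments[target]
            for step_fn, extra in steps:
                value = step_fn(value, **extra)
            bound.arguments[target] = value  # swap in the processed input
            return fn(*bound.args, **bound.kwargs)

        return wrapper

    return decorator


def _normalize(values):
    low, high = min(values), max(values)
    return [(v - low) / (high - low) for v in values]


def _remove_outliers(values, outlier_threshold):
    return [v for v in values if v < outlier_threshold]


@pipe_input_sketch(
    (_normalize, {}),
    (_remove_outliers, {"outlier_threshold": 0.9}),
    on_input="raw_features",  # target the second parameter, not the first
)
def process_features(positions, raw_features):
    # raw_features arrives here already normalized and filtered.
    return [raw_features[i] for i in positions]


print(process_features([0, 1], [0, 5, 10]))
```

Without on_input, the sketch would pipe the steps into positions (the first parameter), which mirrors the default described above.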

Burr Release Highlights

Burr == 0.31.1

UI Fix:

Improves a few minor UI components for annotation editing/creating.

Docker Files:

As more people get to production, a common question cropping up this past week or two is around how to deploy the Burr UI. Thanks to open source, we now have two examples of how to containerize the Burr UI as well as shipping a Burr application that's run in a FastAPI web-service. Thanks to Aditya K. & Matthew Rideout for contributing here!

Office Hours & Meetup

Hamilton Meet-up: Our next meet-up will be in December. Want to present? Reach out. Otherwise join/sign up here.

Hamilton Office Hours: They happen most Tuesdays, 9:30am - 10:30am PT.

Join our Slack for the link.

Burr Office Hours: They happen most Wednesdays, 9:30am - 10:30am PT.

Join our Discord for the weekly link.

MLOps World & Generative AI World Summit 2024

This November is the annual MLOps & Generative AI World summit. It's in Austin, Texas. I went last year and had a great series of conversations with practitioners. If you can make it, I'd recommend attending.

For those who don't know, the goal of the summit/conference, organized by the Toronto Machine Learning Society (TMLS), is to help companies put more machine learning and AI into production environments effectively, responsibly, and efficiently.

Whether you're working towards a live production deployment or already running in production, this conference is geared towards gathering like-minded individuals to share practical knowledge that helps you on your journey.

Some of the talk tracks this year:

Real World Case Studies
Business & Strategy
Technical & Research (levels 1-7)
Workshops (levels 1-7) <-- I'll be doing one!
In-person coding sessions

GenAI for SWEs Workshop

Together with Hugo Bowne-Anderson I will be hosting a workshop for software engineers on some first principles for delivering GenAI applications. More details to follow.

I'll also be running a community table on Hamilton & Burr, and "reliable AI" best practices.

Discount for Passes

If you'd like to attend, you can use the code DAGWORKS150 to get $150 off all passes. If you're going, send me a note; I'd love to meet up.

Conference Details

When: 9AM ET on Thursday, November 7th to 5PM ET on Friday, November 8th 2024

Where: Renaissance Austin Hotel, 9721 Arboretum Boulevard, Austin, TX.

Need more convincing? Watch this video.

In the Wild:

Hamilton at PyConZA

Sholto Armstrong presented on their additions on top of Hamilton at PyConZA. To our knowledge, this is the first user presentation on Hamilton at a conference that wasn't given by myself or Elijah ben Izzy!

Hamilton Meet-up Recording

We had the meet-up this past week. Excellent engagement and a lot of questions and discussion around the main presentation, as well as the new features that shipped -- in particular caching! More details in the notes section.

Burr at DataForAI meetup in SF this month

I'm excited to present Burr at this month's DataForAI meetup. If you're in SF you should swing by! More details and a link to the meet-up are in Lindsey Robertson's post.


Stefan's Weekly Updates
