登录查看更多内容

Week of June 9th

Stefan Krawczyk

CEO @ DAGWorks Inc. | Co-creator of Hamilton & Burr | Pipelines & Agents: Data, Data Science, Machine Learning, & LLMs

发布日期: 2024年6月14日

+ 关注

TL;DR:

Hamilton Release highlights: UI is now a pip installable package; there's a tracker to auto populate to MLflow
Hamilton is being used in another open source project - Wren AI .
Blog: Lean Data Automation: A Principal Components Approach
Blog: Traveling back in time with Burr
Burr in Python Weekly
Example: Using Burr behind an OpenAI compatible endpoint.
Hamilton OS Meetup Group: June meed up next week. We'll cover new functionality with Kedro , MLflow , and then how to use Hamilton for document ingestion in RAG.

Hamilton Release 1.66.0

Highlights:

Hot on the heels of Databricks data & AI summit, we're pleased to announce some great support for MLflow .

MLFlow: MLflow is a project that is popular for storing things like model metrics and even model artifacts. With this new release you have:

access to data savers & loaders that will log to and load from MLFlow.
a tracker that will auto populate data for an MLFlow run given a Hamilton DAG run.

What this means is that you don’t have to include / integrate MLFlow directly into your Hamilton code, instead you couple it together at the Driver level. This enables a clean separation between logic and runtime concerns.

from hamilton import driver
from hamilton.io.materialization import to
from hamilton.plugins.h_mlflow import MLFlowTracker

dr = (
    driver.Builder()
    .with_modules(model_training_2)
    .with_adapters(MLFlowTracker()) # <- add this, it'll autolog to MLFlow
    .with_materializers(
        to.mlflow(
            id="trained_model__mlflow",
            dependencies=["trained_model"],
            register_as="my_new_model",
        ),
    )
    .build()
)

For more details see this video overview and this tutorial notebook.

Hamilton UI Update:

Before you needed to have Docker installed to run the UI. Now you don’t!

pip install "sf-hamilton[ui]"

then

hamilton ui

to start it.

This should enable you to quickly and easily explore your Hamilton DAGs — just add the adapter to your driver (follow the instructions in the UI) and then it’ll log to it; you don’t need to execute it to be able to see it in the UI.

Fixes:

SDK: now how better guards around JSON serializable inputs.
Hamilton: fix for parallelizable . Thanks to Volker Lorrmann for raising.
Hamilton: Inputs can now be outputs, without them being defined in the DAG. Thanks to DS team at RTV EURO AGD for raising. E.g. this is useful if you want to pass in extra columns that you want to add to the output in the case of creating a pandas dataframe for example.

Examples / Documentation Updates:

MLFlow tracker & data saver/loader example
We’ve added links to running notebooks in google colab where it makes sense.

WrenAI

We're excited that Hamilton is being picked up by another open source library. This time from Wren AI . They are building a RAG system and using Hamilton to help orchestrate it!

Blog: Lean Data Automation: A Principal Components Approach

领英推荐

AirFlow 3 is coming, forecasting with the fable…

Rami Krispin 6 个月前

Data Science Road Map 2022 – The Ultimate Guide

Abhinavan Sarikonda ? 2 年前

Should you send a payload in an HTTP GET request?

Arpit Bhayani 2 年前

This blog post was written in collaboration with Runhouse . In it we discuss that by unbundling the principle components of (macro) orchestrators, we can take advantage of a lean, cost-effective, and flexible stack. This can be done in such a way, i.e. by choosing the right tools, to preserve all the visibility, collaboration, and scale we need.

In the post we have a code example of this stack. It uses Github Actions, which is a free and widely available scheduler, and then combines using Hamilton with Runhouse, i.e. two open-source dedicated asset and infrastructure layers, to create this nimble and lean approach.

My main take away is that you can get pretty far before you have to reach for something like Airflow, Dagster, or Prefect, for data & ML work.

Blog: Traveling back in time with Burr

One of the best features of Burr is the ability to "fork state", i.e. given some application run and a point in time, copy that state into another application for you to debug/iterate with.

We wrote the hows and whys of this up in a post. That also comes along with a user contributed video on how they use this approach to develop with Burr! Thanks Ashis Ghosh !

The TL;DR: to enable it, is that you just need to pass in the write "IDs" to know where to take state from when building your application:

.initialize_from(
    state_persister,
    resume_at_next_action=True,
    default_state={"count" : 0},
    default_entrypoint="count",
    fork_from_app_id=PARENT_APP_ID,                                # <--
    fork_from_sequence_id=PARENT_APP_SEQUENCE_ID # <--
)

Burr in Python Weekly

Burr made it onto the Python Weekly Newsletter - https://mailchi.mp/pythonweekly/python-weekly-issue-654

Always fun to see our projects get picked up onto various lists.

New Burr example: using it to power an OpenAI compatible endpoint.

Thierry Jean came up with a cute idea. There are many UIs that allow you to interface with an OpenAI compatible endpoint easily. Wouldn't it be nice to use one of them to interact with your Burr application?

Well we now have an example that precisely shows this.

The idea is simple - expose a FastAPI endpoint that mirrors the OpenAI endpoint. Then underneath, it delegates logic to Burr, which in turn can do whatever you want!

Here's a video walkthrough of it.

Hamilton OS Meetup Group

June meet-up is this coming week! Sign up here

New functionality:

Kedro Adapter
MLFlow Tracker
Locally running the Hamilton UI
The deep dive, will be an introduction on “How to use Hamilton in a RAG context”, e.g. for document ingestion.

Stefan's Update

894 位关注者

Elijah ben Izzy

Co-creator of Hamilton/Burr OS libraries, Co-founder @ DAGWorks (YC W23, StartX S23)

9 个月

Really can't believe we got all this done this week

2 次回应

要查看或添加评论，请登录

Stefan Krawczyk的更多文章

February Updates

2025年2月20日

February Updates

TL;DR: #Hamilton highlights: crossed 2000 github stars, released multithreading based DAG parallelism, RichProgressBar…

3 条评论
Last week of 2024 / first week of 2025

2025年1月2日

Last week of 2024 / first week of 2025

TL;DR: #Hamilton + #Burr 2024 stats: 35M+ telemetry events (10x), 100K+ unique IPs (10x) from 1000+ companies, 1M+…

3 条评论
Week of December 9th

2024年12月13日

Week of December 9th

TL;DR: #Hamilton release highlights: Better TypedDict support and modular subdag example Office Hours & Meet ups for…
Week of December 2nd

2024年12月5日

Week of December 2nd

TL;DR: #Hamilton release highlights: Async Datadog Integration, Polars & Pandas with_columns support. #Burr release…
Week of November 18th

2024年11月22日

Week of November 18th

TL;DR: #Hamilton release highlights: SDK configurability #Burr release highlights: parallelism UI modifications, video…
Week of November 11th

2024年11月15日

Week of November 11th

TL;DR: #Hamilton release highlights: async support for @pipe + various small fixes #Burr release highlights:…
Week of November 4th

2024年11月8日

Week of November 4th

TL;DR: #Hamilton release highlights: @with_columns decorator for Pandas by Jernej Frank & module overrides for async…
Week of October 28th

2024年10月31日

Week of October 28th

TL;DR: #Hamilton release highlights: in-memory cache store. #Burr release highlights: release candidate for a first…
Week of October 21st

2024年10月24日

Week of October 21st

TL;DR: #Hamilton release highlights: some minor fixes and docs updates from five different OS contributors! Also…
Week of October 14th

2024年10月17日

Week of October 14th

TL;DR: Announcing Shreya Shankar as an advisor. #Hamilton release highlights: tweaks to pipe_input, new…

3 条评论

See all articles

Week of June 9th

Stefan Krawczyk

CEO @ DAGWorks Inc. | Co-creator of Hamilton & Burr | Pipelines & Agents: Data, Data Science, Machine Learning, & LLMs

TL;DR:

Hamilton Release 1.66.0

Highlights:

Examples / Documentation Updates:

WrenAI

Blog: Lean Data Automation: A Principal Components Approach

领英推荐

Blog: Traveling back in time with Burr

Burr in Python Weekly

New Burr example: using it to power an OpenAI compatible endpoint.

Hamilton OS Meetup Group

Stefan's Update

894 位关注者

Stefan Krawczyk的更多文章

社区洞察

其他会员也浏览了

Introducing Zyte API Enterprise – Technology + Expertise to supercharge your in-house data extraction team

Distributed Bloom Filter

Python eats away at R: Top Software for Analytics, Data Science, Machine Learning in 2018 -Trends and Analysis

Top 10 Tools or Applications or Libraries or Packages Used by Data Scientists in Day-to-Day Work and their mapping to Data Science Life Cycle in IT

In Defense of the Humble .ipynb

Introducing: MGraph-AI - A Memory-First Graph Database for GenAI and Serverless Apps

Polars Vs Pandas: Benchmarking performances and beyond

The Power of Ten

Algorithms & Data Structures— A beginners guide ?? ??

?? Big Data in Construction. Part 1-2: First Dataset. Tika OCR. Extracting content and metadata.

TL;DR:

Hamilton Release 1.66.0

Highlights:

Examples / Documentation Updates:

WrenAI

Blog: Lean Data Automation: A Principal Components Approach

领英推荐

Blog: Traveling back in time with Burr

Burr in Python Weekly

New Burr example: using it to power an OpenAI compatible endpoint.

Hamilton OS Meetup Group

Stefan's Update

894 位关注者

Stefan Krawczyk的更多文章

February Updates

Last week of 2024 / first week of 2025

Week of December 9th

Week of December 2nd

Week of November 18th

Week of November 11th

Week of November 4th

Week of October 28th

Week of October 21st

Week of October 14th

社区洞察

其他会员也浏览了

Introducing Zyte API Enterprise – Technology + Expertise to supercharge your in-house data extraction team

Distributed Bloom Filter

Python eats away at R: Top Software for Analytics, Data Science, Machine Learning in 2018 -Trends and Analysis

Top 10 Tools or Applications or Libraries or Packages Used by Data Scientists in Day-to-Day Work and their mapping to Data Science Life Cycle in IT

In Defense of the Humble .ipynb

Introducing: MGraph-AI - A Memory-First Graph Database for GenAI and Serverless Apps

Polars Vs Pandas: Benchmarking performances and beyond

The Power of Ten

Algorithms & Data Structures— A beginners guide ?? ??

?? Big Data in Construction. Part 1-2: First Dataset. Tika OCR. Extracting content and metadata.