Week of April 22nd

Stefan Krawczyk

CEO @ DAGWorks Inc. | Co-creator of Hamilton & Burr | Pipelines & Agents: Data, Data Science, Machine Learning, & LLMs

Published: April 25, 2024

Here's the TL;DR:

Catch my thoughts on "MLOps vs. Eng: Misaligned Incentives and Failure to Launch?" as part of Heavybit's blog on the topic.
View the recording of Elijah ben Izzy's Data Council Austin talk "Move Fast and Don't Break Things -- How to Build a Data Platform that Scales with your Organization". Hint: it talks about Hamilton ;)
I am running a free lightning session on Maven titled "Build a Document Processing Pipeline for RAG Systems" - sign up!
Hamilton OS meet-up recording.
Hamilton release 1.59.0. More OS community contributed features: a Data Loader, dltHub plugin, PyArrowTableResult, & a new decorator.
New Hamilton blog & example on using Hamilton for ad-hoc analyses. Thanks to People Data Labs for giving us some data to play with here.
New Burr blog on building interactive agents; we show how to build an email assistant application.

Have a great week - dive in below for more details!

Heavybit Blog

The nice folks at Heavybit interviewed me for my thoughts on why MLOps is hard and why companies struggle with it, and then wrote a blog about it. I'm not the only one interviewed, so there are varying perspectives; it's a good read.

Read the write-up here.

Data Council Recording

Elijah ben Izzy, the other co-creator of Hamilton & Burr, gave a talk on what platform initiatives should be doing, and showcased that with what we've been building with Hamilton. It's a great watch for anyone doing MLOps/LLMOps or thinking about centralization & standardization, i.e. "building a platform".

YouTube Recording Here

Maven Lightning Session

I'll be doing a free 30-minute session titled "Build a Document Processing Pipeline for RAG Systems".

What I'll cover:

The components that you need to have [document loading, parsing, text chunking, and embedding creation]
We'll write some code [Hamilton + LangChain]
Highlight caveats for going from development to production
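To make the four components concrete, here is a minimal sketch in plain Python. It is not the session's code and uses neither Hamilton nor LangChain; the function names are hypothetical, the "document" is an in-memory string, and the hashed bag-of-words `embed` function is only a toy stand-in for a real embedding model.

```python
import hashlib
import math
import re
from typing import Dict, List


def load_document(source_text: str) -> str:
    """Loading: a real pipeline would read a file or URL; here the text is passed in."""
    return source_text


def parse_document(raw: str) -> str:
    """Parsing: strip HTML-like tags and collapse runs of whitespace."""
    without_tags = re.sub(r"<[^>]+>", " ", raw)
    return re.sub(r"\s+", " ", without_tags).strip()


def chunk_text(text: str, chunk_size: int = 8, overlap: int = 2) -> List[str]:
    """Chunking: fixed-size word windows that overlap by `overlap` words."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    words = text.split()
    if not words:
        return []
    step = chunk_size - overlap
    return [
        " ".join(words[start:start + chunk_size])
        for start in range(0, max(len(words) - overlap, 1), step)
    ]


def embed(text: str, dims: int = 16) -> List[float]:
    """Toy embedding (assumption): hash each word into a bucket, then L2-normalize."""
    vector = [0.0] * dims
    for word in text.lower().split():
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        vector[digest[0] % dims] += 1.0
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm else vector


def build_index(source_text: str) -> List[Dict[str, object]]:
    """Run load -> parse -> chunk -> embed and pair each chunk with its vector."""
    text = parse_document(load_document(source_text))
    pieces = chunk_text(text)
    return [{"chunk": piece, "embedding": embed(piece)} for piece in pieces]


records = build_index("<p>Hamilton helps you write   dataflows.</p> " + "word " * 20)
print(len(records), "chunks indexed")
```

Keeping each stage as its own function makes it straightforward to swap in a real loader, parser, or embedding model later, which is where most of the development-to-production caveats tend to show up.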

Why am I doing this?

Retrieval Augmented Generation, or RAG, is a hot topic. But to use RAG you need to have data to retrieve. Most commonly in organizations this data is in some form of document. Understanding the "what" and "how" of creating a document processing pipeline will enable you to move faster and make better decisions as you build out your RAG system.

Hamilton Meet-up Recording

Last week we had our Hamilton meet-up. In it we covered:

Thierry Jean's experience building ML models, and how that motivated the creation of the Experiment Manager.
A deep-dive on the ways you can load & save data with Hamilton, and the abstractions available to help centralize and standardize how this is done.
Watch the recording

1.59.0 Hamilton Release

New Features:

Do you use @resolve much? If so, there's a new decorator, @resolve_from_config (#828). Thanks to Jan Hurst for the addition! This decorator is a small wrapper around the existing one that makes it less verbose and clearer about where things are coming from.

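import pandas as pd

from hamilton.function_modifiers import parameterize, resolve_from_config, source


# Once the config is available, the `columns_to_sum_map` value (output name -> two
# input column names) is expanded into one parameterized node per entry.
@resolve_from_config(
    decorate_with=lambda columns_to_sum_map: parameterize(
        **{
            key: {"col_1": source(value[0]), "col_2": source(value[1])}
            for key, value in columns_to_sum_map.items()
        }
    ),
)
def generic_summation(col_1: pd.Series, col_2: pd.Series) -> pd.Series:
    ...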

This is Jan Hurst's first contribution to Hamilton!
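To see what the configuration drives in the snippet above, the following plain-Python sketch mimics the expansion without importing Hamilton. The `expand_config` and `run_sums` helpers and the example column names are hypothetical, and lists stand in for `pd.Series`; this only illustrates the idea and is not how Hamilton resolves decorators internally.

```python
from typing import Dict, List


def expand_config(columns_to_sum_map: Dict[str, List[str]]) -> Dict[str, Dict[str, str]]:
    """Mirror the dict comprehension in the decorator: one entry per output name."""
    return {
        key: {"col_1": value[0], "col_2": value[1]}
        for key, value in columns_to_sum_map.items()
    }


def generic_summation(col_1: List[float], col_2: List[float]) -> List[float]:
    """Element-wise sum; lists stand in for pd.Series in this sketch."""
    return [a + b for a, b in zip(col_1, col_2)]


def run_sums(
    config: Dict[str, Dict[str, List[str]]], inputs: Dict[str, List[float]]
) -> Dict[str, List[float]]:
    """Apply generic_summation once per configured output, wiring inputs by name."""
    wiring = expand_config(config["columns_to_sum_map"])
    return {
        output_name: generic_summation(
            col_1=inputs[args["col_1"]], col_2=inputs[args["col_2"]]
        )
        for output_name, args in wiring.items()
    }


# Hypothetical config and inputs: two outputs, each summing two named columns.
config = {"columns_to_sum_map": {"a_plus_b": ["a", "b"], "b_plus_c": ["b", "c"]}}
inputs = {"a": [1, 2], "b": [10, 20], "c": [100, 200]}
results = run_sums(config, inputs)
print(results)
```

The point of the pattern is that adding a new summed column becomes a config change rather than a code change.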

There is now a dltHub data saver & data loader (i.e. materializer), added in #820. To see how to use it, check out this notebook.

New #Pandas table DataLoader by Swapnil Dewalkar in #804! Thank you!
Hamilton can now return its outputs as a PyArrow table, implemented by Thierry Jean in #830.

To use it:

from hamilton.plugins import h_pyarrow

result_builder = h_pyarrow.PyarrowTableResult()
# pass to Builder().with_adapters(), or to a DataSaver (i.e. materializer)

The built-in visualization now displays configuration values, thanks to Thierry Jean in #833. Before: the visualization showed the configuration key, but not its value. Now: you can see the config value associated with a particular image rendering - see below:

Knock now shows the value that was used.

Documentation / Examples:

examples/pdl: notebook introduction to Hamilton + People Data Labs by Thierry Jean in #817 - see code here (goes with blog post below)

A few things in the "user guide" section were cleaned up by Thierry Jean in #826

Hamilton Blog

New post & tutorial on our blog this week, courtesy of Thierry Jean!

We cover how to use #Hamilton for ad-hoc analyses in a notebook and how it's not a big change to your workflow. The end result is that it helps you structure your analyses, which in turn makes it easy to reuse, extend, or even productionize your work!

Thanks to People Data Labs for the data that we used in this post to make it more realistic -- you can download the data and play with it too!

Links:

Blog
Code

Burr Blog

The email application we showcase in the blog

We also published a Burr blog this past week. In it we describe how to build an interactive agent with Burr. We believe most agent workflows should be designed to have humans in the loop. This is what we're designing Burr for and why we think it's different -- it should be easy to build an agent application and inject human oversight into it.

In the blog we use the example of building a simple Email Assistant agent that can help you write a response to an email. The blog describes how to build the application in #Burr, and also run it on FastAPI. We don't dive into the details of it, but there's also an example UI that one can use to play around with it.

> pip install "burr[start]"
> burr # to start the burr server
# navigate to demos and use
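To make the human-in-the-loop idea concrete, here is a minimal sketch of an email assistant loop in plain Python. It does not use Burr's API and is not the code from the blog; `AssistantState`, `draft_response`, and `run_assistant` are hypothetical names, and the drafting step is a string template standing in for an LLM call.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class AssistantState:
    incoming_email: str
    instructions: str
    draft: Optional[str] = None
    feedback_history: List[str] = field(default_factory=list)
    approved: bool = False
    final: Optional[str] = None


def draft_response(state: AssistantState) -> AssistantState:
    """Stand-in for an LLM call: fold the instructions and all feedback into a draft."""
    notes = "; ".join([state.instructions] + state.feedback_history)
    state.draft = f"Re: {state.incoming_email[:40]} | draft following: {notes}"
    return state


def run_assistant(
    state: AssistantState,
    ask_human: Callable[[str], str],
    max_rounds: int = 3,
) -> AssistantState:
    """Alternate between drafting and human review; finalize only on approval."""
    for _ in range(max_rounds):
        state = draft_response(state)
        feedback = ask_human(state.draft or "").strip()
        if not feedback:  # empty feedback is treated as approval
            state.approved = True
            state.final = state.draft
            break
        state.feedback_history.append(feedback)
    return state


# Scripted reviewer for demonstration; interactively, `input` could be passed instead.
replies = iter(["make it shorter", ""])
state = run_assistant(
    AssistantState("Can we meet Tuesday?", "accept politely"),
    ask_human=lambda draft: next(replies),
)
print(state.final)
```

Passing the reviewer in as a callable keeps the control flow testable, and nothing is finalized unless a human approves a draft.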

Thanks, that's all for this week!
