Week 13 - Building data pipelines
Centre for AI & Climate
Connecting capabilities across technology, policy, & business to accelerate the application of AI to climate challenges.
Jon (Product lead):
Last week we launched our MVP!
The reception so far has been fantastic. Some high-level numbers:
The launch had three broad goals:
So the plan for this week from my perspective is to find people who need the data and understand their use cases. We have some great calls booked in this week off the back of inbound interest following the launch, so it's looking promising.
Jon
Steve (Engineering lead):
I skipped weeknotes last week because we were focussed on the launch of Weave, but that was actually mostly Jon’s work by then. With the data together for our first prototype, I’ve started focussing on what comes next - a proper production data pipeline to handle the full set of DNO (distribution network operator) data available.
This pipeline needs to be everything our prototype wasn’t: automated, well-tested, and most importantly scalable, because it’s going to have to deal with a lot more data. No more running scripts on my laptop overnight and at weekends!
Not having been much of a data engineer before, I did a lot of research before settling on the tools that I hope will let us manage the process better, though I’m still working through them. Sadly there’s nothing quite as nice as https://pangeo-forge.org/ for GeoParquet data (at least that I could find - do let us know if you know better!). Instead, we’re planning to keep things in Python for the time being and use a combination of Dagster and Dask to orchestrate and parallelise our process. If you’re an expert in either of these and want to share your hard-earned knowledge, I’d love to hear it.
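To give a flavour of how the two fit together, here’s a minimal sketch (not our actual pipeline): Dagster defines the assets and the daily schedule, and Dask fans the per-file work out across cores. All the names here - the assets, paths, and the clean_one_file helper - are made-up placeholders.

```python
from typing import List

import dask.bag as db
from dagster import Definitions, ScheduleDefinition, asset, define_asset_job


def clean_one_file(path: str) -> str:
    """Hypothetical helper: parse one raw DNO file and write cleaned output."""
    ...
    return path


@asset
def dno_raw_files() -> List[str]:
    # In a real pipeline this would list whatever was archived that day;
    # here it is just a placeholder.
    return ["raw/dno_a/2024-01-01.csv", "raw/dno_b/2024-01-01.csv"]


@asset
def cleaned_dno_data(dno_raw_files: List[str]) -> List[str]:
    # Parallelise the per-file cleaning across local cores with a Dask bag.
    return db.from_sequence(dno_raw_files).map(clean_one_file).compute()


daily_job = define_asset_job("daily_refresh", selection="*")

defs = Definitions(
    assets=[dno_raw_files, cleaned_dno_data],
    schedules=[ScheduleDefinition(job=daily_job, cron_schedule="0 3 * * *")],
)
```

The appeal of this split is that Dagster gives us the scheduling, retries and visibility the prototype lacked, while Dask handles the "lot more data" part without us having to rewrite anything outside Python.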
The first and most important step in our pipeline is to get hold of, and archive, all the raw data. To build our prototype I just clicked around the various data portals, but we want to automate that and download new data every day. Working this out has again shown me how much friction there is in getting hold of this kind of data. Every DNO has different data portal software, each with its own API model you have to work through. Even when they use the same one, they can use it in different ways, so the API requests you need are different.
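To make that friction concrete, here’s a rough, illustrative sketch of what a daily archiver ends up looking like: per-DNO configuration, with a branch for each portal style. The portal names, URLs and dataset identifiers below are invented for the example, and real portals add their own pagination, auth and format quirks on top.

```python
from datetime import date
from pathlib import Path

import requests

# Hypothetical per-DNO settings - every entry ends up needing its own quirks.
PORTALS = {
    "dno_a": {
        "style": "ckan",
        "base": "https://data.example-dno-a.co.uk",
        "dataset": "lv-feeder-monitoring",
    },
    "dno_b": {
        "style": "opendatasoft",
        "base": "https://example-dno-b.opendatasoft.com",
        "dataset": "smart-meter-aggregates",
    },
}


def fetch_listing(cfg: dict) -> list[str]:
    """Return the download URLs for one portal, branching on its API style."""
    if cfg["style"] == "ckan":
        # CKAN portals expose dataset metadata at /api/3/action/package_show.
        resp = requests.get(
            f"{cfg['base']}/api/3/action/package_show",
            params={"id": cfg["dataset"]},
            timeout=30,
        )
        resp.raise_for_status()
        return [r["url"] for r in resp.json()["result"]["resources"]]
    if cfg["style"] == "opendatasoft":
        # Opendatasoft portals can export a whole dataset in one request.
        return [
            f"{cfg['base']}/api/explore/v2.1/catalog/datasets/{cfg['dataset']}/exports/csv"
        ]
    raise ValueError(f"Unknown portal style: {cfg['style']}")


def archive_all(root: Path = Path("raw")) -> None:
    """Download today's files from every portal into a dated archive folder."""
    for dno, cfg in PORTALS.items():
        target = root / dno / date.today().isoformat()
        target.mkdir(parents=True, exist_ok=True)
        for i, url in enumerate(fetch_listing(cfg)):
            data = requests.get(url, timeout=120)
            data.raise_for_status()
            (target / f"file_{i}").write_bytes(data.content)
```

Multiply that by every DNO and every dataset, and you can see why this step alone soaks up so much time before any of the more interesting work starts.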
All in all, there’s likely to be several weeks of work in this kind of “plumbing” before we can expand our data’s range and quality. Imagine if every one of the 40 people who’ve downloaded it so far had to make that kind of investment!
Steve