Week 11 - Preparing for launch
Centre for AI & Climate
Connecting capabilities across technology, policy, & business to accelerate the application of AI to climate challenges.
Jon (Product lead):
Last week I explained the predicament we’d found ourselves in and said that the plan for the week ahead was to get ourselves out of it. Well, the short version of this update is that, thankfully, we managed to do just that.
In fact, we start this week with a clear plan to launch a prototype we’re really happy with next week.
It’s amazing how things can come together after weeks of feeling like we were taking one step forward and two steps back. I’m always surprised at how non-linear making progress can be. I’d go as far as saying that if your rate of progress feels like a straight line, you’re probably doing it wrong.
But anyway, we’ve landed on a concept for a prototype that is very much still a prototype, but is totally aligned to the grander vision and most importantly, will enable us to test our hypothesis.
In short, we’ve taken an incredibly valuable dataset - that is currently really hard to access because of its size, format, and the fact that it’s disaggregated - and made it so much easier to work with. We’ve done this by using a cutting-edge file format and structured the data so that it can be queried with a few lines of code and downloaded in a matter of seconds.
We decided to focus on the newly released smart meter data, which is being published by UK DNOs (Distribution Network Operators). We thought this was a good place to start because access to high resolution energy consumption data is a well known challenge, and due to its nature and inaccessibility, this dataset is currently being underutilised. Our aim is to meaningfully lower the barrier to entry for energy data analysis.
The dataset contains domestic smart meter consumption data at half-hourly resolution, aggregated at LV feeder level. It represents 100,000 LV feeders and 2,000,000 smart meters, and coverage is expected to increase steadily over the year until all smart meters are captured.
There is also talk of increasing the resolution and providing meter-level data, provided consumer privacy can be maintained. This dataset has the potential to be an absolute powerhouse, but as it increases in size and resolution, the data access issues will compound further.
We’re really excited to put this out into the open. We know it’s just the beginning and that there is a lot that needs to be improved. But we really do believe that it offers a leap in value and provides a user experience that is an order of magnitude better than what’s available today.
The plan for this week is to tie up all of the loose ends and gear up for launch. Thankfully there aren’t any major risks that could delay things, so it’s just a case of getting the work done now.
Look out for the launch next week!
Jon
Steve (Engineering lead):
This week felt like we really got back into the swing of things. The change of direction to look at some new file formats had a bit of a learning curve, but once I got up to speed, it felt like a really good decision. After lots and lots of reading documentation, trying different libraries and measuring file sizes, I think we’ve ended up with something quite compelling.
s3://weave.energy/smart-meter.parquet is a single ~700MB GeoParquet file, containing all of the data released by UK DNOs in Feb 2024. We put this data together back in July, as we were exploring the data, but we didn’t release it then, mostly because as a plain CSV, it runs to over 20GB! We’re obviously not going to stop at just February’s data, so it didn’t seem very practical or sustainable to be uploading hundreds of GBs of CSV. Even if you have the bandwidth, at that kind of file size, it’s not very easy to use the data unless you have a cluster of machines or an enormous amount of RAM.
The great thing about parquet files is that a) you don’t have to download the whole thing - you can do both predicate and projection “pushdown” to limit what you get and b) even once you do, it’s highly optimised in terms of compression and memory layout to load it straight into analysis tools like Pandas.
Parquet files have been a staple of analysis workflows for quite a while, but what we’re using here is a relatively new addition - GeoParquet - which allows us to include columns with geospatial information in them. We’re actually using the very latest version of the spec, 1.1.1, which not only allows you to include the geospatial data, but also to do predicate pushdown of bounding box queries. Sounds complex, but it basically means you can download a subset of the data by defining the geographic area you’re interested in. Even better, if you’re not interested in geospatial analysis, it’s backwards compatible, so you can open the file in regular old Pandas and just see the “geometry” column as a series of {x y} objects.
We’ve been putting together some examples of how you can access and use this data in this Jupyter notebook, as well as documentation of how we made it in this GitHub repo. We’d love any feedback you have, or to know if you find it useful. Just start a discussion on our GitHub repo!
Steve