
Side Project - Staging view: Cheaper is Better

Ignacio Alvarez

Data Engineer | Azure | Databricks

Published: March 17, 2025


Scheduled

Here we are again, trying to scrape together every possible Argentine peso.

Last week I described how I used a Durable Azure Function with a timer trigger to move data out of the "bronze" layer (a.k.a. a bunch of JSON files generated every 30 seconds). Once a day, it loaded the ~0.14 GB into a staging layer within the data lake in a more compact format. This move was convenient for two reasons:

Firstly, since the goal of this whole experiment is to move the data and make it available at the lowest possible cost, storage costs need to be as low as possible. The cost is small either way, but there is a significant difference between storing 0.14 GB per day and 0.4 MB per day after compaction.
Secondly, I have a forgotten Raspberry Pi which, although it currently acts as compute for data extraction, still has capacity to run other processes. Furthermore, if I containerize both the extractor and staging applications, I can deploy them on other services if needed.
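The compaction step can be sketched in a few lines. The post does not say which compact format the staging layer uses, so this sketch assumes gzip-compressed JSON Lines as a stand-in. The function names (`compact_day`, `read_staged`), the file names, and the directory layout are hypothetical and are not taken from the project's repository.

```python
import gzip
import json
import tempfile
from pathlib import Path


def compact_day(bronze_dir: Path, staging_file: Path) -> int:
    """Merge every *.json file in bronze_dir into one gzip-compressed
    JSON Lines file and return the number of records written.

    Assumption: each bronze file holds exactly one JSON document.
    """
    staging_file.parent.mkdir(parents=True, exist_ok=True)
    count = 0
    with gzip.open(staging_file, "wt", encoding="utf-8") as out:
        # Sorting by file name keeps the output order stable between runs.
        for path in sorted(bronze_dir.glob("*.json")):
            record = json.loads(path.read_text(encoding="utf-8"))
            # Compact separators drop the whitespace of pretty-printed JSON.
            out.write(json.dumps(record, separators=(",", ":")) + "\n")
            count += 1
    return count


def read_staged(staging_file: Path) -> list:
    """Read the compacted file back into a list of records."""
    with gzip.open(staging_file, "rt", encoding="utf-8") as src:
        return [json.loads(line) for line in src if line.strip()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical layout: one bronze folder per day.
        bronze = Path(tmp) / "bronze" / "2025-03-16"
        bronze.mkdir(parents=True)
        for i in range(5):
            sample = {"seq": i, "payload": "x" * 200}
            (bronze / f"snapshot_{i:04d}.json").write_text(
                json.dumps(sample, indent=2), encoding="utf-8"
            )

        staged = Path(tmp) / "staging" / "2025-03-16.jsonl.gz"
        written = compact_day(bronze, staged)
        raw_bytes = sum(p.stat().st_size for p in bronze.glob("*.json"))
        print(f"records: {written}")
        print(f"raw bytes: {raw_bytes}, staged bytes: {staged.stat().st_size}")
```

A columnar format would likely compress this kind of repetitive data further, but the standard library version keeps the sketch dependency-free, which is convenient on a small device like a Raspberry Pi.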

Using a Raspberry Pi directly for compute involves similar considerations to using a VM. To put it to work, I run the containers with Podman and schedule the tasks with cron.

As a result, while there wasn't an improvement in execution times or a significant reduction in costs, we made better use of the compute resources we already had available and had more fun tinkering with the Raspberry Pi (which is always fun).

Now, with the staging layer growing at a rate of 0.4 MB per day, we can continue modeling on top of it with the idea of making some use of the data, rather than just watching it increase day by day.
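To put the two daily figures side by side, here is a small back-of-the-envelope calculation. It only uses the volumes quoted above (0.14 GB and 0.4 MB per day) and assumes decimal units (1 GB = 1000 MB) and a constant daily volume. The constant and function names are illustrative.

```python
RAW_MB_PER_DAY = 140.0    # 0.14 GB of raw JSON per day, with 1 GB = 1000 MB
STAGED_MB_PER_DAY = 0.4   # size after compaction, as quoted in the post


def yearly_mb(mb_per_day: float, days: int = 365) -> float:
    """Storage accumulated over `days` at a constant daily volume."""
    return mb_per_day * days


def reduction_factor(raw_mb: float, staged_mb: float) -> float:
    """How many times smaller the staged data is than the raw data."""
    return raw_mb / staged_mb


if __name__ == "__main__":
    print(f"raw per year:    {yearly_mb(RAW_MB_PER_DAY) / 1000:.1f} GB")
    print(f"staged per year: {yearly_mb(STAGED_MB_PER_DAY):.0f} MB")
    print(f"reduction:       ~{reduction_factor(RAW_MB_PER_DAY, STAGED_MB_PER_DAY):.0f}x")
```

Under these assumptions the raw layer grows by roughly 51 GB per year, while the staging layer grows by roughly 146 MB, a reduction of about 350 times.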

GitHub

Side Project - Staging Area

Steven Moore

Enterprise Data Solutions: Business Intelligence and Analytics | Microsoft Azure Data | Microsoft Fabric & Power BI | CCH Tagetik | ERP and CPM

1 week

That's awesome, Ignacio. I also agree. It's fun to get hours-deep into a fun project, fix an issue, build something great, or just try something different.
