DATA Pill #003: Apache Airflow at Scale, One-stop MLOps portal and more

Hi everyone,

Let's start the third leg of our DATA marathon.

ARTICLES

Lessons Learned From Running Apache Airflow at Scale | 10 min read | Apache Airflow | Megan Parker | Shopify Blog

Challenges of running Airflow at scale, with concrete solutions:

  • A combination of GCS and NFS allows for file management that is both performant and easy to use.
  • Metadata retention policies can reduce degradation of Airflow performance.
  • A centralized metadata repository can be used to track DAG origins and ownership.
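
To make the retention idea concrete, here is a minimal sketch of how a retention policy decides which metadata rows to purge. It is not Airflow's API and not Shopify's implementation: the `split_by_retention` function, the record layout, and the 30-day window are all hypothetical, chosen only for illustration.

```python
from datetime import datetime, timedelta, timezone


def split_by_retention(records, now, retention_days):
    """Split records into (kept, expired) by their 'ended_at' timestamp.

    `records` is a hypothetical stand-in for rows of a metadata table,
    such as finished task or DAG runs.
    """
    cutoff = now - timedelta(days=retention_days)
    kept = [r for r in records if r["ended_at"] >= cutoff]
    expired = [r for r in records if r["ended_at"] < cutoff]
    return kept, expired


# Illustrative data only; the run IDs and dates are made up.
now = datetime(2022, 6, 1, tzinfo=timezone.utc)
records = [
    {"run_id": "a", "ended_at": now - timedelta(days=3)},
    {"run_id": "b", "ended_at": now - timedelta(days=45)},
    {"run_id": "c", "ended_at": now - timedelta(days=30)},
]

kept, expired = split_by_retention(records, now, retention_days=30)
print([r["run_id"] for r in kept])
print([r["run_id"] for r in expired])
```

In a real deployment the expired rows would be deleted from the metadata database on a schedule, so the tables that the scheduler and web UI query stay small.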


One-stop MLOps portal at LinkedIn | 10 min read | MLOps | LinkedIn Blog

To visualize the entire ML lifecycle, an infrastructure is needed to automatically track every step of the machine learning process. We created a data schema to capture the complete, structured, and well-documented information detailing how machine learning models are produced.


Monitoring Large-Scale Apache Flink Applications, Part 1: Concepts & Continuous Monitoring | 12 min read | Apache Flink | Nico Kruber | Ververica Blog

This post introduces useful metrics that, paired with proper alerts, warn you about imminent failures and let you monitor cluster and application health and checkpointing progress. It also covers different ways to track latency and observe your application's throughput for performance monitoring.


Real-time ingestion to Iceberg with Kafka Connect - Apache Iceberg Sink | 11 min read | Apache Iceberg Sink | Grzegorz Liter | GetInData Blog

GetInData created an Apache Iceberg sink that can be deployed on a Kafka Connect instance. The data consumed by the sink has to represent table-like rows together with their schema, so we used the format created by Debezium for change data capture.
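
As a rough illustration of what a Debezium-style change event looks like, the sketch below parses one and applies it to an in-memory table. It assumes the common JSON envelope with `schema` and `payload` sections and the `op` codes `c`, `u`, `d` and `r`; the schema section is heavily simplified, the `apply_change` helper and the sample rows are hypothetical, and none of this is the sink's actual code.

```python
import json

# Simplified Debezium-style event: the "schema" part is abbreviated here.
EVENT_JSON = """
{
  "schema": {"type": "struct", "name": "example.Envelope", "fields": []},
  "payload": {
    "before": null,
    "after": {"id": 1, "name": "alice"},
    "op": "c",
    "ts_ms": 1650000000000
  }
}
"""


def apply_change(table, event, key="id"):
    """Apply one change event to a dict that maps primary key -> row."""
    payload = event["payload"]
    op = payload["op"]
    if op in ("c", "r", "u"):  # create, snapshot read, update: upsert 'after'
        row = payload["after"]
        table[row[key]] = row
    elif op == "d":  # delete: remove the row described by 'before'
        table.pop(payload["before"][key], None)
    return table


table = {}
apply_change(table, json.loads(EVENT_JSON))

# Hypothetical follow-up events built in code for brevity.
update = {"payload": {"before": {"id": 1, "name": "alice"},
                      "after": {"id": 1, "name": "alicia"}, "op": "u"}}
delete = {"payload": {"before": {"id": 1, "name": "alicia"},
                      "after": None, "op": "d"}}

apply_change(table, update)
name_after_update = table[1]["name"]
apply_change(table, delete)
print(name_after_update, table)
```

Because each event carries both the row image and an operation code, a sink can translate the stream into inserts, updates and deletes on a table.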

{ MORE LINKS }


____________________

PODCAST

Dataflow Automation | 47 min | The Data Exchange

Jeremiah Lowin, CEO of Prefect, on designing tools that allow teams to build, run, and monitor data pipelines at scale. The episode also discusses the data engineering challenges facing data and ML teams today and the implications of looming trends in machine learning and AI.

{ MORE LINKS }


____________________

DATAtube

Things I Wish I Knew When I Started As A Data Engineer | 15 min | Seattle Data Guy

Lessons and advice after 10 years in data. Don't try to learn every technology at once; it's going to get you nowhere.

{ MORE LINKS }


If you have any feedback, please leave a comment below.

I want this newsletter to reach our tech community and meet its needs.


See you tomorrow!

Adam Kawa from GetInData
