Art of Data Newsletter - Issue #9
Welcome, all data fanatics. In today's issue:
Let's dive in!
MLOps Basics - For Data Engineers | 12mins
MLOps (Machine Learning Operations) is the term used to describe the work that Data Engineers take on to enable ML to run at scale in a production environment. It involves automating machine learning tasks such as feature storage, model training, prediction, and analysis. Feature stores represent data in a form that algorithms can understand and process, while automation and tracking are important for running a stable ML environment. This article describes best practices, including data tracking and automating machine learning tasks.
Wrangling BigQuery at Reddit | 31mins
This post from Reddit discusses managing a BigQuery instance.
Enterprise Data Platform @ Compass | 10mins
Compass chose Databricks to build its modern data platform due to its scalability, reliability, security, and ability to support AI, BI, and DI use cases in one place. The platform has allowed the company to store and manage its analytics data centrally, create an environment for AI, BI, and DI collaboration, and optimize its cost metrics. It has become the go-to place for data and machine learning needs across the company and is still evolving toward its full potential.
This article explains how to use MLOps best practices for rapid and reliable machine learning experiments.
Thanan explains how he implemented an observability and monitoring system for thousands of data pipelines running on Apache Spark. A Spark Listener collected statistics for each event and exported the useful ones to the DTP Internal Server via a REST API. SLOs were set for runtime, skew, spill, and failed apps, with tier levels for priority. Example results from monitoring included a skew issue that was solved with one line of code, and a retry issue that was fixed by adjusting resources and repartitioning.
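To make the SLO idea concrete, here is a minimal sketch in plain Python of how per-application statistics could be checked against tiered thresholds. It is not the author's implementation: the threshold values, the `SLO_BY_TIER` table, the `app_stats` field names, and the choice of "longest task over median task" as the skew measure are all assumptions made for illustration.

```python
from statistics import median

# Hypothetical SLO thresholds per tier; the article does not list actual values.
SLO_BY_TIER = {
    1: {"max_runtime_s": 1800, "max_skew_ratio": 5.0, "max_spill_bytes": 0},
    2: {"max_runtime_s": 3600, "max_skew_ratio": 10.0, "max_spill_bytes": 10 * 1024**3},
}


def skew_ratio(task_durations_s):
    """Longest task duration divided by the median (an assumed skew measure)."""
    mid = median(task_durations_s)
    return max(task_durations_s) / mid if mid > 0 else float("inf")


def check_slos(app_stats, tier):
    """Return the names of the SLOs that one application run violated."""
    slo = SLO_BY_TIER[tier]
    violations = []
    if app_stats["failed"]:
        violations.append("failed")
    if app_stats["runtime_s"] > slo["max_runtime_s"]:
        violations.append("runtime")
    if skew_ratio(app_stats["task_durations_s"]) > slo["max_skew_ratio"]:
        violations.append("skew")
    if app_stats["spill_bytes"] > slo["max_spill_bytes"]:
        violations.append("spill")
    return violations


# Hypothetical statistics for one run, shaped as a listener might export them.
stats = {
    "failed": False,
    "runtime_s": 1200,
    "task_durations_s": [10, 12, 11, 95],  # one straggler task
    "spill_bytes": 0,
}
print(check_slos(stats, tier=1))  # ['skew']
```

With these assumed thresholds, the same run passes at tier 2, which shows how tier levels let lower-priority pipelines tolerate more skew before anyone is alerted.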
Instacart has implemented a new ads measurement platform using modularized ETL pipelines.
This article from Microsoft advocates using graphs to model and analyze the customer journey. It outlines an end-to-end approach: conceptualizing the customer journey, gathering data, building a graph model, and analyzing the result. The approach gives a visual way of understanding the customer journey that can guide future development decisions.
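As a rough illustration of the graph-modeling step, the sketch below builds a weighted directed graph from ordered customer touchpoints and estimates where customers go next from a given step. It does not reproduce the article's method: the step names, the sample journeys, and the helper functions `build_journey_graph` and `next_step_probabilities` are hypothetical.

```python
from collections import Counter


def build_journey_graph(journeys):
    """Count transitions between consecutive steps; keys are (source, target) edges."""
    edges = Counter()
    for steps in journeys:
        for src, dst in zip(steps, steps[1:]):
            edges[(src, dst)] += 1
    return edges


def next_step_probabilities(edges, node):
    """Share of observed journeys leaving `node` that go to each next step."""
    out = {dst: n for (src, dst), n in edges.items() if src == node}
    total = sum(out.values())
    return {dst: n / total for dst, n in out.items()} if total else {}


# Hypothetical journeys, one ordered list of touchpoints per customer.
journeys = [
    ["visit", "signup", "trial", "purchase"],
    ["visit", "signup", "churn"],
    ["visit", "churn"],
]

graph = build_journey_graph(journeys)
print(next_step_probabilities(graph, "signup"))  # {'trial': 0.5, 'churn': 0.5}
```

Edge weights like these are the raw material for the further analysis the article mentions, for example finding the steps where the most customers drop off.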