Art of Data Newsletter - Issue #10
Photo by Björn Austmar Tórsson: https://www.pexels.com/photo/iceland-nature-space-dark-7267852/

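Welcome, all data fanatics. In today's issue:

  • Introducing Microsoft Fabric: Data analytics for the era of AI
  • 2023 State of Data + AI
  • Why Orchestration is the next hot thing in Data
  • DoorDash identifies Five big areas for using Generative AI
  • Databricks cost management at Coinbase
  • What's the hype behind DuckDB?
  • Implementing Data Validation with Great Expectations in Hybrid Environments
  • Writing design docs for data pipelines

Let's dive in!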


Introducing Microsoft Fabric: Data analytics for the era of AI | 12 mins

Microsoft is introducing Microsoft Fabric, an analytics platform that combines several data and analytics tools into one integrated product. It incorporates technologies such as Azure Data Factory, Azure Synapse Analytics, and Power BI. The goal is to give data and business professionals a single place to work with their data and to lay the groundwork for the AI era.


2023 State of Data + AI | Databricks | 20 mins

The 2023 State of Data + AI report by Databricks analyzes data and AI adoption trends among over 9,000 global customers. The report aims to provide data leaders and executives with insights to understand the AI landscape and assess their own data investments and strategies. It addresses various aspects of the data estate, including the practical applications of data science and machine learning, popular data and AI products, and the execution of data warehousing in the context of the AI era.


Why Orchestration is the next hot thing in Data | 7 mins

This article highlights the limitations of Airflow as an orchestration tool and argues that the data world needs a more robust orchestration solution. The author stresses that orchestration is what keeps data accurate and stops bad data from reaching downstream consumers. He introduces Orchestra, a tool that runs workflows in a controlled manner, with the ability to include tests and take corrective actions when failures occur. The benefits he lists for a tool like Orchestra include access to metadata, lineage tracking, and control over data assets. He suggests that as data complexity grows and more people work with data, comprehensive orchestration will become crucial and a prominent part of data management.
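To make the "tests and corrective actions" idea concrete, here is a minimal sketch in plain Python. It is not Orchestra's or Airflow's API; the Step type, run_pipeline function, and the example steps are all hypothetical names chosen for illustration. Each step runs, its output is tested, and on failure a corrective action runs and downstream steps are skipped.

```python
from typing import Callable, List, NamedTuple, Optional


class Step(NamedTuple):
    """One unit of work in a hypothetical pipeline (illustrative only)."""
    name: str
    run: Callable[[dict], None]    # does the work, writing into a shared context
    test: Callable[[dict], bool]   # returns True when the step's output looks valid
    on_failure: Optional[Callable[[dict], None]] = None  # corrective action


def run_pipeline(steps: List[Step], context: dict) -> List[str]:
    """Run steps in order and stop at the first step whose test fails."""
    log = []
    for step in steps:
        step.run(context)
        if step.test(context):
            log.append(f"{step.name}: ok")
            continue
        log.append(f"{step.name}: failed")
        if step.on_failure is not None:
            step.on_failure(context)
            log.append(f"{step.name}: corrective action ran")
        break  # downstream steps never see the bad data
    return log


def load(ctx: dict) -> None:
    # Assumed sample data: the second row is missing its amount.
    ctx["rows"] = [{"id": 1, "amount": 10}, {"id": 2, "amount": None}]


def rows_have_amounts(ctx: dict) -> bool:
    return all(row["amount"] is not None for row in ctx["rows"])


def quarantine(ctx: dict) -> None:
    # Corrective action: move the bad batch aside instead of publishing it.
    ctx["quarantined"] = ctx.pop("rows")


def publish(ctx: dict) -> None:
    ctx["published"] = True


steps = [
    Step("load", load, rows_have_amounts, quarantine),
    Step("publish", publish, lambda ctx: True),
]
context: dict = {}
log = run_pipeline(steps, context)
```

With the sample data above, the load step fails its test, the batch is quarantined, and the publish step never runs.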


DoorDash identifies Five big areas for using Generative AI | 6 mins

DoorDash is using Generative AI, a subset of Artificial Intelligence that generates new content based on existing data, to improve customers' ordering experience on its platform. The company is exploring applications in areas such as language processing, image and video generation, and content creation. The blog post focuses on DoorDash's efforts to implement Generative AI effectively while protecting the privacy and security of personal information, and it highlights the technology's potential to transform the delivery experience.


Databricks cost management at Coinbase | 10 mins

Coinbase recognizes the importance of managing expenses while utilizing Databricks as a critical platform for their products. To address this, they have developed a cost management platform with various components. The first is cost attribution, achieved through cluster tagging, which allows them to track resource usage by teams and gain insights into resource allocation. They also leverage Databricks Overwatch for cost analysis by extracting valuable information from logging data, helping identify the sources of costs and proposing cost reduction strategies. Additionally, they have implemented centralized quota enforcement to proactively control cluster resource usage and manage expenses more efficiently in the long run.
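Cost attribution through tags and quota enforcement can be sketched in a few lines of plain Python. This is only an illustration of the idea, not Coinbase's implementation and not a Databricks or Overwatch API; the usage records, the "team" tag key, the quota values, and the function names are all assumptions.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical usage records, as they might look after joining cluster
# metadata (tags) with billing data. All values are made up.
usage = [
    {"cluster": "c1", "tags": {"team": "risk"}, "cost_usd": 120.0},
    {"cluster": "c2", "tags": {"team": "growth"}, "cost_usd": 80.0},
    {"cluster": "c3", "tags": {"team": "risk"}, "cost_usd": 50.0},
    {"cluster": "c4", "tags": {}, "cost_usd": 30.0},  # untagged cluster
]

# Hypothetical per-team spending quotas in USD.
quotas = {"risk": 150.0, "growth": 100.0}


def attribute_costs(records: List[dict], tag_key: str = "team") -> Dict[str, float]:
    """Sum cost per tag value; clusters without the tag go to 'untagged'."""
    totals: Dict[str, float] = defaultdict(float)
    for record in records:
        owner = record["tags"].get(tag_key, "untagged")
        totals[owner] += record["cost_usd"]
    return dict(totals)


def over_quota(totals: Dict[str, float], limits: Dict[str, float]) -> List[str]:
    """Return the teams whose attributed cost exceeds their quota."""
    return sorted(
        team for team, cost in totals.items()
        if team in limits and cost > limits[team]
    )


totals = attribute_costs(usage)
flagged = over_quota(totals, quotas)
```

Surfacing the "untagged" bucket separately is a deliberate choice in this sketch: cost that cannot be attributed to a team is itself a signal that tagging coverage needs work.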


What's the hype behind DuckDB? | 8 mins

The author, Matt, presents DuckDB as a promising new tool, primarily an OLAP (Online Analytical Processing) database management system that is also useful in related applications. He refers to a post by Daniel Beach comparing DuckDB with Polars, which prompted his own experimentation with DuckDB. He has since been actively exploring the tool and sharing his findings in lightning talks.


Implementing Data Validation with Great Expectations in Hybrid Environments | 9 mins

Data validation plays a crucial role in ensuring the reliability and correctness of data in processing pipelines. The article focuses on the implementation of Great Expectations (GX), an open-source framework, for data validation in a Hadoop environment. GX offers a flexible and efficient solution for data scientists and analysts to detect and rectify data issues. The article provides insights into the authors' experience with GX, highlighting both its advantages and limitations in the context of data validation.
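For readers new to the idea, expectation-style validation can be sketched without any library. The code below deliberately does not use the Great Expectations API; the function names, the result dictionary shape, and the sample rows are assumptions made for illustration. Each check declares what valid data looks like and reports which rows break the rule.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def expect_not_null(rows: List[Row], column: str) -> dict:
    """Check that every row has a non-null value in `column`."""
    failed = [i for i, row in enumerate(rows) if row.get(column) is None]
    return {
        "expectation": f"{column} is not null",
        "success": not failed,
        "failed_rows": failed,
    }


def expect_between(rows: List[Row], column: str, low: float, high: float) -> dict:
    """Check that `column` lies in [low, high]; missing values count as failures."""
    failed = [
        i for i, row in enumerate(rows)
        if row.get(column) is None or not (low <= row[column] <= high)
    ]
    return {
        "expectation": f"{column} between {low} and {high}",
        "success": not failed,
        "failed_rows": failed,
    }


# Assumed sample batch: row 1 has no user_id, row 2 has an implausible age.
rows = [
    {"user_id": 1, "age": 34},
    {"user_id": None, "age": 29},
    {"user_id": 3, "age": 212},
]

results = [
    expect_not_null(rows, "user_id"),
    expect_between(rows, "age", 0, 120),
]
batch_is_valid = all(result["success"] for result in results)
```

Returning the failing row indices, rather than only a pass or fail flag, is what lets a pipeline both detect an issue and point the analyst at the records to fix.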


Writing design docs for data pipelines | 7 mins

The adoption of software engineering practices in the data engineering field has led to significant changes in the design and construction of data pipelines. Data engineers now utilize software engineering tools and principles, such as modular dbt models and automated data quality monitoring, to create more robust and efficient pipelines. However, there is still room for improvement, as indicated by the large number of models in dbt projects without clear design intentions. To address this issue, the article proposes the use of design docs as a tool for designing and building solid foundations for data platforms. The article explores the benefits and importance of design docs in creating intentional and scalable data pipelines.
