Art of Data Newsletter - Issue #18
Welcome, all data fanatics. In today's issue:
- A podcast comparing Google Bard and ChatGPT
- Falling in love with Rust for data engineering
- Common Redis mistakes that lead to production outages
- "Fundamentals of Data Engineering" by Joe Reis and Matt Housley
- How Square measures the value of its internal tools
- Apache Flink vs. Apache Spark for stream processing
Let's dive in!
Google Bard is a new conversational AI model expected to rival ChatGPT and other conversational systems, and it is already causing a stir in the AI community. This podcast episode looks at Bard's distinctive features, its architecture, and its ability to produce human-like responses, compares it with ChatGPT, and highlights where Bard still falls short.
Daniel recounts how he fell in love with the Rust programming language and what it offers data engineering. He credits Rust's appeal to its speed, its memory model, immutability by default, and static typing. He also praises Cargo, Rust's package and dependency manager, for its simplicity and effectiveness, especially after his struggles with Python's packaging ecosystem. That said, Rust has a steep learning curve and is unlikely to displace Python as the go-to language for everyday data engineering tasks. Even so, he argues that learning and using Rust makes you a better engineer.
The article covers common mistakes with Redis that lead to production outages. It starts by explaining how concurrency in the application layer can create contention, since Redis is single-threaded and commands queue up on the server. Suggested remedies are sharding data across multiple Redis instances, or Redis Cluster as the more general approach. It then discusses Lua scripts and functions for logic that must run atomically, noting that the same single-threaded execution makes them a potential source of problems: a slow script blocks every other command. The article also advises setting memory-usage alerts at several thresholds to catch trouble before it becomes a failure. Other tips include understanding the subtleties of Redis' API, serialising objects to JSON strings carefully before storage, and using lists sensibly for large collections.
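For concreteness, here is a minimal sketch using the redis-py client against a local Redis instance; the key names and the bounded-counter Lua script are illustrative and are not taken from the article.

```python
# Minimal sketch with redis-py: JSON serialisation before storage, plus a Lua
# script for logic that must run atomically. Key names and the bounded-counter
# script are illustrative only.
import json
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# Serialise objects to JSON strings before storing them as plain values.
profile = {"id": 42, "plan": "pro"}
r.set("user:42:profile", json.dumps(profile))
restored = json.loads(r.get("user:42:profile"))

# Atomic bounded increment: the whole script runs as one unit on the server,
# so no other command can interleave with the read-check-write sequence.
BOUNDED_INCR = """
local current = tonumber(redis.call('GET', KEYS[1]) or '0')
if current < tonumber(ARGV[1]) then
    return redis.call('INCR', KEYS[1])
end
return current
"""
bounded_incr = r.register_script(BOUNDED_INCR)
value = bounded_incr(keys=["jobs:inflight"], args=[10])  # cap the counter at 10
```

Keeping such scripts short matters for exactly the reason above: while a script runs, the single-threaded server does nothing else.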
The book "Fundamentals of Data Engineering" by Joe Reis and Matt Housley is highly recommended for data engineers to gain a deep understanding of the important areas of data engineering. These include the concepts of data generation, ingestion, orchestration, transformation, storage, and governance.
Data engineering is described as the design, implementation, and maintenance of systems and processes that transform raw data into high-quality, consistent information and this intersects with security, data management, DataOps, data architecture, orchestration, and software engineering.
The book also covers the intricacies of data engineering lifecycle such as source systems, the choice of storage, the data ingestion process, the transformation stage, and the effective usage of data.
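As a toy illustration of those lifecycle stages, the sketch below ingests from a CSV string, transforms the rows, and stores them in SQLite; the source, schema, and sink are placeholders rather than examples from the book.

```python
# Toy end-to-end sketch of the lifecycle stages named above: ingest from a
# source system, transform, then store for downstream use. The CSV source and
# SQLite sink are stand-ins chosen only to keep the example self-contained.
import csv
import io
import sqlite3

RAW_CSV = "order_id,amount\n1,19.90\n2,5.00\n3,42.50\n"  # pretend source system

def ingest(raw: str) -> list[dict]:
    """Ingestion: pull raw records out of the source format."""
    return list(csv.DictReader(io.StringIO(raw)))

def transform(rows: list[dict]) -> list[tuple[int, float]]:
    """Transformation: cast types and keep only the fields downstream needs."""
    return [(int(r["order_id"]), float(r["amount"])) for r in rows]

def store(records: list[tuple[int, float]]) -> None:
    """Storage: land the cleaned records where consumers can query them."""
    con = sqlite3.connect("orders.db")
    con.execute("CREATE TABLE IF NOT EXISTS orders (order_id INTEGER, amount REAL)")
    con.executemany("INSERT INTO orders VALUES (?, ?)", records)
    con.commit()
    con.close()

store(transform(ingest(RAW_CSV)))
```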
Square relies on several internal tools to boost productivity and efficiency, including a communications platform, a customer data platform, and a data platform from Amplitude. These tools are crucial to delivering a superior experience to Square's external customers, so the company tracks their value and effectiveness with specific data and metrics, grouping the tools by the kind of metric needed to measure them: product metrics and operational metrics.
Product metrics gauge product usage and impact, covering user adoption, engagement, satisfaction, and business impact. Operational metrics evaluate how the product or service performs from the provider's perspective, covering internal impact, service levels, and reliability.
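To make the distinction concrete, here is a tiny sketch with made-up numbers and formulas; the article does not publish Square's actual metric definitions or figures.

```python
# Illustrative only: toy formulas for one product metric (adoption) and one
# operational metric (availability). The numbers are made up; they are not
# Square's actual definitions or data.
monthly_active_users = 1_800
eligible_employees = 2_400
adoption_rate = monthly_active_users / eligible_employees  # product metric

minutes_in_month = 30 * 24 * 60
downtime_minutes = 22
availability = 1 - downtime_minutes / minutes_in_month     # operational metric

print(f"Adoption: {adoption_rate:.1%}, availability: {availability:.3%}")
```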
Apache Flink and Apache Spark are both widely used for big data processing and analytics. Spark is lauded for its simplicity, high-level APIs, and capacity to process large amounts of data, while Flink excels at real-time, low-latency stream processing and stateful computations. The post compares how each handles common streaming patterns, their APIs, and their techniques for data preparation, processing, and enrichment. It concludes that both systems are evolving quickly and handle big data well, and recommends choosing between them based on the specific needs of the workload and the surrounding architecture. It also advises using user-defined functions judiciously so they don't slow down the job or cause backpressure.
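As a small illustration of the UDF advice, here is a minimal PySpark Structured Streaming sketch (Spark chosen here only for familiarity; Flink would serve equally well); the built-in rate source and the bucketing UDF are placeholders, not examples from the post.

```python
# Minimal Structured Streaming sketch: apply a small Python UDF to a stream.
# The rate source generates (timestamp, value) rows, which keeps the example
# self-contained; real jobs would read from Kafka or similar.
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf, col
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("udf-stream-sketch").getOrCreate()

stream = spark.readStream.format("rate").option("rowsPerSecond", 5).load()

# Keep UDFs small and cheap: every row pays the Python serialisation cost,
# and a slow UDF shows up as rising batch durations and backpressure.
@udf(StringType())
def bucket(value):
    return "even" if value % 2 == 0 else "odd"

enriched = stream.withColumn("bucket", bucket(col("value")))

query = (enriched.writeStream
         .format("console")
         .outputMode("append")
         .start())
query.awaitTermination(30)  # run briefly for the demo, then stop
query.stop()
```

The same enrichment could be written with Flink's DataStream API; the point either way is that per-record Python logic is the first place to look when a streaming job slows down.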