DATA Pill #079 - Kubernetes and Kernel panics, AI pack


Hi,


AI hype is still buzzing.

Dig into the newest meaty content we found this week.


ARTICLES

Gartner’s AI Hype Cycle is Way Passed its Due Date — And Are We Entering a Classical ML Winter? | 11 min | ML | Oliver Molander | Personal Blog

This one underscores the importance of prioritizing data quality and asks whether we are entering a "Classical ML" winter, challenging the prevailing narrative around Large Language Models (LLMs). The author advocates a balanced approach that combines Generative AI and Classical ML depending on the specific use case.


Kubernetes And Kernel Panics | 6 min | Kubernetes | Kyle Anderson | Netflix Engineering Blog

This blog post shows how Netflix connects the dots from the worst-case scenario (a kernel panic) through Kubernetes (k8s) and up to its operators, so they can track how and why k8s nodes go away.
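One common way to get panic output off a dying host is Linux's netconsole module, which ships kernel log messages over UDP to another machine. The sketch below is a minimal listener under that assumption; it is not the code from the Netflix post, and the `PANIC_MARKERS` tuple, the `listen` helper and the `report_panic` hook are hypothetical. In a real setup, the hook is where you would attach the event to the Kubernetes node that sent it.

```python
import socket
from typing import Callable, Optional

# Substring the Linux kernel prints when it panics; extend as needed (assumption).
PANIC_MARKERS = ("Kernel panic - not syncing",)


def detect_panic(line: str) -> Optional[str]:
    """Return the matching marker if a kernel log line looks like a panic, else None."""
    for marker in PANIC_MARKERS:
        if marker in line:
            return marker
    return None


def listen(
    port: int,
    report_panic: Callable[[str, str], None],
    host: str = "0.0.0.0",
    max_datagrams: Optional[int] = None,
) -> None:
    """Receive netconsole UDP datagrams and call report_panic(source_ip, line) for panic lines.

    report_panic is a hypothetical hook, e.g. code that records an event
    against the Kubernetes node that owns source_ip.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((host, port))
        received = 0
        while max_datagrams is None or received < max_datagrams:
            data, (source_ip, _port) = sock.recvfrom(4096)
            received += 1
            for line in data.decode("utf-8", errors="replace").splitlines():
                if detect_panic(line):
                    report_panic(source_ip, line)
```

Assuming netconsole on each node is configured to send to this host on port 6666, `listen(6666, lambda ip, line: print(ip, line))` would print every panic line together with the sending node's IP.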


Running Unified PubSub Client in Production at Pinterest | 13 min | Data Infrastructure | Jeff Xiang, Vahid Hashemian, Jesus Zuniga | Pinterest Engineering Blog

Read how Pinterest's PubSub Client (PSC) has transformed data transport, improving development speed, stability and scalability. Key features such as automated service discovery and optimized configurations have substantially reduced setup time and Flink application restarts. Over 90% of Java applications have seamlessly migrated to PSC, and future plans include improved error handling, cost attribution and support for C++ and Python.
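The idea behind a unified client with automated service discovery is that application code names a topic with a single resource URI, and the client works out which backend and which brokers to talk to. The sketch below illustrates only that pattern; it is not PSC's API, and the URI format, the `REGISTRY` contents and all class names are hypothetical. The backend producer is an in-memory stand-in so the routing logic runs without a real cluster.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class TopicUri:
    backend: str  # e.g. "kafka" or "memq" (hypothetical scheme names)
    cluster: str
    topic: str

    @classmethod
    def parse(cls, uri: str) -> "TopicUri":
        # Hypothetical format: "<backend>://<cluster>/<topic>"
        scheme, sep, rest = uri.partition("://")
        cluster, slash, topic = rest.partition("/")
        if not (sep and slash and scheme and cluster and topic):
            raise ValueError(f"malformed topic URI: {uri!r}")
        return cls(scheme, cluster, topic)


# Hypothetical discovery registry: (backend, cluster) -> bootstrap endpoints.
REGISTRY: Dict[Tuple[str, str], List[str]] = {
    ("kafka", "logging"): ["kafka-logging-1:9092", "kafka-logging-2:9092"],
    ("memq", "analytics"): ["memq-analytics-1:9092"],
}


class InMemoryProducer:
    """Stand-in backend producer that records messages instead of sending them."""

    def __init__(self, backend: str, endpoints: List[str]) -> None:
        self.backend = backend
        self.endpoints = endpoints
        self.sent: List[Tuple[str, bytes]] = []

    def send(self, topic: str, value: bytes) -> None:
        self.sent.append((topic, value))


class UnifiedProducer:
    """Routes each send to a backend producer resolved (and cached) from the topic URI."""

    def __init__(self) -> None:
        self._producers: Dict[Tuple[str, str], InMemoryProducer] = {}

    def _resolve(self, uri: TopicUri) -> InMemoryProducer:
        key = (uri.backend, uri.cluster)
        if key not in self._producers:
            endpoints = REGISTRY.get(key)
            if endpoints is None:
                raise LookupError(f"no cluster registered for {key}")
            self._producers[key] = InMemoryProducer(uri.backend, endpoints)
        return self._producers[key]

    def send(self, topic_uri: str, value: bytes) -> InMemoryProducer:
        uri = TopicUri.parse(topic_uri)
        producer = self._resolve(uri)
        producer.send(uri.topic, value)
        return producer
```

Because the URI carries the backend and cluster, applications never hard-code broker addresses; changing where a cluster lives means updating the registry, not every client configuration.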



BIZ

Why is streaming data and real-time AI critical in telecom? | 5 min | AI | Adam Kawa | GetInData | Part of Xebia Blog

Dive into the dynamic intersection of streaming data and real-time AI in the telecom industry. Explore how these technologies are changing the way networks operate, shaping user experiences and pushing the telecom industry forward.


Applied Generative AI for Enterprise | 5 min | AI | Erika Lyxell, Fredrik Åström, Tomas Keller, Johan Vallin, Rickard Wieselfors | Ericsson Blog

The article explores Generative AI's capabilities, highlighting the transformer architecture and the evolution of models such as BERT and GPT-4. It discusses how GenAI is being implemented within Ericsson, showcasing practical use cases such as intelligent assistants, coding buddies and improved intelligent search.



TUTORIALS

Building a Data Streaming Pipeline: Leveraging Kafka, Spark, Airflow, and Docker | 11 min | Data Streaming | Simardeep Singh | Personal Blog

This tutorial walks through building a robust data pipeline with Kafka, Spark, Airflow, Docker, S3 and Python. A Python script, orchestrated by Airflow DAGs, fetches real-time data from the Random Name API and streams it into Kafka. Spark Structured Streaming then processes the data and writes it to S3. Docker keeps the architecture modular, which simplifies interoperability, scaling and debugging.
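The fetch-and-publish step can be sketched as below. This is not the tutorial's code: the response shape (a `results` list of nested user records), the field names and the synthetic `id` are assumptions, and the Kafka producer is injected as a plain callable so the sketch runs without a broker. With the `kafka-python` package you could pass something like `lambda v: producer.send("users_created", v)`, where `users_created` is a hypothetical topic name.

```python
import json
import uuid
from typing import Any, Callable, Dict


def format_user(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten one nested user record (assumed shape) into a flat event for Kafka."""
    name = raw.get("name", {})
    location = raw.get("location", {})
    return {
        "id": str(uuid.uuid4()),  # synthetic event id added by the pipeline (assumption)
        "first_name": name.get("first"),
        "last_name": name.get("last"),
        "email": raw.get("email"),
        "city": location.get("city"),
        "country": location.get("country"),
    }


def publish_users(payload: Dict[str, Any], send: Callable[[bytes], None]) -> int:
    """Format every record in payload["results"] and pass it to `send` as UTF-8 JSON bytes.

    Returns the number of messages handed to `send`.
    """
    count = 0
    for raw in payload.get("results", []):
        event = format_user(raw)
        send(json.dumps(event).encode("utf-8"))
        count += 1
    return count
```

In a setup like the tutorial's, a function of this kind would be the body of an Airflow task, while Spark Structured Streaming consumes the same topic on the other side.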


In MORE LINKS you will read about: dbt Quicktip: Using deprecation_date to improve your model governance, Python Dependency Management in Spark Connect

{ MORE LINKS }



NEWS

Block Public Sharing of Amazon EBS Snapshots | 4 min | Cloud | Jeff Barr | AWS Blog

You can now block public sharing of Amazon Elastic Block Store (EBS) snapshots on a per-region, per-account basis, which helps protect against unintentional data exposure. The setting can be enabled through the AWS Management Console, the AWS Command Line Interface, or the new EnableSnapshotBlockPublicAccess API; it applies at the regional level and affects snapshot visibility within minutes.
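A minimal sketch of turning the setting on across several regions with boto3, whose EC2 client exposes `enable_snapshot_block_public_access`. Because the setting is regional, the helper loops over regions; the client factory is injected so the logic can be exercised without AWS credentials, and the helper name and region list are only examples.

```python
from typing import Any, Callable, Dict, Iterable

# Modes accepted when enabling snapshot block public access.
VALID_STATES = {"block-all-sharing", "block-new-sharing"}


def block_snapshot_sharing(
    regions: Iterable[str],
    client_for_region: Callable[[str], Any],
    state: str = "block-all-sharing",
) -> Dict[str, str]:
    """Enable snapshot block public access in each region; return region -> resulting state."""
    if state not in VALID_STATES:
        raise ValueError(f"unsupported state: {state}")
    results: Dict[str, str] = {}
    for region in regions:
        ec2 = client_for_region(region)
        response = ec2.enable_snapshot_block_public_access(State=state)
        results[region] = response["State"]
    return results


# Usage with real AWS credentials (requires boto3):
#   import boto3
#   block_snapshot_sharing(
#       ["us-east-1", "eu-west-1"],
#       lambda region: boto3.client("ec2", region_name=region),
#   )
```

`block-new-sharing` blocks only new public shares and leaves existing public snapshots visible, which can be a gentler first step before switching to `block-all-sharing`.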


In MORE LINKS you will read about: New Vertex AI Feature Store built with BigQuery, ready for predictive and generative AI

{ MORE LINKS }



TOOLS

Create-llama, a command line tool to generate LlamaIndex apps | 2 min | AI | LlamaIndex Blog

Want to use LlamaIndex to load, index and chat with your data using LLMs like GPT-4? It just got a lot easier: LlamaIndex has released create-llama, a simple command-line tool that generates a full-stack app for you. Just bring your data!



DATA TUBE

The more, the merrier: Managing a dynamic, expanding, self-service dbt project | 30 min | Data Engineering | Alice Leach | dbt

At Whatnot, the dbt project grew in a single year from 3 to 50 developers and from under 50 to over 1,000 models. Data engineer Alice Leach discusses the team's insights and solutions for the resulting scaling challenges, covering guard rails (CI/CD, model monitoring and clean-up), guidelines (modular workspaces, macros and documentation) and gadgets (dbt code generation and interfacing with other tools).



PODCAST

Shining Some Light In The Black Box Of PostgreSQL Performance | 55 min | SQL | Tobias Macey, Lukas Fittl | Data Engineering Podcast

Databases are the core of most applications but are often treated as mysterious black boxes. When an application is slow, there is a reasonable probability that the database needs some attention. In this episode, Lukas Fittl shares some hard-won wisdom about the causes and solutions of many performance bottlenecks and the work that he is doing to shine some light on PostgreSQL to make it easier to understand how to keep it running smoothly.



CONFS, EVENTS, AND MEETUPS

How to talk to your DATA with (or without) LLM | Webinar | 30th November

This webinar gives a brief overview of the key challenges in measuring, managing and discussing business problems across organizational layers, and how to overcome them.

What we will discuss:

  • Challenges decision-makers face in accessing data analysis independently
  • What is a data model, and why does it matter?
  • Why generating SQL is not enough to get value from data
  • Looker and data model management


________________________

Have any interesting content to share in the DATA Pill newsletter?

Join us on GitHub

Dig into previous editions of DATA Pill


Adam from GetInData | Part of Xebia
