DATA Pill #052 - LLM, observability, Data Catalogs & storage cost reduction again
Hi,
This week I am preparing you for the fact that there will be a lot of talk about Large Language Models in the near future.
Observability is another hot topic, and the icing on the cake is the unflagging trends of AI and cost reduction.
ARTICLES
How Canva saves millions annually in Amazon S3 costs | 6 min | Data Engineering | Josh Smith | Canva Developers
Dive into a detailed explanation of how to analyze S3 usage data, determine the most cost-effective storage class for your objects, and automate the process of moving objects between storage classes. The tips and strategies presented in this post can help you minimize the AWS costs while still providing reliable and scalable storage solutions.
Migrating Critical Traffic At Scale with No Downtime — Part 1 | 10 min | Data Engineering | Shyam Gala, Javier Fernandez-Ivern, Anup Rokkam Pratap, Devang Shah | Netflix TechBlog
This blog post provides a detailed analysis of replay traffic testing, a versatile technique the Netflix team have applied in the preliminary validation phase for multiple migration initiatives.
The Full Story of Large Language Models and RLHF | 17 min | Deep Learning | Marco Ramponi | AssemblyAI Blog
This guide covers the journey of large language models, from foundational ideas to the latest advancements, and how RLHF aligns them with human values.
Observability on Kubernetes - lessons learned | 6 min | Data Engineering | Piotr Mossakowski | GetInData | Part of Xebia Blog
More than half of GetInData active projects are those where they manage observability stacks completely. They design, implement and maintain monitoring, logging and tracing of our application stacks.
Piotr shares lessons learned on running observability stacks on Kubernetes like:?
In MORE LINKS you will read about:
TUTORIAL
Image classification with Debezium and TensorFlow | 8 min | ML | Vojtěch Juránek | Debezium Blog
In this one, Vojtěch discusses how the recent success of ChatGPT has created a new wave of interest in AI and machine learning. While ML frameworks like TensorFlow and PyTorch have made writing ML models more accessible, data set preparation can still be challenging. The blog explores the use of Debezium, a change data capture tool, in machine learning pipelines, and looks at how to stream data into TensorFlow for recognizing handwritten digits. You will also read about Debezium's support for single message transforms and its ability to deliver records to multiple message brokers.
TOOLS
Bytewax.io | Bytewax
Build streaming data applications easily in Python. Open source framework and distributed stream processing engine. Build streaming data pipelines and real-time apps with everything you need: recovery, scalability, windowing, aggregations, and connectors.
领英推荐
NEWS
Spark Connect Available in Apache Spark 3.4 | 3 min | Data Engineering | Allan Folting, Hyukjin Kwon, Xiao Li, Herman van H?vell, Stefania Leone, Martin Grund, Reynold Xin and Kris Mo | Databricks Blog
Revolutionize Apache Spark with this new technology now available. The new connector tool is designed to simplify data movement and improve data performance between Apache Spark and Delta Lake. It will enhance the capabilities of Spark and Delta Lake, making it easier and faster to work with large-scale data sets.
PODCASTS
ChatGPT and the OpenAI Developer Ecosystem | 55 min | AI | host: Adel Nehme guest: Logan Kilpatrick | DataFramed
Discover the power of ChatGPT with OpenAI's Logan Kilpatrick in this AI series episode. Learn about ChatGPT's plugins, image input features, and its integration into our daily lives. Gain valuable tips on how to get better responses and successfully integrate ChatGPT into your organization's product. Join the storm and explore the practical applications of AI in our lives.
In MORE LINKS you will listen about Revolutionizing B2B: Unleashing the Power of AI and Data
CONFS EVENTS AND MEETUPS
DATA + AI SUMMIT 2023 | 26-29th June | Online & San Francisco
Large Language Models (LLM) are taking AI mainstream. Join the premier event for the global data community to understand their potential and shape the future of your industry with data and AI.
Generative AI: Changing How Business Innovates and Operates | 31th May | Webinar
This complementary webinar explores multiple use cases that drive adoption in their early adopter customer base to provide product leaders with insights into the future of generative AI-powered businesses, and the potential generative AI holds for driving innovation and improving business processes.?
Leveraging Google Cloud's Large Language Models and Generative AI Services | 17th May | Amsterdam
Join this seminar to learn how to leverage the power of Google's LLMs for your business. In a few hours, you can learn how to effectively use LLMs for your organization. During the seminar, we will explore the developments that led to the rise of Generative AI and dive into different types of use cases.
________________________
Have any interesting content to share in the DATA Pill newsletter?
? Join us on GitHub
? Dig previous editions of DataPill ?
Adam from the GetInData | Part of Xebia