DATA Pill #097 - LLMs meet SQL, Confluent + Apache Flink = ?
Hi,
The birds are tweeting that this week brought some great combos.
LLMs and SQL, Apache Flink, and Confluent…
That’s not all; we have a new category called Skill Lake, where you can grow your skills!
Enjoy the newest DATA Pill.
ARTICLES
What is the best tool: Apache Airflow, Azure Data Factory, or Databricks Workflows? | 7 min | Data Engineering | Pedro Pagano | Indicium Engineering Blog
In a recent data project, Pedro suggested Apache Airflow, but due to past experiences, the company chose a managed platform, leading to a project on Azure Cloud with Databricks. He compared Azure Data Factory and Databricks Workflows with Airflow to determine the best orchestrator/scheduler.
Retrieving information from SQL databases with the help of LLMs | 6 min | LLM | Piotr Chaberski | GetInData | Part of Xebia Blog
LLMs have recently gained significant traction, inspiring innovative use cases and demos. As the hype evolves into practical applications, information retrieval emerges as a focal point, prompting considerations about deployment strategies, data privacy, and accessing information beyond LLMs' parameters.
Elevating Your Data Platform: The Strategic Role of Data Staging Area and how it fits Data Lakehouse paradigm | 14 min | Data Engineering | Szymon Żaczek | Level Up Coding
Optimize data management with a Data Staging Area in your Lakehouse architecture for enhanced security, efficiency, and scalability.
In MORE LINKS you will read about Lyft’s Reinforcement Learning Platform
SKILL LAKE
Big Data Technology Warsaw - Workshops | Warsaw, On-site | 9th April
Join us for a one-day workshop on Generative AI and large language models. This event aims to provide participants with in-depth knowledge of the latest advancements in natural language processing, computer vision, and machine learning techniques for Gen AI.
In this one-day workshop you will learn how to build streaming analytics apps that deliver instant results continuously on data-intensive streams. You will discover how to configure streaming pipelines, transformations, aggregations, and triggers using SQL and Python in a user-friendly development environment built on open-source tools: Apache Flink, Apache Kafka, and GetInData OSS projects.
Learn to build and optimize data pipelines using dbt and Snowflake in a one-day workshop. Discover how to enhance performance, quality, and cost-efficiency through materialization techniques, version control, testing, monitoring, and scheduling. Solve common data transformation challenges with modern tools, using hands-on exercises in a public cloud (GCP or AWS).
REMEMBER! Use the DataPill200 code to get a 200 PLN discount.
TUTORIALS
LLMs Meet SQL: Revolutionizing Data Querying with Natural Language Processing | 56 min | LLM | Senthil E | Level Up Coding
This article explores how powerful models simplify tasks by writing database queries from questions, building knowledgeable chatbots, and creating custom dashboards for preferred information. It will also uncover the potential of combining LLMs with structured data to unlock new possibilities and streamline data interaction.
In MORE LINKS you will read about: We built a new SQL Engine on Arrow and DataFusion
NEWS
Announcing the Release of Apache Flink 1.19 | 7 h | Data Streaming | Lincoln Lee
The Apache Flink PMC is pleased to announce the release of Apache Flink 1.19.0. As usual, we are looking at a packed release with a wide variety of improvements and new features. Overall, 162 people contributed to this release completing 33 FLIPs and 600+ issues.
Confluent Cloud for Apache Flink Is Now Generally Available | 8 min | Data Streaming | Jean-Sébastien Brunner, Hasan Jilani | Confluent Blog
Confluent has launched Confluent Cloud for Apache Flink, which is available on all major cloud platforms. This integration offers a unified, enterprise-grade solution for real-time data processing with Apache Kafka® and Flink. The blog details the unique features of this fully managed service and its readiness for mission-critical use cases at scale.
PODCAST
Open-Source LLM Libraries and Techniques | 1 h 48 min | LLM | Jon Krohn, Dr. Sebastian Raschka | Super Data Science: ML & AI Podcast
Jon Krohn sits down with Sebastian Raschka to discuss his latest book, Machine Learning Q and AI, the open-source libraries developed by Lightning AI, how to exploit the greatest opportunities for LLM development, and what’s on the horizon for LLMs.
CONFS, EVENTS AND MEETUPS
Machine Learning Prague 2024 | Prague | 22-24th April 2024
World-class expertise and practical content packed into 3 days. You can look forward to an excellent lineup of 40 international experts in ML and AI business and academic applications at ML Prague 2024. They will present advanced practical talks, hands-on workshops, and other forms of interactive content.
_______________________
Have any interesting content to share in the DATA Pill newsletter?
Join us on GitHub
Dig into previous editions of DataPill
Adam from GetInData | Part of Xebia