DATA Pill #097 - LLMs meet SQL, Confluent + Apache Flink = ?

Hi,

The birds are tweeting that this week brought some great combos.

LLMs and SQL, Apache Flink and Confluent…

That’s not all: we have a new category called Skill Lake, where you can grow your skills!

Enjoy the newest DATA Pill.

ARTICLES

What is the best tool: Apache Airflow, Azure Data Factory, or Databricks Workflows? | 7 min | Data Engineering | Pedro Pagano | Indicium Engineering Blog

In a recent data project, Pedro suggested Apache Airflow, but because of past experiences the company chose a managed platform instead, so the project ended up on Azure with Databricks. He compares Azure Data Factory and Databricks Workflows with Airflow to determine which works best as an orchestrator/scheduler.

Retrieving information from SQL databases with the help of LLMs | 6 min | LLM | Piotr Chaberski | GetInData | Part of Xebia Blog

LLMs have recently gained significant traction, inspiring innovative use cases and demos. As the hype turns into practical applications, information retrieval has become a focal point. This raises questions about deployment strategies, data privacy, and how to access information that is not stored in the model's parameters.
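A common pattern behind this kind of retrieval is to give the model the database schema as context and ask it to write a query. Below is a minimal Python sketch that assumes a SQLite database. The function names and prompt wording are hypothetical and do not come from the article. The actual model call is left out, because it depends on which LLM you use.

```python
import sqlite3


def describe_schema(conn: sqlite3.Connection) -> str:
    """Collect the CREATE TABLE statements of all user tables in the database."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master "
        "WHERE type = 'table' AND sql IS NOT NULL AND name NOT LIKE 'sqlite_%' "
        "ORDER BY name"
    ).fetchall()
    return "\n".join(f"{row[0]};" for row in rows)


def build_sql_prompt(conn: sqlite3.Connection, question: str) -> str:
    """Put the schema in front of the question so the model can write valid SQL."""
    return (
        "You are given this SQLite schema:\n"
        f"{describe_schema(conn)}\n\n"
        "Write one read-only SQL query that answers the question. Return only SQL.\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
    # The resulting string would be sent to whichever LLM you use (hypothetical step).
    print(build_sql_prompt(conn, "What is the total order amount per customer?"))
```

Passing only the schema keeps the data itself out of the prompt. This is one reason the privacy considerations above favour generating SQL over pasting rows into the model.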

Elevating Your Data Platform: The Strategic Role of Data Staging Area and how it fits Data Lakehouse paradigm | 14 min | Data Engineering | Szymon ?aczek | Level Up Coding

Optimize data management with a Data Staging Area in your Lakehouse architecture for enhanced security, efficiency, and scalability.

In MORE LINKS, you will read about Lyft’s Reinforcement Learning Platform.

{ MORE LINKS }

SKILL LAKE

Big Data Technology Warsaw - Workshops | Warsaw, On-site | 9th April

  • Building Generative AI Based Applications With LLMs and Data Augmentation Architectures

Join us for a one-day workshop on generative AI and large language models. This event aims to provide participants with in-depth knowledge of the latest advancements in natural language processing, computer vision and machine learning techniques for Gen AI.

  • Data Streaming: Analyze Your Data in Real-Time With Flink

In this one-day workshop, you will learn how to build streaming analytics apps that continuously deliver instant results on data-intensive streams. You will discover how to configure streaming pipelines, transformations, aggregations and triggers using SQL and Python, in a user-friendly development environment built on open-source tools: Apache Flink, Apache Kafka and GetInData OSS projects.

  • Advanced analytics engineering with Snowflake and dbt

Learn to build and optimize data pipelines using dbt and Snowflake in a one-day workshop. Discover how to enhance performance, quality, and cost-efficiency through materialization techniques, version control, testing, monitoring, and scheduling. Solve common data transformation challenges with modern tools, using hands-on exercises in a public cloud (GCP or AWS).
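The Flink workshop covers streaming aggregations. Flink SQL typically expresses these with tumbling windows (`TUMBLE`), which are fixed-size and non-overlapping. The pure-Python sketch below shows what such a window computes. It is not Flink code. It assumes hypothetical `(timestamp_ms, key)` events that arrive in timestamp order, and it ignores the watermarks, late data and state management that Flink handles for you.

```python
from collections import Counter
from typing import Iterable, Iterator


def tumbling_window_counts(
    events: Iterable[tuple[int, str]], window_ms: int
) -> Iterator[tuple[int, dict[str, int]]]:
    """Yield (window_start_ms, {key: count}) for consecutive tumbling windows.

    Assumes events are (timestamp_ms, key) pairs in timestamp order, so a window
    can be emitted as soon as the first event of a later window arrives.
    """
    current_start = None
    counts = Counter()
    for ts, key in events:
        start = ts - ts % window_ms  # align the event to the start of its window
        if current_start is not None and start != current_start:
            yield current_start, dict(counts)  # window closed: emit its result
            counts = Counter()
        current_start = start
        counts[key] += 1
    if current_start is not None:
        yield current_start, dict(counts)  # flush the last open window


if __name__ == "__main__":
    clicks = [(1000, "a"), (1500, "b"), (2500, "a"), (2600, "a"), (6100, "b")]
    for window_start, per_key in tumbling_window_counts(clicks, window_ms=2000):
        print(window_start, per_key)
    # 0 {'a': 1, 'b': 1}
    # 2000 {'a': 2}
    # 6000 {'b': 1}
```

Results come out as each window closes rather than after the whole input is read, which is the "continuous" behaviour the workshop describes. Windows without events (here 4000 to 6000 ms) produce no output.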

REMEMBER! Use the DataPill200 code to get a 200 PLN discount.

TUTORIALS

LLMs Meet SQL: Revolutionizing Data Querying with Natural Language Processing | 56 min | LLM | Senthil E | Level Up Coding

This article explores how LLMs simplify tasks such as writing database queries from natural-language questions, building knowledgeable chatbots and creating custom dashboards that show the information you care about. It also examines how combining LLMs with structured data can unlock new possibilities and streamline data interaction.
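Before running model-generated SQL, it is prudent to make sure the query cannot modify the database. With Python's built-in `sqlite3` module, one way to do this is an authorizer callback that permits only read operations. The sketch below assumes SQLite and assumes that `sql` is whatever string your model returned (the model call itself is out of scope). The function names are hypothetical and are not taken from the article.

```python
import sqlite3

# Authorizer action codes that a plain read-only SELECT needs.
READ_ONLY_ACTIONS = {sqlite3.SQLITE_SELECT, sqlite3.SQLITE_READ, sqlite3.SQLITE_FUNCTION}


def _read_only_authorizer(action, arg1, arg2, db_name, source):
    """Allow reads and function calls; deny writes, DDL, PRAGMAs, ATTACH, etc."""
    return sqlite3.SQLITE_OK if action in READ_ONLY_ACTIONS else sqlite3.SQLITE_DENY


def run_generated_sql(conn: sqlite3.Connection, sql: str) -> list[tuple]:
    """Execute model-generated SQL on a connection that only permits reads.

    The authorizer stays installed, so use a connection dedicated to answering
    questions rather than the one your application writes with.
    """
    conn.set_authorizer(_read_only_authorizer)
    return conn.execute(sql.strip()).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders (customer, amount) VALUES (?, ?)",
        [("acme", 10.0), ("acme", 5.5), ("globex", 7.0)],
    )
    # Pretend this string came back from the model.
    generated = "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
    print(run_generated_sql(conn, generated))  # [('acme', 15.5), ('globex', 7.0)]
    try:
        run_generated_sql(conn, "DELETE FROM orders")
    except sqlite3.DatabaseError as exc:
        print("blocked:", exc)
```

A database-level guard like this is harder to bypass than checking whether the text starts with `SELECT`. On other engines, a database user with read-only grants plays the same role.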

In MORE LINKS, you will read about “We built a new SQL Engine on Arrow and DataFusion”.

{ MORE LINKS }

NEWS

Announcing the Release of Apache Flink 1.19 | 7 h | Data Streaming | Lincoln Lee

The Apache Flink PMC is pleased to announce the release of Apache Flink 1.19.0. As usual, we are looking at a packed release with a wide variety of improvements and new features. Overall, 162 people contributed to this release completing 33 FLIPs and 600+ issues.

Confluent Cloud for Apache Flink Is Now Generally Available | 8 min | Data Streaming | Jean-Sébastien Brunner, Hasan Jilani | Confluent Blog

Confluent has launched Confluent Cloud for Apache Flink, which is available on all major cloud platforms. This integration offers a unified, enterprise-grade solution for real-time data processing with Apache Kafka and Flink. The blog details the unique features of this fully managed service and its readiness for mission-critical use cases at scale.

PODCAST

Open-Source LLM Libraries and Techniques | 1 h 48 min | LLM | Jon Krohn, Dr. Sebastian Raschka | Super Data Science: ML & AI Podcast

Jon Krohn sits down with Sebastian Raschka to discuss his latest book, Machine Learning Q and AI, the open-source libraries developed by Lightning AI, how to exploit the greatest opportunities for LLM development, and what’s on the horizon for LLMs.

CONFS EVENTS AND MEETUPS

Machine Learning Prague 2024 | Prague | 22-24th April 2024

World-class expertise and practical content packed into 3 days. ML Prague 2024 features a lineup of 40 international experts in the business and academic applications of ML and AI, who will deliver advanced practical talks, hands-on workshops and other interactive formats.

_______________________

Have any interesting content to share in the DATA Pill newsletter?

Join us on GitHub

Dig into previous editions of DATA Pill

Adam from GetInData | Part of Xebia



Szymon ?aczek, PhD

Data platform architect

5 months ago

Thanks for the mention Adam : )
Lionel Tchami

Platform Engineer | AWS Community Builder | Helping People Transition into DevOps & Cloud

5 months ago

That's an impressive lineup of topics! Can't wait to learn more about SQL with LLM.
Marcelo Grebois

Cloud & Software Architect | MLOps | AIOps | Helping companies scale their platforms to an enterprise grade level

5 months ago

Exciting discussions ahead! Let's explore the latest in SQL with LLM together. Adam Kawa
Giannis Polyzos

Staff Architect @ Ververica | Apache Flink | Streaming Lakehouse | Everything is a Stream

5 months ago

Lol, I just saw Confluent’s video announcement - Industry’s ONLY cloud-native serverless solution. Really Confluent? Who writes the script? There are already 3 cloud-native serverless solutions available, which makes Confluent the 4th? Not to mention it’s just a beta.
