
DATA Pill #112 - Decodable vs. Amazon MSF, Flink SQL - changelog and races

Adam Kawa

CEO at GetInData, ex-Spotify | Data & AI for banks, telecoms, retail & more.

Published: July 8, 2024

This week you can dive into transformative insights and tutorials that will help you optimize LLMs, modernize your data infrastructure, and harness real-time analytics.

Enjoy!

ARTICLES

Best practices for LLM optimization for call and message compliance: prompt engineering, RAG, and fine-tuning | 25 min | LLM | Alec Coyle-Nicolas, Simon Greenman | Personal Blog

At Salus AI, the team optimized LLM performance for compliance monitoring of marketing calls for premium health screening services, using prompt engineering, RAG, and fine-tuning, and improved accuracy from 80% to 95-100%. The post shares their findings and shows how LLMs can outperform traditional rule-based compliance monitoring solutions.

Modernizing Uber’s Batch Data Infrastructure with Google Cloud Platform | 5 min | Data Engineering | Abhi Khune, Arun Mahadeva Iyer, Matt Mathew, Sahana Bhat | Uber Engineering Blog

With one of the world's largest Hadoop installations, Uber is modernizing its extensive data infrastructure by migrating its batch data analytics and machine learning stack to the Google Cloud Platform (GCP). This move aims to enhance productivity, engineering efficiency, and cost-effectiveness. The blog outlines Uber's strategy for leveraging GCP's cloud storage, ensuring user transparency, and improving data governance.

Decodable vs. Amazon MSF: Running Your First Apache Flink Job | 5 min | Data Engineering | Gunnar Morling | decodable Blog

Data engineers frequently ask how Decodable's Apache Flink-based ETL service compares to Amazon's Managed Service for Apache Flink (MSF). This post highlights the key differences and similarities to help you choose the best fit, especially if you're moving from a self-managed Flink cluster to a managed service.

TUTORIAL

Uncover social media insights in real time using Amazon Managed Service for Apache Flink and Amazon Bedrock | 7 min | Real-time analytics | Francisco Morillo, Subham Rakshit, Sergio Garcés Vitale | AWS blog

This post combines real-time analytics with generative AI to analyze tweets using Amazon Managed Service for Apache Flink, Amazon Bedrock's Titan Embeddings, and Amazon OpenSearch Service. Users query through a Streamlit frontend, and a Lambda function retrieves relevant tweets and generates insights with Anthropic's Claude LLM. The solution enables real-time trend identification, sentiment analysis, and targeted customer segmentation.
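The retrieval step in this kind of architecture (embed the question, find the most similar stored tweet embeddings, then ground the LLM prompt in them) can be sketched in plain Python. This is a minimal, hypothetical sketch and not the post's code: the toy 3-dimensional vectors stand in for Titan Embeddings output, the brute-force cosine search stands in for OpenSearch's vector search, and no AWS API is called.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k(query: list[float], index: list[tuple[str, list[float]]], k: int = 2) -> list[tuple[str, float]]:
    """Brute-force nearest-neighbour search; a vector store would handle this step."""
    scored = [(text, cosine_similarity(query, vec)) for text, vec in index]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

def build_prompt(question: str, retrieved: list[tuple[str, float]]) -> str:
    """Ground the LLM prompt in the retrieved tweets (RAG-style)."""
    context = "\n".join(f"- {text}" for text, _ in retrieved)
    return f"Answer using only these tweets:\n{context}\n\nQuestion: {question}"

# Hypothetical 3-dimensional embeddings; real embedding models return much longer vectors.
tweets = [
    ("Loving the new phone camera!", [0.9, 0.1, 0.0]),
    ("Battery life is terrible on this phone", [0.7, 0.0, 0.6]),
    ("Great weather for the marathon today", [0.0, 1.0, 0.1]),
]
query_vec = [0.8, 0.05, 0.4]  # hypothetical embedding of the user's question
hits = top_k(query_vec, tweets, k=2)
print(build_prompt("What do people think of the phone?", hits))
```

The two phone-related tweets are retrieved and the marathon tweet is left out, so the LLM only sees context relevant to the question. In a production pipeline, Flink would keep the index fresh as new tweets arrive.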

In MORE LINKS you will read about:

Flink SQL - changelog and races
Building and scaling Notion’s data lake

{ MORE LINKS }


PODCAST

How Microsoft Scales Testing and Safety for Generative AI | 57 min | AI | Sarah Bird, Sam Charrington | TWIML Podcast

Listen to a talk with Sarah Bird, Microsoft's chief product officer of responsible AI, about the testing and evaluation techniques used for the safe deployment of generative AI and large language models. Sarah shares insights on the unique risks, challenges, defense strategies, and lessons learned from the 'Tay' and 'Bing Chat' incidents.

DATA TUBE

Orchestrate generative AI with Workflows | 27 min | AI | Google Cloud Tech

Workflows is a versatile service for automating microservices, business processes, and ML pipelines, including generative AI calls. Explore how Workflows can orchestrate calls to Vertex AI, with a demo of a map-reduce style workflow for summarizing large texts.
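The map-reduce summarization pattern itself is independent of any cloud service: split a long text into chunks, summarize each chunk independently (the "map" step, which an orchestrator such as Workflows can fan out in parallel), then combine the partial summaries into one (the "reduce" step). The sketch below is a hypothetical illustration, not the demo's code: `fake_summarize_chunk` and `fake_combine` are offline stand-ins for the Vertex AI calls so the example runs without credentials.

```python
from typing import Callable

def chunk_words(text: str, max_words: int) -> list[str]:
    """Split text into consecutive chunks of at most max_words words."""
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def map_reduce_summarize(
    text: str,
    summarize_chunk: Callable[[str], str],
    combine_summaries: Callable[[list[str]], str],
    max_words: int = 500,
) -> str:
    chunks = chunk_words(text, max_words)
    # Map: each chunk is summarized on its own, so these calls could run in parallel.
    partial_summaries = [summarize_chunk(chunk) for chunk in chunks]
    # Reduce: merge the partial summaries into a single final summary.
    return combine_summaries(partial_summaries)

# Hypothetical stand-ins for LLM calls; replace them with real model requests.
def fake_summarize_chunk(chunk: str) -> str:
    return chunk.split()[0]  # "summary" = first word of the chunk

def fake_combine(summaries: list[str]) -> str:
    return " / ".join(summaries)

doc = "alpha beta gamma delta epsilon zeta eta"
print(map_reduce_summarize(doc, fake_summarize_chunk, fake_combine, max_words=3))
```

Passing the map and reduce functions in as parameters keeps the orchestration logic separate from the model calls, which mirrors how a workflow definition separates step ordering from the services each step invokes.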

In MORE LINKS you can watch:

Achieving near zero down time deployments for fraud detection applications with Mastercard

{ MORE LINKS }

CONFS EVENTS AND MEETUPS

Fine-tuning Open Source LLMs with Mistral | Online | 16th July

In this session, Andrea, a Computing Engineer at CERN, and Josep, a Data Scientist at the Catalan Tourist Board, will walk you through the steps needed to customize the open-source Mistral LLM. You'll learn about choosing a suitable LLM, getting training data, tokenization, evaluating model performance, and best practices for fine-tuning.

________________________

Have any interesting content to share in the DATA Pill newsletter?

Join us on GitHub

Dig into previous editions of DATA Pill

Adam from GetInData | Part of Xebia
