DATA Pill #066 - Powering the Latest LLM Innovation, Data contracts and schema enforcement with dbt
Hi,
Knock knock, the hottest DATA Pill this summer is here!
Take a breath with Snowflake, get some rest with AWS and GCP.
ARTICLES
Three challenges in deploying generative models in production | 9 min | Data Engineering | Aliaksei Mikhailiuk | Towards Data Science
This post walks through three key challenges in deploying generative models in production. It focuses on the latest developments in diffusion and GPT-based models, while also exploring broader applications across other model types.
Where’s My Data — A Unique Encounter with Flink Streaming’s Kinesis Connector | 12 min | Data Engineering | Seth Saperstein | Lyft Engineering Blog
Read how Lyft persevered with a Flink job moving massive data streams from Kinesis to S3, and how persistent issues strained Flink's capabilities. The investigation revealed complex challenges, including CPU throttling, event time alignment and subtask interactions, leading to a 5-day deadlock state that impeded data emission. Lyft addressed the problem by enhancing Flink's functionality and implementing better monitoring and mitigation strategies to prevent similar incidents in the future.
Securely Scaling Big Data Access Controls At Pinterest | 14 min | Data Engineering | Soam Acharya, Keith Regier | Pinterest Engineering Blog
The article discusses how Pinterest has achieved secure scalability for big data access controls. By implementing a custom authorization system, Pinterest ensures that data access is tightly controlled and audited, preventing unauthorized access. This solution leverages fine-grained access policies, scalable infrastructure and rigorous auditing to maintain data security and integrity in a growing environment.
TUTORIALS
Use Amazon Athena to query data stored in Google Cloud Platform | 6 min | Cloud | Jonathan Wong | AWS Blog
This tutorial shows how to query data stored in Google Cloud Platform from Amazon Athena, streamlining data access across GCP and AWS. Data connectors make a multi-cloud setup workable, and the insights derived from the analysis can feed BI applications and improve your organization's data analysis workflows.
In MORE LINKS you will find: data contracts and schema enforcement with dbt, and powering the latest LLM innovation, Llama v2 in Snowflake, part 1.
TOOL
Singer.io | ETL
Singer is an open-source standard for writing scripts that move data between databases, web APIs, files, queues, and just about anything else you can think of.
Singer describes how data extraction scripts (called “taps”) and data loading scripts (called “targets”) should communicate, allowing them to be used in any combination to move data from any source to any destination.
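To make the tap/target idea concrete, here is a minimal sketch of a tap, assuming the message types from the Singer spec (SCHEMA, RECORD and STATE, written as one JSON object per line on stdout). The `users` stream, its fields, the `last_id` bookmark and the `run_tap` helper are all hypothetical names made up for this illustration; it does not use the `singer-python` library.

```python
import json
import sys


def schema_message(stream, properties, key_properties):
    # SCHEMA describes the shape of the records that follow for a stream.
    return {
        "type": "SCHEMA",
        "stream": stream,
        "schema": {"type": "object", "properties": properties},
        "key_properties": key_properties,
    }


def record_message(stream, record):
    # RECORD carries one row of data for a stream.
    return {"type": "RECORD", "stream": stream, "record": record}


def state_message(value):
    # STATE carries a bookmark so a later run can resume where this one stopped.
    return {"type": "STATE", "value": value}


def run_tap(rows, out=sys.stdout):
    """Emit one SCHEMA, one RECORD per row, then a final STATE message."""
    properties = {"id": {"type": "integer"}, "name": {"type": "string"}}
    messages = [schema_message("users", properties, ["id"])]
    messages += [record_message("users", row) for row in rows]
    # Hypothetical bookmark format: remember the last id that was emitted.
    bookmark = {"last_id": rows[-1]["id"]} if rows else {}
    messages.append(state_message({"users": bookmark}))
    for message in messages:
        out.write(json.dumps(message) + "\n")
    return messages


if __name__ == "__main__":
    run_tap([{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}])
```

Saved as, say, `tap_users.py` (a made-up filename), the output could be piped into any Singer target, for example `python tap_users.py | target-csv`. Because the two sides only share JSON lines, the tap does not need to know anything about the destination.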
DATA TUBE
The conspiracy to make AI seem harder than it is! | 1 h 30 min | AI | Gustav Söderström | Spotify R&D
From Spotify's corridors comes an educational talk by an internal executive, now shared globally. The talk demystifies AI, making it accessible to all. What will you find here?
In MORE LINKS: unlocking the power of Data Science in the cloud
REPLAY THE EVENT
AWS Storage Day | Virtually | 8 hours
Take a little step back and watch this event by AWS. It is ideal for anyone who is eager to learn more about:
- How to prepare for AI/ML with the storage decisions you make now
- How to deliver holistic data protection for your organization, including recovery planning to help protect against ransomware
- How to do more with your budget by optimizing storage costs for on-premises and cloud data
CONFS EVENTS AND MEETUPS
Building data-intensive applications with real-time data streaming | Virtual Hands-on Lab | 16th August 11 AM CEST
In today's landscape, businesses can harness extensive data for customer benefit, but numerous enterprises face challenges in readying vast datasets for analysis and delivering immediate insights through real-time data streaming. Explore the realm of crafting data applications with real-time data streaming by participating in Snowflake’s complimentary, instructor-led hands-on lab.
You will learn how to:
________________________
Have any interesting content to share in the DATA Pill newsletter?
- Join us on GitHub
- Dig into previous editions of DATA Pill
Adam from GetInData | Part of Xebia