DATA Pill #066 - Powering the Latest LLM Innovation, Data contracts and schema enforcement with dbt


Hi,

Knock knock, the hottest DATA Pill this summer is here!

Take a breath with Snowflake, get some rest with AWS and GCP.


ARTICLES

Three challenges in deploying generative models in production | 9 min | Data Engineering | Aliaksei Mikhailiuk | Towards Data Science

In this blog, we'll delve into deploying generative models in production, tackling key challenges. Let’s focus on the latest developments in diffusion and GPT-based models, while also exploring broader applications across various model types.

Where’s My Data — A Unique Encounter with Flink Streaming’s Kinesis Connector | 12 min | Data Engineering | Seth Saperstein | Lyft Engineering Blog

Read how Lyft persevered with a Flink job streaming massive volumes of data from Kinesis to S3, and how persistent issues strained Flink's capabilities. The investigation revealed complex challenges, including CPU throttling, event time alignment and subtask interactions, leading to a 5-day deadlock state that impeded data emission. Lyft addressed the problem by enhancing Flink's functionality and implementing better monitoring and mitigation strategies to prevent similar incidents in the future.

Securely Scaling Big Data Access Controls At Pinterest | 14 min | Data Engineering | Soam Acharya, Keith Regier | Pinterest Engineering Blog

The article discusses how Pinterest has achieved secure scalability for big data access controls. By implementing a custom authorization system, Pinterest ensures that data access is tightly controlled and audited, preventing unauthorized access. This solution leverages fine-grained access policies, scalable infrastructure and rigorous auditing to maintain data security and integrity in a growing environment.


TUTORIALS

Use Amazon Athena to query data stored in Google Cloud Platform | 6 min | Cloud | Jonathan Wong | AWS Blog

In this article, you will explore streamlining data access across the Google Cloud Platform and AWS, optimizing efficiency. Leveraging data connectors enables multi-cloud adaptability, boosting business expansion. Moreover, derived insights from data analysis facilitate enhanced BI application development, advancing organizational data analysis workflows.


In MORE LINKS you will find "Data contracts and schema enforcement with dbt" and "Powering the latest LLM innovation, Llama v2 in Snowflake, part 1".

{ MORE LINKS }



TOOL

Singer.io | ETL

Singer is an open-source standard for writing scripts that move data between databases, web APIs, files, queues, and just about anything else you can think of.

Singer describes how data extraction scripts (called "taps") and data loading scripts (called "targets") should communicate, allowing them to be used in any combination to move data from any source to any destination.
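As a rough illustration of that tap/target contract: in the Singer specification, a tap writes JSON messages to standard output, one per line, with the types SCHEMA, RECORD and STATE, and a target reads them from standard input. The sketch below uses only the Python standard library rather than any Singer helper package. The `users` stream, its fields, and the helper names `write_message`, `run_tap` and `run_target` are made up for this example.

```python
import io
import json
import sys


def write_message(message, out=sys.stdout):
    """Write one Singer-style message as a single line of JSON."""
    out.write(json.dumps(message) + "\n")


def run_tap(rows, out=sys.stdout):
    """Toy tap: emit a SCHEMA message, one RECORD per row, then a STATE."""
    # SCHEMA describes the records of the (hypothetical) "users" stream.
    write_message({
        "type": "SCHEMA",
        "stream": "users",
        "schema": {
            "type": "object",
            "properties": {
                "id": {"type": "integer"},
                "name": {"type": "string"},
            },
        },
        "key_properties": ["id"],
    }, out)

    # RECORD carries one row of data for the named stream.
    for row in rows:
        write_message({"type": "RECORD", "stream": "users", "record": row}, out)

    # STATE is a bookmark the tap can use to resume on the next run.
    last_id = rows[-1]["id"] if rows else None
    write_message({"type": "STATE", "value": {"users": last_id}}, out)


def run_target(lines):
    """Toy target: group records by stream and keep the latest state."""
    records = {}
    state = None
    for line in lines:
        message = json.loads(line)
        if message["type"] == "RECORD":
            records.setdefault(message["stream"], []).append(message["record"])
        elif message["type"] == "STATE":
            state = message["value"]
        # SCHEMA messages are ignored here; a real target would validate against them.
    return records, state
```

Calling `run_tap(rows)` with a list of dicts prints the messages to standard output, and any program that reads those lines, such as `run_target`, can play the role of the target. That line-based JSON contract is what lets taps and targets be mixed freely.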



DATA TUBE

The conspiracy to make AI seem harder than it is! | 1 h 30 min | AI | Gustav Söderström | Spotify R&D

From Spotify's corridors comes an educational talk by an internal executive, now shared globally. The talk demystifies AI, making it accessible to all. What will you find here?

  • What is an LLM?
  • What about creativity?
  • How do you steer it?
  • Why did no one see it coming?
  • Intelligence is compression!
  • Diffusion Models - Generating images, video and music
  • Conditioning on text

In MORE LINKS you will find unlocking the power of Data Science in the cloud

{ MORE LINKS }




REPLAY THE EVENT

AWS Storage Day | Virtually | 8 hours

Take a step back and watch this event hosted by AWS. It is ideal for anyone eager to learn more about:

  • How to prepare for AI/ML with the storage decisions you make now
  • How to deliver holistic data protection for your organization, including recovery planning to help protect against ransomware
  • How to do more with your budget by optimizing storage costs for on-premises and cloud data



CONFS EVENTS AND MEETUPS

Building data-intensive applications with real-time data streaming | Virtual Hands-on Lab | 16th August 11 AM CEST

In today's landscape, businesses can harness extensive data for customer benefit, but numerous enterprises face challenges in readying vast datasets for analysis and delivering immediate insights through real-time data streaming. Explore the realm of crafting data applications with real-time data streaming by participating in Snowflake’s complimentary, instructor-led hands-on lab.

You will learn how to:

  • ingest near real-time data sets into Snowflake
  • create Kafka producer and consumer apps
  • integrate Snowpipe streaming SDK with Kafka consumer apps
  • populate Snowflake tables with time-series data in JSON format

________________________

Have any interesting content to share in the DATA Pill newsletter?

Join us on GitHub

Dig into previous editions of DATA Pill


Adam from GetInData | Part of Xebia
