DATA Pill #051 - Flink Platform on Kubernetes? Top feature stores & a lot of AI

Hi,


Sorry, I couldn't help it…

AI and ChatGPT have been beating a path to our door. And they made it in ;)

But there is more interesting content too: lots of news and some nice deep-dive articles.

Shall we take today’s Data Pill?


But first…

If you like this newsletter, it would be super nice to hear from you. Just leave me a comment with your feedback, or simply to say hi!

I’m waiting on the other side. Not ChatGPT. At least for now ;)



ARTICLES

Scaling up the Prime Video audio/video monitoring service and reducing costs by 90% | 10 min | Streaming | Marcin Kolny | Prime Video Tech Blog

Microservices and serverless components are tools that work at high scale, but whether to use them over a monolith has to be decided on a case-by-case basis.

In Prime Video's case, moving the service to a monolith cut infrastructure costs by over 90% and also increased its scaling capabilities.


Building a Flink Self-Serve Platform on Kubernetes at Scale | 7 min | Streaming | Sylvia Lin | tech-at-instacart Blog

Instacart has used Flink to meet a range of needs, such as:

  • Real-time decision making, such as fraud/spam detection
  • Real-time data augmentation, like Catalog data pipelines
  • Machine Learning real-time feature generation
  • OLAP events ingestion for their experimentation platform

They accomplished all of this by running Flink on AWS’ EMR, so why have they decided to build a new platform on top of Kubernetes? And what are the lessons learned?


The entire Flink service onboarding and operations should be streamlined without K8S details. Most of our platform users don’t have knowledge of Kubernetes, so we should abstract K8S details as much as possible.



The 7 Most Popular Feature Stores In 2023 | 5 min | MLOps | Jakub Jurczak | GetInData | Part of Xebia Blog

In recent years, Feature Stores have become an integral part of many ML projects, and their popularity is continuing to grow. This article will look at the most popular solutions available this year.

While we're on the subject of Feature Stores: in this ebook, you'll find a step-by-step guide to designing and building a Feature Store, and see how it can solve critical machine learning problems: Build Feature Store Faster. An Introduction to Vertex AI, Snowflake and dbt Cloud


How We Integrated ChatGPT into Our Slack: Enhancing Privacy, Flexibility, and Collaboration | 4 min | Data Science and AI | Rens Dimmendaal | Xebia Blog

ChatGPT has become increasingly popular. Despite its popularity, Xebia spotted some areas for improvement, like privacy, flexibility and collaboration, to make it even better.

To address these issues, they developed an internal tool called SlackGPT.

SlackGPT not only tackles these limitations but also gives our colleagues a unique experience when working with and building modern LLM applications.


Track and improve the performance of streaming data pipelines with Datadog Data Streams Monitoring | 8 min | Data Streaming | Nicholas Thomson, Jane Wang, Jonathan Morin | Datadog Blog

Datadog can now monitor streaming data pipelines with Kafka as a bus.


In MORE LINKS you will read about anomaly detection at Lyft, dbt Squared, Data Streams Monitoring and the label noise solution tested at Allegro.


{ MORE LINKS }


TOOLS

Mojo 🔥: a new programming language for all AI developers | Modular

Mojo combines the usability of Python with the performance of C, unlocking unparalleled programmability of AI hardware and extensibility of AI models.



ONEPAGER

Ecosystem around streaming and realtime including open source solutions with their vendor providers | Hubert Dulay




NEWS

Create Power BI reports in Jupyter Notebooks | 5 min | Power BI | Noam Raveh | Microsoft Blog

Microsoft announces the latest update to the Power BI and Jupyter Notebook library, which empowers users to create powerful reports based on their data directly in their notebooks, without leaving their workflow. With this new update, users can gain insights instantly without the hassle of switching between tools or dealing with cumbersome data exports.


Mark Zuckerberg says Meta wants to ‘introduce AI agents to billions of people’ | 3 min | AI | Alex Heath | The Verge Blog

Mark Zuckerberg, CEO of Meta, has stated that generative AI will eventually be integrated into all of the company's products due to its potential impact on billions of users.



DATA ODDITY

There’s an AI for that

The name says it all. Need something to be done by AI? Check out the list of tools, which is getting longer and longer every minute.



PODCASTS

Data Journey with Varun Bhatnagar (Swedbank) - MLOps in the Cloud at Swedbank - Enterprise Analytics Platform | 55 min | MLOps | host: Adam Kawa guest: Varun Bhatnagar | Radio DaTa Podcast

  • An overview of the solution: Enterprise Analytics Platform (EAP)
  • Evolution of MLOps at Swedbank - How it all started and how the solution has evolved over time.
  • Iterative development for ML models - How can one improve the iterative development process for ML models?
  • The secret of success - What has led to this successful migration?
  • Key takeaways and lessons learned from the ML cloud transformation journey, and how one can start or improve in this area.



DATA Tube

Open Assistant takes on ChatGPT | 1 h | host: Tim Scarfe guest: Yannic Kilcher | Machine Learning Street Talk

Let’s hear about Open Assistant - an ambitious project aiming to create a truly open-source AI language model. Yannic reveals the behind-the-scenes process of developing this revolutionary technology, addressing the critical role of community involvement and the importance of a diverse dataset.



CONFS EVENTS AND MEETUPS

Goodbye, Data Warehouse. Hello, Lakehouse | 18 May | Online webinar

Learn from Databricks, Fivetran and dbt Labs experts about how to:

  • Automate data movement and transform raw data into analytics-ready tables using your favorite tools like Fivetran and dbt
  • Unify and govern business-critical data at scale to build a curated data lake for data warehousing, SQL and BI
  • Reduce costs and get started in seconds with on-demand, elastic SQL serverless compute
  • Use automated and real-time lineage to monitor end-to-end data flow


Data Strategy In The World Of Multiple AI Innovations 'almost' Every Week | 9 May | Online webinar

  • What is a data strategy, and why do you need one?
  • How to build a proper data strategy?
  • How to use the latest tools to 10x the productivity of your employees?


Data Mass | Call for Presentation | 5th October 2023

The Summit is aimed at people who use the cloud in their daily work to solve Data Engineering, Big Data, Data Science, Machine Learning and AI problems. The main idea of the conference is to promote knowledge and experience in designing and implementing tools for solving difficult and interesting challenges. If you have something to share with the community in this area - submit your presentation!


________________________

Have any interesting content to share in the DATA Pill newsletter?

Join us on GitHub

Dig into previous editions of DATA Pill



Adam from GetInData | Part of Xebia
