DATA Pill #051 - Flink Platform on Kubernetes? Top feature stores & a lot of AI
Hi,
Sorry, I couldn't help it…
AI and ChatGPT are beating a path to our door. And they made it ;)
But there is more interesting content too: lots of news and some nice deep-dive articles.
Shall we take today’s Data Pill?
But first…
If you like this newsletter, it would be super nice to hear from you. Just leave me a comment with your feedback, or just to say hi!
I’m waiting on the other side. Not ChatGPT. At least for now ;)
ARTICLES
Scaling up the Prime Video audio/video monitoring service and reducing costs by 90% | 10 min | Streaming | Marcin Kolny | Prime Video Tech Blog
Microservices and serverless components are tools that do work at high scale, but the decision whether to use them over a monolith has to be made on a case-by-case basis.
In the case of Prime Video, moving the service to a monolith reduced their infrastructure cost by over 90%. It also increased scaling capabilities.
Building a Flink Self-Serve Platform on Kubernetes at Scale | 7 min | Streaming | Sylvia Lin | tech-at-instacart Blog
Instacart has used Flink to meet a range of needs like:
They accomplished all of this by running Flink on AWS’ EMR, so why have they decided to build a new platform on top of Kubernetes? And what are the lessons learned?
The entire Flink service onboarding and operations should be streamlined without K8S details. Most of our platform users don’t have knowledge of Kubernetes, so we should abstract K8S details as much as possible.
The 7 Most Popular Feature Stores In 2023 | 5 min | MLOps | Jakub Jurczak | GetInData | Part of Xebia Blog
In recent years, Feature Stores have become an integral part of many ML projects, and their popularity is continuing to grow. This article will look at the most popular solutions available this year.
…While we're on the subject of Feature Stores: in this ebook, you'll find a step-by-step guide on how to design and build a Feature Store, and how it can solve critical machine learning problems: Build Feature Store Faster. An Introduction to Vertex AI, Snowflake and dbt Cloud
How We Integrated ChatGPT into Our Slack: Enhancing Privacy, Flexibility, and Collaboration | 4 min | Data Science and AI | Rens Dimmendaal | Xebia Blog
ChatGPT has become increasingly popular. Despite its popularity, Xebia spotted some areas for improvement, like privacy, flexibility and collaboration, to make it even better.
To address these issues, they developed an internal tool called SlackGPT.
SlackGPT not only tackles these limitations but also gives our colleagues a unique experience when working with and building modern LLM applications.
Track and improve the performance of streaming data pipelines with Datadog Data Streams Monitoring | 8 min | Data Streaming | Nicholas Thomson, Jane Wang, Jonathan Morin | Datadog Blog
Datadog can now monitor streaming data pipelines with Kafka as a bus.
In MORE LINKS you will read about anomaly detection at Lyft, dbt Squared, Data Streams Monitoring and the label noise solution tested at Allegro.
TOOLS
Mojo combines the usability of Python with the performance of C, unlocking unparalleled programmability of AI hardware and extensibility of AI models.
ONEPAGER
Ecosystem around streaming and realtime including open source solutions with their vendor providers | Hubert Dulay
NEWS
Create Power BI reports in Jupyter Notebooks | 5 min | Power BI | Noam Raveh | Microsoft Blog
Microsoft announces the latest update to the Power BI and Jupyter Notebook library, which empowers users to create powerful reports based on their data directly in their notebooks, without leaving their workflow. With this new update, users can gain insights instantly without the hassle of switching between tools or dealing with cumbersome data exports.
Mark Zuckerberg says Meta wants to ‘introduce AI agents to billions of people’ | 3 min | AI | Alex Heath | The Verge Blog
Mark Zuckerberg, CEO of Meta, has stated that generative AI will eventually be integrated into all of the company's products due to its potential impact on billions of users.
DATA ODDITY
The name says it all. Need something to be done by AI? Check out the list of tools, which is getting longer and longer every minute.
PODCASTS
Data Journey with Varun Bhatnagar (Swedbank) - MLOps in the Cloud at Swedbank - Enterprise Analytics Platform | 55 min | MLOps | host: Adam Kawa guest: Varun Bhatnagar | Radio DaTa Podcast
DATA Tube
Open Assistant takes on ChatGPT | 1 h | host: Tim Scarfe guest: Yannic Kilcher | Machine Learning Street Talk
Let’s hear about Open Assistant - an ambitious project aiming to create a truly open-source AI language model. Yannic reveals the behind-the-scenes process of developing this revolutionary technology, addressing the critical role of community involvement and the importance of a diverse dataset.
CONFS EVENTS AND MEETUPS
Goodbye, Data Warehouse. Hello, Lakehouse | 18 May | Online webinar
Learn from Databricks, Fivetran and dbt Labs experts about how to:
Data Strategy In The World Of Multiple AI Innovations 'almost' Every Week | 9 May | Online webinar
Data Mass | Call for Presentation | 5th October 2023
The Summit is aimed at people who use the cloud in their daily work to solve Data Engineering, Big Data, Data Science, Machine Learning and AI problems. The main idea of the conference is to promote knowledge and experience in designing and implementing tools for solving difficult and interesting challenges. If you have something to share with the community in this area - submit your presentation!
________________________
Have any interesting content to share in the DATA Pill newsletter?
Join us on GitHub
Dig into previous editions of DATA Pill
Adam from GetInData | Part of Xebia