DATA Pill #043 - RecSys, Kubernetes, Prometheus and how Discord stores trillions of messages

Hi,


Data Pill 43 is a huge one ;)

We’ve dug up meaty articles, tutorials and toolkits.

Brace yourselves for a lot of useful knowledge!



ARTICLES

How Discord stores trillions of messages | 10 min | Data Engineering | Bo Ingram | Discord Blog

In 2017 the Discord team shared their experience with storing billions of messages. Five years later, it is time for an update. Read the story of how Discord's message storage changed as the company matured, what troubles the team faced and how they solved them.
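For context, the 2017 post describes partitioning messages by channel plus a time bucket derived from Discord's Snowflake message IDs. Below is a minimal Python sketch of that bucketing, assuming the 10-day buckets and the Discord epoch (2015-01-01 UTC) described there; `partition_key` is a hypothetical helper, not Discord's code.

```python
import time
from typing import Optional, Tuple

# Discord's Snowflake epoch (first millisecond of 2015, UTC), in milliseconds.
DISCORD_EPOCH_MS = 1420070400000
# Assumed bucket width: 10 days, in milliseconds.
BUCKET_SIZE_MS = 1000 * 60 * 60 * 24 * 10


def snowflake_timestamp_ms(snowflake: int) -> int:
    """Milliseconds since the Discord epoch, stored above the low 22 bits of a Snowflake."""
    return snowflake >> 22


def make_bucket(snowflake: Optional[int] = None) -> int:
    """Return the time bucket a message ID (or 'now', if None) falls into."""
    if snowflake is None:
        since_epoch = int(time.time() * 1000) - DISCORD_EPOCH_MS
    else:
        since_epoch = snowflake_timestamp_ms(snowflake)
    return since_epoch // BUCKET_SIZE_MS


def partition_key(channel_id: int, message_id: int) -> Tuple[int, int]:
    """Hypothetical helper: the (channel_id, bucket) pair used as the partition key."""
    return (channel_id, make_bucket(message_id))
```

Bucketing keeps each partition bounded in size, while reading a channel's recent history only touches the newest bucket or two.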


What are Graph Neural Networks and why should you consider using them in your Recommendation System? | 20 min | ML | Michał Stawikowski | GetInData | Part of Xebia Blog

As businesses strive to provide personalized experiences to their customers, recommendation systems play a crucial role. However, traditional recommendation systems have limitations when it comes to handling complex relationships between users and items. This is where Graph Neural Networks (GNNs) come in. GNNs belong to an extremely active and rapidly growing field of research, and they are a family of Artificial Neural Networks, one of the most powerful groups of machine learning algorithms. Let’s explore the potential of GNNs for improving recommendation systems.
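To make the idea concrete, here is a toy sketch of one propagation step on a user-item interaction graph, in the spirit of LightGCN (our choice of GNN variant for illustration, not necessarily the one the article uses). Each user's new embedding is a degree-normalized sum of the embeddings of items they interacted with, and vice versa; items can then be ranked with a dot product. All names and data are illustrative.

```python
from math import sqrt
from typing import Dict, List, Tuple

Vector = List[float]
Embeddings = Dict[str, Vector]


def propagate(
    interactions: List[Tuple[str, str]],
    user_emb: Embeddings,
    item_emb: Embeddings,
) -> Tuple[Embeddings, Embeddings]:
    """One LightGCN-style layer over a user-item bipartite graph.

    A node's new embedding is the sum of its neighbours' embeddings, each
    weighted by 1 / sqrt(deg(node) * deg(neighbour)). No weights, no non-linearity.
    """
    user_nbrs: Dict[str, List[str]] = {u: [] for u in user_emb}
    item_nbrs: Dict[str, List[str]] = {i: [] for i in item_emb}
    for user, item in interactions:
        user_nbrs[user].append(item)
        item_nbrs[item].append(user)

    def aggregate(nbrs: List[str], other_emb: Embeddings,
                  other_nbrs: Dict[str, List[str]]) -> Vector:
        dim = len(next(iter(other_emb.values())))
        out = [0.0] * dim
        for n in nbrs:
            weight = 1.0 / sqrt(len(nbrs) * len(other_nbrs[n]))
            for k in range(dim):
                out[k] += weight * other_emb[n][k]
        return out

    new_users = {u: aggregate(ns, item_emb, item_nbrs) for u, ns in user_nbrs.items()}
    new_items = {i: aggregate(ns, user_emb, user_nbrs) for i, ns in item_nbrs.items()}
    return new_users, new_items


def score(user_vec: Vector, item_vec: Vector) -> float:
    """Dot-product relevance score used to rank items for a user."""
    return sum(a * b for a, b in zip(user_vec, item_vec))
```

Stacking a few such layers lets a user's embedding absorb signals from items liked by similar users, which is the kind of multi-hop relationship that simpler user-item models capture less directly.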


Prioritizing Home Attributes Based on Guest Interest | 7 min | ML | Joy Jing | Airbnb Tech Blog

How Airbnb leverages ML to derive guest interest from unstructured text data and provide personalized recommendations to Hosts. They do this through a scalable, platformized, and data-driven engineering system. This blog post describes the science and engineering behind the system.
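The article describes ML models that extract guest interest from text. As a deliberately simplified stand-in (plain keyword matching, not Airbnb's method), the sketch below counts how many guest messages mention each home attribute and ranks attributes by that count. The vocabulary `ATTRIBUTE_PHRASES` and the function `rank_attributes` are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical vocabulary: home attribute -> phrases that signal guest interest in it.
ATTRIBUTE_PHRASES: Dict[str, List[str]] = {
    "wifi": ["wifi", "wi-fi", "internet"],
    "parking": ["parking", "park the car"],
    "crib": ["crib", "baby cot"],
}


def rank_attributes(
    messages: Iterable[str],
    phrases: Dict[str, List[str]] = ATTRIBUTE_PHRASES,
) -> List[Tuple[str, int]]:
    """Rank attributes by how many messages mention them (keyword matching, not ML)."""
    # One whole-word, case-insensitive pattern per attribute.
    patterns = {
        attr: re.compile(r"\b(?:" + "|".join(map(re.escape, keys)) + r")\b", re.IGNORECASE)
        for attr, keys in phrases.items()
    }
    counts: Counter = Counter()
    for msg in messages:
        for attr, pattern in patterns.items():
            if pattern.search(msg):
                counts[attr] += 1  # each attribute counts at most once per message
    return counts.most_common()
```

A real system replaces the regexes with a trained text model, which handles paraphrases and context that fixed phrase lists miss.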

High-Performance Data Teams Don’t Care About Data Quality | 11 min | Data Science | Sven Balnojan | Personal Blog

It turns out that high-performing data teams should not try to “increase data quality”, but to increase the speed and the quality of their work at the same time. Read 9 good practices that can help your data team perform better.

In MORE LINKS you will read about: running Prometheus at scale, Kubernetes at Medium and the future of data.

{ MORE LINKS }



TUTORIAL

Running dbt on Google Cloud’s Vertex AI Pipelines | 7 min | Cloud | datatonic Blog

How can you use Vertex AI and dbt to orchestrate Machine Learning workloads efficiently and cost-effectively? This four-step tutorial on running dbt in Vertex AI Pipelines explains it in an easy way.


In MORE LINKS you will find out how to create and deploy an AWS CloudFormation custom provider in less than 5 minutes.


{ MORE LINKS }
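For readers new to CloudFormation custom providers: CloudFormation invokes the provider (often a Lambda function) with an event containing fields such as `RequestType`, `StackId`, `RequestId`, `LogicalResourceId` and a pre-signed `ResponseURL`, then waits until the provider PUTs a JSON result to that URL. Here is a minimal standard-library sketch of such a handler; the `Name` property and `Greeting` output are hypothetical, and production code should also handle timeouts and retries.

```python
import json
import urllib.request
from typing import Any, Dict, Optional


def build_response(
    event: Dict[str, Any],
    status: str,
    reason: str = "",
    physical_resource_id: Optional[str] = None,
    data: Optional[Dict[str, Any]] = None,
) -> str:
    """Serialize the JSON body CloudFormation expects back from a custom resource."""
    if status not in ("SUCCESS", "FAILED"):
        raise ValueError("status must be SUCCESS or FAILED")
    body = {
        "Status": status,
        "Reason": reason or "See CloudWatch Logs for details",
        # Keep the physical ID stable across updates; a new ID makes CloudFormation
        # treat the update as a replacement and delete the old resource.
        "PhysicalResourceId": physical_resource_id
        or event.get("PhysicalResourceId")
        or event["LogicalResourceId"],
        "StackId": event["StackId"],
        "RequestId": event["RequestId"],
        "LogicalResourceId": event["LogicalResourceId"],
        "Data": data or {},
    }
    return json.dumps(body)


def send_response(event: Dict[str, Any], body: str) -> int:
    """PUT the result to the pre-signed S3 URL that CloudFormation supplied."""
    payload = body.encode("utf-8")
    request = urllib.request.Request(
        event["ResponseURL"],
        data=payload,
        method="PUT",
        # An empty Content-Type mirrors AWS's cfn-response helper.
        headers={"Content-Type": "", "Content-Length": str(len(payload))},
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def handler(event: Dict[str, Any], context: Any) -> None:
    """Lambda entry point for a toy custom resource (hypothetical 'Greeting' output)."""
    try:
        if event["RequestType"] in ("Create", "Update"):
            name = event.get("ResourceProperties", {}).get("Name", "world")
            body = build_response(event, "SUCCESS", data={"Greeting": f"Hello, {name}"})
        else:  # "Delete": nothing to clean up in this toy example
            body = build_response(event, "SUCCESS")
    except Exception as exc:  # always answer, or the stack waits until it times out
        body = build_response(event, "FAILED", reason=str(exc))
    send_response(event, body)
```

Values in `Data` can be read elsewhere in the template with `Fn::GetAtt`, which is what makes custom resources useful as glue.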



NEWS

Introducing Webhooks in dbt Cloud | 3 min | Cloud | Jeremy Hutt | dbt Blog

dbt announces outbound webhooks, which let dbt Cloud notify other applications and tools when certain events take place in dbt Cloud.
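If you plan to consume these webhooks, it is worth verifying that each request really comes from dbt Cloud. The sketch below assumes the scheme described in the dbt docs: the `Authorization` header carries a hex HMAC-SHA256 of the raw request body, keyed with the webhook's secret token. Please check the current docs before relying on it; the function names are hypothetical.

```python
import hashlib
import hmac


def compute_signature(secret: str, raw_body: bytes) -> str:
    """Hex HMAC-SHA256 of the raw request body, keyed with the webhook secret."""
    return hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()


def is_valid_webhook(secret: str, raw_body: bytes, authorization_header: str) -> bool:
    """Compare in constant time so the check does not leak the signature via timing."""
    expected = compute_signature(secret, raw_body).encode("ascii")
    received = authorization_header.strip().encode("utf-8")
    return hmac.compare_digest(expected, received)
```

Compute the signature over the raw bytes exactly as received: re-serializing parsed JSON can change whitespace or key order and break the comparison.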


Debezium 2.2.0.Alpha3 Released | 3 min | SQL | Chris Cranford | Debezium Blog

Debezium is pleased to announce the third alpha release in the 2.2 release stream, Debezium 2.2.0.Alpha3. It includes a plethora of bug fixes, improvements, breaking changes and a number of new features including, but not limited to, optional parallel snapshots, server-side MongoDB change stream filtering, surrogate keys for incremental snapshots, a new Cassandra connector for Cassandra Enterprise and much more.


TOOLS

Spark-testing-base | Apache Spark | Holden Karau

A cool testing library for Apache Spark apps. It comes with a bunch of tools and test fixtures that help you write unit tests for Spark Streaming and SQL apps. It’s available on GitHub and works with popular testing frameworks like JUnit and ScalaTest.


PODCAST

How to Build Data and ML Products Users Love | 36 min | ML | Host: Jon Krohn Guest: Brian T. O’Neill | Super Data Science

In this episode, Brian argues that teams must first answer the question "what is the value that we're here to provide?" and that effective products start with answering the right questions.

Senior stakeholders often bring problems and suggestions, but it's data scientists who must begin by asking the right questions to create effective solutions. In an ideal world, a Data Product Manager would bridge the gap between both sides to facilitate communication and problem-solving.


CONFS EVENTS AND MEETUPS

Upgrade your Scaleup from using Spreadsheets to Data Platform | 14th March 2023 | Online

Do you want to know how to increase your data capabilities and become a data-driven company? Join the first webinar in the series ‘Building a Data-Driven Company’ and learn what an implemented Modern Data Platform can look like and how it can assist you during your journey into modern analytics.


PetSmart Drives Personalization With Customer 360 on Databricks | March 16, 2023 | 9am PT / 12pm ET | Online

Maximize your efficiency, delight your customers and drive your retail business to new levels of performance. The Databricks Lakehouse for Retail is designed to help businesses like yours handle data, analytics and AI on one platform. Learn how PetSmart has unified data, analytics and AI on Databricks to build personalized experiences.

You’ll find out how:

  • Databricks helps you empower your employees to collaborate in real time and build personalized experiences
  • PetSmart is building a Customer 360 on Databricks to power meaningful customer engagements at scale


________________________

Have any interesting content to share in the DATA Pill newsletter?

Join us on GitHub

Dig into previous editions of DATA Pill


Adam from GetInData | Part of Xebia
