DATA Pill #030 - news from AWS and GitHub, creative testing, Search Pipeline and more
Hi,
Today there will be no anecdotes, memes or funny comparisons.
There is no time for this.
We have plenty of topics for you.
Let's get started right away!
ARTICLES
Using MLOps to Build a Real-time End-to-End Machine Learning Pipeline | 8 min | MLOps | Binance Blog
This article shows how Binance tackles various business problems, including fraud, P2P scams and stolen payment details. Read it to better understand why they use MLOps and how they ensure the production model reflects the latest data patterns, and to see their standard operating procedure for real-time model development with a feature store.
Setting the Table: Benchmarking Open Table Formats | 6 min | Modern Data Stack | Brooklyn Data Co. Blog
The Modern Data Stack is growing rapidly, and open table storage formats are getting more attention. Take a look at this article on the Brooklyn Data Co. blog to see how they ran a set of comprehensive workloads against each format to test the performance of inserts and deletes, and the effect of these updates on the performance of subsequent reads.
Creative Testing: AI is the new A/B | 8 min | AI | Team Twigeo | Twigeo Blog
New limits on mobile tracking have led publishers such as Meta and Google to shift from A/B testing to algorithm-driven dynamic creative formats. It may look like marketers are losing creative control, but the algorithms are reactive, and the results sustain better campaign performance over time.
Search Pipeline | Part 1 | Part 2 | 10 min | Pipelines architecture | Stuart Cam | Canva Engineering Blog
In Part 1, Stuart discusses the challenges Canva faced with its existing search architecture, the requirements for a new architecture and the considerations to take into account when designing a new solution. Part 2 dives into the details of the new search pipeline architecture.
Why Should I Care About Table Formats Like Apache Iceberg? | 7 min | Apache Iceberg | Alex Merced | Dremio Blog
Reducing your data warehouse footprint with an Apache Iceberg-based data lakehouse will open up your data to best-of-breed tools, reduce redundant storage and compute costs, and enable cutting-edge features like partition evolution and catalog branching to enhance your data architecture.
In the past, the Hive table format did not go far enough to make this a reality, but today Apache Iceberg offers robust features and performance for querying and manipulating your data on the lake.
Now is the time to turn your data lake into a data lakehouse and watch your time to insight shrink along with your data warehouse costs.
NEWS
Exciting new GitHub features powering machine learning | 5 min | ML | Seth Juarez | GitHub Blog
In November, GitHub published its Universe announcements. How do they affect ML? Here are the author's findings from building machine learning projects directly on GitHub.
Jupyter Notebooks: Not only can I see the cells that have been added, but I can also see the code differences within the cells side by side, as well as the literal outputs. Thanks to NbDime running under the hood, I can see at a glance which code has changed and the effect it produces.
While the rendering additions to GitHub are fantastic, there's still the issue of executing notebooks reliably when I'm away from my desk. Here are a couple of gems to make these issues go away:
AWS Announces DataZone, a New Data Management Service to Govern Data | 2 min | AWS | Daniel Dominguez | InfoQ Blog
At AWS re:Invent, Amazon Web Services announced Amazon DataZone, a new data management service that makes it faster and easier for customers to catalog, discover, share and govern data stored across AWS, on-premises and third-party sources.
How to create a Devcontainer for your Python project | 8 min | MLOps & Docker | Jeroen Overschie | GoDataDriven Blog
Dev Containers can help us:
Let's explore how to set up a Dev Container for your Python project!
PODCAST
Data Journey with Kevin Goldsmith (Anaconda) - Data & analytics used internally at Anaconda, SQL vs. Python, Layoffs and hiring in the tech sector, Agile data projects | 50 min | Analytics & Data | host: Adam Kawa; guest: Kevin Goldsmith | Radio DaTa
Data Analytics Career Orientation | 1 h | Analytics | host: Jon Krohn; guest: Luke Barousse | Super Data Science
A talk with Luke Barousse, a full-time YouTuber who creates content for aspiring data scientists and the founder of MacroFit, a data-driven company that helps with meal planning.
DataTube
Trino at Apple | 23 min | Analytics | Vinitha Gankid | Trino
Hear engineers from Apple describe how Trino is currently used at their company. They discuss how they support Trino as a service for multiple end users and the critical features that drew Apple to Trino, then wrap up with some of the challenges they faced and the development work they plan to contribute to Trino.
CONFS, EVENTS AND MEETUPS
move(data) The Data Practitioner Conference | 1-8 December | Online
The conference brings together speakers who have spent countless hours working on data integration to share best practices, horror stories, and tools and workflows that will improve the way you work.
Security Best Practices with Databricks | 14 December | Live Webinar
________________________
Have any interesting content to share in the DATA Pill newsletter?
Join us on GitHub
Adam Kawa from GetInData