DATA Pill #030 - news from AWS and GitHub, creative testing, Search Pipeline and more

Hi,


Today will be without anecdotes, memes, and funny comparisons.

There is no time for this.

We have plenty of topics for you.

Let's get started right away!


ARTICLES

Using MLOps to Build a Real-time End-to-End Machine Learning Pipeline | 8 min | MLOps | Binance Blog

This article shows how Binance tackles various business problems, including fraud, P2P scams, and stolen payment details. Read it to understand why they use MLOps, how they ensure the production model reflects the latest data patterns, and see the standard operating procedure for real-time model development with a feature store.


Setting the Table: Benchmarking Open Table Formats | 6 min | Modern Data Stack | Brooklyn Data Co. Blog

The Modern Data Stack is growing rapidly, and open table formats are getting more attention. Take a look at the article on the Brooklyn Data Co. blog and read how they ran a set of comprehensive workloads against each format to test the performance of inserts and deletes, and the effect of these updates on the performance of subsequent reads.


Creative Testing: AI is the new A/B | 8 min | AI | Team Twigeo | Twigeo Blog

New limits on mobile tracking have led publishers like Meta and Google to shift from A/B testing to dynamic creative formats that rely on algorithms. It may look like marketers are losing creative control, but the algorithms are reactive, and the results sustain better campaign performance over time.


Search Pipeline | Part 1 | Part 2 | 10 min | Pipelines architecture | Stuart Cam | Canva Engineering Blog

In part 1, Stuart discusses the challenges Canva faced with its current search architecture, the requirements for a new architecture, and the considerations that shaped the design of a new solution. In part 2, he dives into the details of the new search pipeline architecture.


Why Should I Care About Table Formats Like Apache Iceberg? | 7 min | Apache Iceberg | Alex Merced | Dremio Blog

Reducing your data warehouse footprint with an Apache Iceberg-based data lakehouse will open up your data to best-in-breed tools, reduce redundant storage/compute costs, and enable cutting-edge features like partition evolution/catalog branching to enhance your data architecture.

In the past, the Hive table format did not go far enough to make this a reality, but today Apache Iceberg offers robust features and performance for querying and manipulating your data on the lake.

Now is the time to turn your data lake into a data lakehouse and start seeing the time to insight shrink along with your data warehouse costs.

{ MORE LINKS }



NEWS

Exciting new GitHub features powering machine learning | 5 min | ML | Seth Juarez | GitHub Blog

In November, GitHub made its Universe announcements. How do they affect ML? Here are Seth's findings from building machine learning projects directly on GitHub.

Jupyter Notebooks: Not only can I see the cells that have been added, but I can also see side-by-side the code differences within the cells, as well as the literal outputs. I can see at a glance the code that has changed and the effect it produces thanks to NbDime running under the hood.

While the rendering additions to GitHub are fantastic, there’s still the issue of executing things reliably when I’m away from my desk. Here are a couple of gems to make these issues go away:

  • GPUs for Codespaces
  • Zero-config notebooks in Codespaces
  • Edit your notebooks from VS Code, PyCharm, JupyterLab, on the web, or even using the CLI (powered by Codespaces)


AWS Announces DataZone, a New Data Management Service to Govern Data | 2 min | AWS | Daniel Dominguez | InfoQ Blog

At AWS re:Invent, Amazon Web Services announced Amazon DataZone, a new data management service that makes it faster and easier for customers to catalog, discover, share and govern data stored across AWS, on-premises and third-party sources.

{ MORE LINKS }



TUTORIALS

How to create a Devcontainer for your Python project | 8 min | MLOps & Docker | Jeroen Overschie | GoDataDriven Blog

Dev Containers can help us:

  • Get a reproducible development environment
  • Instantly onboard new team members onto your project
  • Better align environments between team members
  • Keep your dev environment up to date and reproducible, which saves your team time when going into production later

Let’s explore how we can set up a Dev container for your Python project!
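To give a sense of what the setup involves, here is a minimal sketch of a `.devcontainer/devcontainer.json` for a Python project. The project name, image tag, extension, and post-create command are illustrative assumptions, not taken from the article:

```json
{
  "name": "my-python-project",
  "image": "mcr.microsoft.com/devcontainers/python:3.11",
  "customizations": {
    "vscode": {
      "extensions": ["ms-python.python"]
    }
  },
  "postCreateCommand": "pip install -r requirements.txt"
}
```

Anyone opening the repo in a Dev Container-aware editor then gets the same Python version, extensions, and dependencies, which is what makes the environment reproducible across the team.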

{ MORE LINKS }



PODCAST

Data Journey with Kevin Goldsmith (Anaconda) - Data & analytics used internally at Anaconda, SQL vs. Python, Layoffs and hiring in the tech sector, Agile data projects | 50 min | Analytics & Data | host: Adam Kawa; guest: Kevin Goldsmith | Radio DaTa

  • Data and analytics used internally by Anaconda
  • The role and responsibilities of CTO at Anaconda
  • SQL vs. Python in data science
  • Hiring and layoffs in the tech industry
  • An agile approach to data engineering and data science projects


Data Analytics Career Orientation | 1 h | Analytics | host: Jon Krohn; guest: Luke Barousse | Super Data Science

A talk with Luke Barousse, a full-time YouTuber who produces content for aspiring data scientists and founder of MacroFit, a data-driven company that helps with meal planning.

  • how data science can help you while working on a submarine?
  • helpful hacks for data science beginners

{ MORE LINKS }



DataTube

Trino at Apple | 23 min | Analytics | Vinitha Gankid | Trino

Engineers from Apple share how Trino is currently used at their company. They discuss how they support Trino as a service for multiple end users and the critical features that drew Apple to Trino, and they wrap up with some challenges they faced and some development they plan to contribute to Trino.

{ MORE LINKS }



CONFS EVENTS AND MEETUPS

move(data) The Data Practitioner Conference | 1-8 December | Online

The speakers are practitioners who have spent countless hours working on data integration. Expect best practices, horror stories, and tools and workflows that will improve the way you work.


Security Best Practices with Databricks | 14 December | Live Webinar

  • How to build a secure Databricks environment which complies with industry best practices;
  • Where to find the best practices for your chosen Cloud provider;
  • How to stay informed proactively about security risks before they manifest;
  • About staying vigilant on any settings changes to remain compliant.

________________________

Have any interesting content to share in the DATA Pill newsletter?

Join us on GitHub


Adam Kawa from GetInData
