DATA Pill #025 - Data Mesh's Missing Element, All-in-One Data Stack Renaissance and More
Hi,
The new week has already started, so I am coming back with the next dose of DATA PILL.
Be sure you are ready: in today’s newsletter you will find a ranking, a report, a lot about the cloud, and much more.
Do not waste time, here we go:
ARTICLES
Data Contracts: The Mesh Glue | 8 min | Data & ML | Luis Velasco | Towards Data Science Blog
Data Mesh + Open source components + data contracts
In this article, Luis explains the “data contract” concept, which ensures that information spread across different data products can be shared and reused. He also walks through a couple of technical implementations, built with open source components, of one fundamental process in the data contract lifecycle: its evaluation.
With the ultimate goal of building trust in “someone else's” data products, and of easing and promoting data sharing, data contracts are artifacts that sit at the intersection of (a) a business glossary providing rich semantics, (b) a metadata catalog providing information about the structure, and (c) a data quality repository setting expectations about the content across different dimensions.
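To make the evaluation step a bit more concrete, here is a minimal sketch in plain Python. It is not the implementation from Luis's article (which relies on open source components). The names `FieldSpec`, `DataContract`, `evaluate` and the sample `orders` product are all hypothetical and used only for illustration. Each field carries a description (the glossary part), a type (the structure part) and a list of checks (the quality part).

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class FieldSpec:
    """One field of a hypothetical data contract."""
    name: str
    dtype: type                       # structure (metadata catalog)
    description: str                  # semantics (business glossary)
    checks: list[Callable[[Any], bool]] = field(default_factory=list)  # quality expectations


@dataclass
class DataContract:
    """Hypothetical contract: a named data product plus its field specs."""
    product: str
    fields: list[FieldSpec]

    def evaluate(self, rows: list[dict[str, Any]]) -> list[str]:
        """Return a human-readable list of contract violations."""
        violations = []
        for i, row in enumerate(rows):
            for spec in self.fields:
                if spec.name not in row:
                    violations.append(f"row {i}: missing field '{spec.name}'")
                    continue
                value = row[spec.name]
                if not isinstance(value, spec.dtype):
                    violations.append(
                        f"row {i}: '{spec.name}' expected {spec.dtype.__name__}, "
                        f"got {type(value).__name__}"
                    )
                    continue
                for check in spec.checks:
                    if not check(value):
                        violations.append(f"row {i}: '{spec.name}' failed a quality check")
        return violations


# Illustrative contract and sample data (both made up for this sketch).
orders_contract = DataContract(
    product="orders",
    fields=[
        FieldSpec("order_id", str, "Unique order identifier", [lambda v: len(v) > 0]),
        FieldSpec("amount", float, "Order total in EUR", [lambda v: v >= 0]),
    ],
)

sample_rows = [
    {"order_id": "A-1", "amount": 19.99},
    {"order_id": "A-2", "amount": -5.0},   # violates the non-negative check
    {"order_id": "A-3"},                   # misses the 'amount' field
]

violations = orders_contract.evaluate(sample_rows)
for message in violations:
    print(message)
```

In a real setup the three parts would likely live in separate systems, and the evaluation would run against a warehouse table rather than an in-memory list.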
State of AI Report 2022 | 10 min | AI | Nathan Benaich & Ian Hogarth | State of AI
The State of AI Report 2022 has been released. Just wow - so much interesting content and so many recent developments are summarized and analyzed in this report (though not all of it is new to someone who follows the AI field). There is also an investor's view on AI, which is especially interesting.
The MLSecOps Top 10 | 11 min | MLOps | The Institute for Ethical AI & Machine Learning
An initiative that aims to further the field of machine learning security by identifying the top 10 most common vulnerabilities in the machine learning life cycle. It also includes a set of practical hands-on examples of each of these vulnerabilities, as well as the best practices to address them - all the content is available open source.
MLOps' source of knowledge has not yet dried up. BTW, there's an interesting senior MLOps position available at GetInData! Check it out here
Why we're leaving the cloud | 6 min | Cloud | David Heinemeier Hansson | Basecamp
Renting computers is (mostly) a bad deal for medium-sized companies with stable growth, like Basecamp. The savings promised in reduced complexity never materialized.
The cloud excels at two ends of the spectrum:
The Next Generation Of All-In-One Data Stacks | 11 min read | Data Stack | Ben Rogojan | Seattle Data Guy Blog
Is the modern data stack even modern?
Isn’t it just a piecemeal collection of components from solutions we have known forever, like SAP or Informatica?
Isn’t it just an unbundled version of Airflow?
All-in-one data stacks are on the rise.
Ben shares examples of all-in-one solutions: Incorta, Keboola, Nexla, Mozart Data, Rivery.
TOOLS AND TUTORIALS
Cube: API-First Business Intelligence | 5 min | BI
A very nice semantic layer tool that is open source. Top features:
NEWS
Scaling PyTorch models on Cloud TPUs with FSDP | 6 min | ML & MLOps | PyTorch Blog
To support model scaling on TPUs, we implemented the widely-adopted Fully Sharded Data Parallel (FSDP) algorithm for XLA devices as part of the PyTorch/XLA 1.12 release. This FSDP interface allowed us to easily build models with e.g. 10B+ parameters on TPUs and has enabled many research explorations.
DATA LIBRARY
Data on Kubernetes 2022 | 17 pages | Kubernetes | DoK Community
A report from the DoK Community. Insights from over 500 executives and technology leaders on how data on Kubernetes has a transformative impact on organizations, regardless of size or tech maturity.
Data on Kubernetes has a transformative impact on organizations. Respondents see a direct link between running DoK and making big gains: the majority of them (83%) attribute over 10% of their revenue to running data on Kubernetes. One-third of organizations saw their productivity increase two-fold.
PODCAST
Project Lightspeed: Next-generation Spark Streaming | 41 min | Streaming | host: Ben Lorica; guest: Karthik Ramasamy | The Data Exchange Podcast
41 minutes about faster and simpler tools for new streaming applications.
CONFS AND MEETUPS
Art of Scala | 16 November | Scala | Warsaw
A non-commercial conference organized by Scala enthusiasts for Scala engineers.
A Review of the Presentations at the DataMass Gdańsk Summit 2022 | Grzegorz Kołpuć, Maciej Maciejko, Sylwia Kołpuć | GetInData
This conference has passed, but you can get many takeaways from this review. Crème de la crème of DataMass 2022.
________________________
Have any interesting content to share in the DATA Pill newsletter?
Join us on GitHub
Adam Kawa from GetInData