
DATA Pill #041 - Streamlining Data Science Workflows, Machine Learning Models in LoL, and more…

Adam Kawa

CEO at GetInData, ex-Spotify | Data & AI for banks, telecoms, retail & more.

Published: February 27, 2023

Another week, another DATA Pill.

In this one, we’ll focus on:

Navigating the Data Mesh, Machine Learning models in the gaming world, and a little on A/B testing.

Enjoy the read!

ARTICLES

Navigating the Data Mesh: Organizational Challenges and Opportunities | 10 min | Pierre-Alain Genilloud | Data Engineering | ELCA IT

Most of you have no doubt heard of the Data Mesh. Let’s take a deeper look at its implications for organization and agility, and discuss the opportunities and open questions it brings:

Opportunities:

empowers domains in the provision of their data, and improves recognition of their efforts
accelerates the introduction of new valuable data
lets individual domains build their analytics on their own
facilitates integration with operational use cases

Open questions:

which technology is required for data preparation?
which technology is required to support data consumption?

We would like to announce the dbt-flink-adapter, which allows running pipelines defined in SQL in a dbt project on Apache Flink! Check out the newest blog post and find out:

what the advantages of dbt and Apache Flink are
what drove our GetInData Streaming Labs team to create the adapter
how to build a real-time analytics pipeline

Also, we tackle the myth that real-time analytics is not worth the cost.

Dealing with confusion and duplicative work in your data science team can be exhausting. In this post, Roel explores ways to overcome these challenges and improve collaboration, consistency and speed within your data science team. Read about the Feature Catalog that can help data science teams work together better.

In MORE LINKS you will find content about Machine Learning models in League of Legends and layering.

{ MORE LINKS }

TUTORIAL

Snowflake Data Mesh: Step-by-Step Setup Guide, with Detailed Notes on Scaling and Maintenance | 25 min | Data Mesh | Atlan Blog

Data Mesh can be hard to implement. It requires an org-wide mindset shift toward decentralization and product thinking. Team Atlan attempted to demonstrate a reference Data Mesh implementation in a growth-stage organization with a complex business domain.

NEWS

Uber Ditches On-Prem and Hooks Future to GCP and Oracle Cloud | 4 min | Cloud | Lisa D Sparks | Data Center Knowledge

Uber joins the cloud! One of the largest Hadoop users had long resisted this move. Over the next 7 years, Uber will migrate its on-prem workloads to GCP or Oracle Cloud, with data and data workloads probably going to GCP. There is a lot of news about it, but this piece seems to offer an interesting view.

In MORE LINKS you will find better Airflow with Metaflow.

{ MORE LINKS }

VIDEO

Make Your A/B Testing More Effective and Efficient | 50 min | Analytics | Anjali Mehra | DataCamp

One of the toughest parts of any data project is experimentation, not just because you need to choose the right testing method to confirm the project’s effectiveness, but also because you need to make sure you are testing the right hypothesis and measuring the right KPIs to ensure you receive accurate results.

One of the most effective methods for data experimentation is A/B testing, and Anjali Mehra is no stranger to how A/B testing can impact multiple parts of any organization.
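
To make "choosing the right testing method" a bit more concrete, here is a minimal sketch of one common choice for a conversion-rate KPI: a two-proportion z-test. It is not taken from the video; the function name and the numbers are hypothetical and only illustrate the idea.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for H0: variants A and B convert at the same rate."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    # Standard error of the difference in proportions under H0.
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: 2000 users per variant, made-up conversion counts.
z, p = two_proportion_z_test(conv_a=200, n_a=2000, conv_b=250, n_b=2000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the difference in conversion rate is unlikely to be noise, but it says nothing about whether conversion was the right KPI to measure in the first place.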

Since we are talking about analytics, there is an interesting job offer available in that area.

PODCAST

Implementing Patterns And Practices For Infrastructure as Code | 56 min | Hosts: Ned Bellavance, Ethan Banks Guest: Rosemary Wang | Cloud | Day Two Cloud Podcast

A one-hour talk with Rosemary Wang, Developer Advocate at HashiCorp and author of Infrastructure as Code, Patterns and Practices. Listen to learn more about Infrastructure as Code (IaC), including the patterns and practices you might want to put in place. You might want to apply some software development practices to it, particularly for the parts of your team who know what they’re doing with infrastructure but may not be familiar with things like repositories, re-usability, unit tests and so on.

CONFS EVENTS AND MEETUPS

Upgrade your Scaleup from using Spreadsheets to Data Platform | 14th March 2023 | Online

Do you want to know how to increase your data capabilities and become a data-driven company? Join the first webinar in the series ‘Building a Data-Driven Company’ and learn what an implemented Modern Data Platform can look like and how it can assist you on your journey into modern analytics.

Webinar online 2023 - Big Data Technology Warsaw Summit | 9th March 2023 | Online

On March 9th you will have the opportunity to listen to presentations given by Mariusz Strzelecki from GetInData | Part of Xebia and Juan Cano from QuantumBlack:

One does not simply upgrade Airflow. 1.10 -> 2.4 case study
Analyze your data at the speed of light with Polars and Kedro

________________________

Have any interesting content to share in the DATA Pill newsletter?

Join us on GitHub

Dig into previous editions of DATA Pill

Adam from GetInData | Part of Xebia

Lisa D. Sparks

Editor | Writer

2 years ago

Nice analysis, Adam Kawa. Thanks for sharing my work. I have similar coverage here: https://lisadsparks.substack.com/
