DATA Pill #021 - serverless Lock-in, real-time AI and a lot from the open-source giants

Adam Kawa

CEO at GetInData, ex-Spotify | Data & AI for banks, telecoms, retail & more.

Published: October 3, 2022

Hi!

Do you really need MLOps?

Thankfully, a freshly extracted DATA Pill is already waiting to help you answer this and many more questions.

ARTICLES

Concerned about Serverless Lock-in? Consider Patterns! | 12 min | Cloud | Gregor Hohpe | The Architect Elevator Blog

Cloud lock-in may be unavoidable, but the risk is much lower if you introduce good architecture patterns as an abstraction layer. The author describes the concept well here.
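To make the "pattern as an abstraction layer" idea concrete, here is a minimal Python sketch. It is not taken from the article: the `MessageQueue` port, the `InMemoryQueue` adapter and the `place_order` function are all hypothetical names, used only to show business logic depending on a pattern rather than on a vendor SDK.

```python
from typing import Dict, List, Protocol


class MessageQueue(Protocol):
    """Hypothetical port: the only queue API the business logic may use."""

    def publish(self, topic: str, payload: str) -> None: ...


class InMemoryQueue:
    """Stand-in adapter; a cloud-specific adapter would expose the same method."""

    def __init__(self) -> None:
        self.messages: Dict[str, List[str]] = {}

    def publish(self, topic: str, payload: str) -> None:
        self.messages.setdefault(topic, []).append(payload)


def place_order(queue: MessageQueue, order_id: str) -> None:
    # The business logic knows the pattern ("publish an event"), not the vendor.
    queue.publish("orders", f"order-placed:{order_id}")


queue = InMemoryQueue()
place_order(queue, "42")
print(queue.messages)
```

Swapping providers would then mean writing another adapter with the same `publish` method, while `place_order` stays untouched.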

Enabling real-time AI with Streaming Ingestion in Vertex AI | 7 min read | AI & ML | Erwin Huizenga & Kaz Sato | Google Cloud Blog

It’s difficult to set up the infrastructure needed to support high-throughput updates and low-latency retrieval of data.

Starting this month, the Vertex AI Matching Engine and Feature Store will support real-time Streaming Ingestion as Preview features. With Streaming Ingestion for Matching Engine, a fully managed vector database for vector similarity search, items in an index are updated continuously and reflected in the similarity search results immediately.
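As a rough illustration of what "updated continuously and reflected immediately" means, here is a toy brute-force index in plain Python. It is only a sketch of the idea, not the Vertex AI API: `TinyVectorIndex`, `upsert` and `search` are made-up names, and a managed service would use approximate nearest-neighbour search rather than a full scan.

```python
import math
from typing import Dict, List, Tuple


class TinyVectorIndex:
    """Brute-force in-memory index; illustrative only, not the Vertex AI API."""

    def __init__(self) -> None:
        self._vectors: Dict[str, List[float]] = {}

    def upsert(self, item_id: str, vector: List[float]) -> None:
        # Streaming update: the item is searchable as soon as this returns.
        self._vectors[item_id] = vector

    def search(self, query: List[float], k: int = 1) -> List[Tuple[str, float]]:
        def cosine(a: List[float], b: List[float]) -> float:
            dot = sum(x * y for x, y in zip(a, b))
            norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
            return dot / norm if norm else 0.0

        scored = [(item_id, cosine(query, vec)) for item_id, vec in self._vectors.items()]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


index = TinyVectorIndex()
index.upsert("shoes", [1.0, 0.0])
before = index.search([0.0, 1.0])[0][0]  # only "shoes" exists so far

index.upsert("hat", [0.0, 1.0])          # new item arrives on the stream
after = index.search([0.0, 1.0])[0][0]   # the very next query already sees it
print(before, after)
```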

This blog post covers how these new features can improve predictions and enable near real-time use cases, such as recommendations, content personalization and cybersecurity monitoring.

BTW, last week I recommended the ebook about Building a Feature Store (with an introduction to Vertex AI). That was a coincidence, but… right on time ;)

Upgrading Data Warehouse Infrastructure at Airbnb | 10 min read | Cloud | Ronnie Zhu, Edgar Rodriguez, Jason Xu, Gustavo Torres, Kerim Oktay & Xu Zhang | Airbnb Tech Blog

Airbnb’s experience with upgrading their Data Warehouse infrastructure to Spark and Iceberg.

In our data ingestion framework, we found that we could take advantage of Iceberg’s flexibility to define multiple partition specs to consolidate ingested data over time. Ingested tables write new data with an hourly granularity (ds/hr), and a daily automated process compresses the files on a daily partition (ds), without losing the hourly granularity, which later can be applied to queries as a residual filter.
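The hourly-to-daily consolidation in that quote can be modelled in a few lines of plain Python. This is a simplified sketch, not Iceberg's API and not Airbnb's code: the `hourly_files` layout and the `compact_daily` and `query` helpers are hypothetical, and real Iceberg tracks partition specs in table metadata rather than in dictionaries.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Dict[str, object]

# Hypothetical hourly "files", keyed by the (ds, hr) partition they were written to.
hourly_files: Dict[Tuple[str, int], List[Row]] = {
    ("2022-10-01", 0): [{"ds": "2022-10-01", "hr": 0, "event": "a"}],
    ("2022-10-01", 1): [{"ds": "2022-10-01", "hr": 1, "event": "b"}],
    ("2022-10-02", 0): [{"ds": "2022-10-02", "hr": 0, "event": "c"}],
}


def compact_daily(files: Dict[Tuple[str, int], List[Row]]) -> Dict[str, List[Row]]:
    """Merge hourly files into one file per ds; every row keeps its hr column."""
    daily: Dict[str, List[Row]] = defaultdict(list)
    for (ds, _hr), rows in sorted(files.items()):
        daily[ds].extend(rows)
    return dict(daily)


def query(daily: Dict[str, List[Row]], ds: str, hr: int) -> List[Row]:
    # Partition pruning selects the daily file; hr is applied as a residual row filter.
    return [row for row in daily.get(ds, []) if row["hr"] == hr]


daily_files = compact_daily(hourly_files)
result = query(daily_files, "2022-10-01", 1)
print(len(daily_files), result)
```

The point of the sketch is that fewer, larger files result from compaction while hourly queries still work, because the hour survives as a column in the data.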

No, you don’t need MLOps | 5 min read | MLOps | Lak Lakshmanan | Personal Blog

The title is a bit provocative, but the article proposes concrete keep-it-simple alternatives to complex MLOps solutions.

It also offers a rule of thumb for when the complexity is actually necessary, so it's not all hype.

Evolution of Streaming Pipelines in Lyft’s Marketplace | 6 min read | Streaming | Rakesh Kumar | Lyft Engineering Blog

Lyft’s journey of evolving its streaming platform and pipelines to scale better and support new use cases. Each iteration scaled better, but also exposed shortcomings.

{ MORE LINKS }

TUTORIALS

LoadBalancer Services using Kubernetes in Docker (kind) | 11 min read | Kubernetes | Owain Williams | Groupon Blog

A tutorial on setting up a multi-node kind cluster with extraPortMappings to forward requests from your host to an NGINX ingress controller, which uses the path to route each request to the appropriate service, rewriting the target so the service can recognise the request.
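The path-based routing and target rewriting that the ingress controller performs can be mimicked in a few lines of Python. This is only a conceptual sketch, not NGINX or Kubernetes configuration: the `ROUTES` table, the service names and the `route` function are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical routing table: path prefix -> backend service name.
ROUTES: Dict[str, str] = {"/foo": "foo-service", "/bar": "bar-service"}


def route(path: str) -> Optional[Tuple[str, str]]:
    """Return (service, rewritten_path), or None if no prefix matches."""
    for prefix, service in ROUTES.items():
        if path == prefix or path.startswith(prefix + "/"):
            # Strip the routing prefix so the service sees the path it expects.
            rewritten = path[len(prefix):] or "/"
            return service, rewritten
    return None


print(route("/foo/health"))
print(route("/bar"))
print(route("/baz"))
```

Here a request for `/foo/health` goes to `foo-service` as `/health`, which is the effect the rewrite step in the tutorial is after.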

NEWS

OpenTest: McDonald’s debut into open-source software | 4 min | Adrian Theodorescu | McDonald’s Technical Blog

A short insight into why McDonald's open sourced OpenTest.

The open sourcing for us led to another significant benefit by reducing the unnecessary friction involved in getting the software onto people’s machines. No more approvals required and no more dependencies on other teams for the actual binaries and updates.

PODCAST

What Data Visualization Means for Data Literacy | 41 min | Data Visualization | Andy Cotgreave | DataFramed

How data visualization increases organizational data literacy
The best practices for visual storytelling

Synthetic data technologies can enable more capable and ethical AI | 40 min | AI | Host: Ben Lorica, Guest: Yashar Behzadi | The Data Exchange

Yashar Behzadi is the CEO & Founder of Synthesis AI, a startup that uses synthetic data technologies to enable teams to build AI applications, as well as gaming and metaverse applications.

CONFS AND MEETUPS

Data Driven Innovation | 12 October | Online

The third edition of the Big Data, AI, ML and Data Science conference organized by Computerworld Magazine.

Building Machine Learning pipelines with Kedro and Vertex AI on GCP | 25 October | MLOps | Michał Bryś | Free Webinar

Michał Bryś, Senior ML Engineer and Technical Product Owner, will cover:

Why we need a pipeline for machine learning models
Kedro, an open-source Python framework for creating reproducible, maintainable and modular data science code
Q&A session

{ MORE LINKS }

___________________________

See you next week

Adam Kawa from GetInData
