Real-Time Data on the Factory Floor

Robin Moffatt

Sr. Principal Advisor, Streaming Data Technologies

Published: 27 October 2023

I enjoyed listening to this episode of the Analytics Engineering Podcast (presented by Tristan Handy and Julia Schottenstein), with guests Nathan Bean from General Mills and Materialize's Arjun Narayan.

I took two things from it:

1. Streaming use cases are everywhere, including the factory floor

2. FUD is still alive and well in the software industry

My notes are below, with direct timecode URLs to the corresponding sections.

Data comes from devices across the factory. Most don't support pushing data, so a variety of polling methods and techniques are used to sample it. Intervals are ~100-500ms.

Challenges in the data include bad signals causing missing, wrong, and out-of-order data.

Once sampled, the data is written to a local time-series database (TSDB).

https://overcast.fm/+w94UAJW7M/07:52
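
To make the data-quality challenges above concrete, here is a minimal sketch of tidying a batch of polled samples before they are stored. Nothing in it comes from the podcast: the `Reading` type, the `clean` function, the plausibility range, and the 500ms gap threshold are all hypothetical, chosen only to illustrate missing, wrong, and out-of-order data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Reading:
    ts_ms: int               # device timestamp in milliseconds (assumed field)
    value: Optional[float]   # None models a missing or failed sample


def clean(readings, lo, hi, max_gap_ms=500):
    """Sort by timestamp, drop missing or implausible values, report gaps."""
    ordered = sorted(readings, key=lambda r: r.ts_ms)  # repair out-of-order arrival
    good = [r for r in ordered if r.value is not None and lo <= r.value <= hi]
    # A gap is any pair of consecutive good readings further apart than max_gap_ms.
    gaps = [
        (a.ts_ms, b.ts_ms)
        for a, b in zip(good, good[1:])
        if b.ts_ms - a.ts_ms > max_gap_ms
    ]
    return good, gaps


samples = [
    Reading(0, 20.1),
    Reading(400, 20.4),
    Reading(200, 20.2),    # arrived late
    Reading(600, None),    # bad signal: nothing read
    Reading(800, 999.0),   # bad signal: implausible value
    Reading(1400, 20.6),
]
good, gaps = clean(samples, lo=-40.0, hi=150.0)
```

Here the late sample is put back in order, the two bad samples are dropped, and the resulting hole between 400ms and 1400ms is reported so that a downstream consumer can decide what to do about it.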

The data is useful locally for subject-matter experts (SMEs) on the plant floor, but for broader usage it needs another layer of contextualisation to make the information usable.

Infrastructure is on-premises for latency and other reasons.

https://overcast.fm/+w94UAJW7M/09:52

Where does real-time fit in manufacturing? Three tiers (in increasing order of sophistication):

1. Know about a potential problem as soon as it occurs

2. Use an ML algorithm to get advice

3. Use an ML algorithm to directly take corrective steps (within guardrails)

https://overcast.fm/+w94UAJW7M/17:27
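
The three tiers can be sketched as three small functions. This is a toy illustration under my own assumptions, not anything described in the episode: the temperature limit, the `recommend` function (a stand-in for an ML model), and the guardrail bounds are all made up.

```python
def detect(temp_c, limit_c=80.0):
    """Tier 1: flag a potential problem as soon as a reading crosses a limit."""
    return temp_c > limit_c


def recommend(temp_c, target_c=75.0, gain=0.5):
    """Tier 2: stand-in for an ML model that suggests a setpoint change."""
    return -gain * (temp_c - target_c)


def apply_with_guardrails(setpoint, adjustment, lo=50.0, hi=100.0, max_step=2.0):
    """Tier 3: act automatically, but clamp both the step and the result."""
    step = max(-max_step, min(max_step, adjustment))
    return max(lo, min(hi, setpoint + step))


reading = 85.0
alert = detect(reading)                              # True
advice = recommend(reading)                          # -5.0
new_setpoint = apply_with_guardrails(60.0, advice)   # 58.0
```

The model asks for a change of -5.0, but the guardrail only allows a step of 2.0 at a time, so the setpoint moves from 60.0 to 58.0. That clamp is the difference between tier 2 (a human reads the advice) and tier 3 (the system acts on it).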

Arjun Narayan notes that Hadoop and NoSQL were a mistake for companies that weren't at Google or Amazon scale. Michael Stonebraker was one of the few who called it out back then.

https://overcast.fm/+w94UAJW7M/16:07

(here's the FUD-y stuff) "#apacheKafka and #apacheFlink are hard to use, you have to use Java, you can't use SQL."

Ehm… Flink SQL?

"You don't have the operational ease"

Ehm… managed services?

https://overcast.fm/+w94UAJW7M/24:21

(To be clear: cloud data warehouses with SQL are easy, yes. But casting Flink and Kafka today as they were several years ago is ill-informed at best and duplicitous at worst.)

"If you're trying to advance human understanding, batch is fine" (strategic analytics)

If a human has to make a decision (e.g. on the plant floor), or we're trying to automate it, then use streaming.

https://overcast.fm/+w94UAJW7M/26:52

(I don't disagree with this, as it's a useful framing for prioritising adoption and identifying early candidates for streaming projects. That said, once streaming really is as easy as batch, batch becomes redundant.)

Back to the FUD-y stuff:

Bespoke streaming pipelines are really complicated and take huge resources; if you're not Uber or Netflix then, except for tier 1 use cases like fraud detection, you won't be able to do it (unless you use Materialize, obvs).

https://overcast.fm/+w94UAJW7M/28:53

"The fact we're talking about streaming is a bug, it's an implementation detail."

"You should talk about streaming as often as you talk about B-Tree indexes"

https://overcast.fm/+w94UAJW7M/30:46

Decomposing business calculations into streaming operators is painful.

The people who write the stream operators are distant from the business domain. How do we get the technical tooling (e.g. SQL) to the folk on the plant floor?

https://overcast.fm/+w94UAJW7M/32:34
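
To show what "decomposing into operators" looks like, here is a small sketch of one business question ("what is the average temperature per machine per second?") split into three hand-written operators. The event shape, machine names, and function names are hypothetical, and the code runs over a finite list, whereas a real stream processor would emit each result as its window closes.

```python
from collections import defaultdict


def valid(events):
    """Operator 1 (filter): drop events that carry no value."""
    for e in events:
        if e["value"] is not None:
            yield e


def assign_window(events, size_ms=1000):
    """Operator 2 (window assignment): tag each event with its tumbling window start."""
    for e in events:
        yield {**e, "window": e["ts_ms"] // size_ms * size_ms}


def average_by_key(events):
    """Operator 3 (keyed aggregate): mean value per (machine, window)."""
    totals = defaultdict(lambda: [0.0, 0])   # key -> [running sum, count]
    for e in events:
        acc = totals[(e["machine"], e["window"])]
        acc[0] += e["value"]
        acc[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


events = [
    {"machine": "oven-1", "ts_ms": 100, "value": 200.0},
    {"machine": "oven-1", "ts_ms": 600, "value": 204.0},
    {"machine": "oven-2", "ts_ms": 700, "value": None},
    {"machine": "oven-1", "ts_ms": 1200, "value": 210.0},
]
result = average_by_key(assign_window(valid(events)))
```

Even for a question this simple, the person writing it has to think in terms of filters, windows, and keyed state. In a SQL-based tool the same question is roughly an average grouped by machine and time window, which is much closer to how someone on the plant floor would phrase it.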

So, overall this was an interesting podcast to listen to, as it helps illustrate where streaming fits, some of the challenges, and ways to reason about approaching it.

What I didn't enjoy so much was the vendor-heavy pitching against Flink in particular. Simply illustrating the difficulty of streaming adoption and use historically would have been fine without dunking inaccurately on other projects.

But hey, I am probably biased given that I work for Decodable, who provide a managed #ApacheFlink and #Debezium service that gives you stream processing with SQL and no Java code

(although if you wanna bring your Java code, we can run that too)

#dataEngineering #streamProcessing

Tun Shwe

1 year ago

If you're interested in how real-time fits in manufacturing, you might like my talk at Open Source Data Summit next week. The use case is anomaly detection (with an ML model) of robots in a factory. Furthermore, I sidestep the FUD argument by solving it with Kafka and Python running in a managed service.
