Real-Time Data on the Factory Floor
Midjourney: a high tech factory with lots of devices connected by computers


I enjoyed listening to this episode of the Analytics Engineering Podcast (presented by Tristan Handy and Julia Schottenstein), with guests Nathan Bean from General Mills and Materialize's Arjun Narayan.

I took two things from it:

1. Streaming use-cases are everywhere, including the factory floor

2. FUD is still alive and well in the software industry


My notes are below, with direct timecode URLs to the corresponding sections


Data comes from devices across the factory. Most don't support pushing data, so a variety of polling methods and techniques are used to sample it. Intervals are ~100-500ms.

Challenges in the data include bad signals causing missing, wrong, and out-of-order data.

Once sampled, the data is written to a local time-series database (TSDB).

https://overcast.fm/+w94UAJW7M/07:52
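To make the "missing, wrong, and out-of-order" problem concrete, here is a minimal Python sketch of cleaning sampled readings before they go into a TSDB. It is not what was described on the podcast: the `Reading` type, the `is_plausible` range check, and the `max_delay_ms` lateness allowance are all assumptions for illustration.

```python
import heapq
from dataclasses import dataclass, field
from typing import Iterable, Iterator, Optional


@dataclass(order=True)
class Reading:
    """One polled sample (hypothetical shape). Ordering uses the timestamp only."""
    ts_ms: int
    device: str = field(compare=False)
    value: Optional[float] = field(compare=False)  # None marks a missing sample


def is_plausible(r: Reading, lo: float, hi: float) -> bool:
    """Reject missing samples and values outside the sensor's expected range."""
    return r.value is not None and lo <= r.value <= hi


def reorder(readings: Iterable[Reading], max_delay_ms: int) -> Iterator[Reading]:
    """Emit readings in timestamp order, tolerating lateness up to max_delay_ms."""
    heap = []
    newest = None
    for r in readings:
        heapq.heappush(heap, r)
        newest = r.ts_ms if newest is None else max(newest, r.ts_ms)
        # Anything at or below the watermark is assumed safe to release.
        while heap and heap[0].ts_ms <= newest - max_delay_ms:
            yield heapq.heappop(heap)
    # Input exhausted: flush whatever is still buffered, in order.
    while heap:
        yield heapq.heappop(heap)


raw = [
    Reading(100, "oven", 180.0),
    Reading(300, "oven", 181.0),
    Reading(200, "oven", None),     # missing
    Reading(250, "oven", 9999.0),   # wrong (bad signal)
    Reading(400, "oven", 182.0),
    Reading(350, "oven", 181.5),    # out of order
]
clean = [r for r in reorder(raw, max_delay_ms=200) if is_plausible(r, 0.0, 400.0)]
```

A reading that arrives later than the allowance is still emitted, just out of order, so the allowance is a trade-off between latency and correctness.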


The data is useful locally for SMEs (subject-matter experts) on the plant floor, but for broader usage it needs another layer of contextualisation to make the information usable.

Infrastructure is on-premises for latency and other reasons.

https://overcast.fm/+w94UAJW7M/09:52


Where does real-time fit in manufacturing? Three tiers (in increasing order of sophistication):

1. Know about a potential problem as soon as it occurs

2. Use an ML algorithm to get advice

3. Use an ML algorithm to directly take corrective steps (within guardrails)

https://overcast.fm/+w94UAJW7M/17:27
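The three tiers could be sketched roughly like this in Python. Everything here is hypothetical: the `Guardrail` limits, the `respond` function, and the `recommend` callable (a stand-in for whatever ML model is in use) are made up to show how each tier builds on the previous one.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Guardrail:
    """Hypothetical limits on what an automated adjustment may do."""
    min_setpoint: float
    max_setpoint: float
    max_step: float  # largest change allowed in a single adjustment


def clamp_to_guardrail(current: float, proposed: float, g: Guardrail) -> float:
    """Limit the size of the change, then keep the result inside the allowed range."""
    step = max(-g.max_step, min(g.max_step, proposed - current))
    return max(g.min_setpoint, min(g.max_setpoint, current + step))


def respond(
    tier: int,
    reading: float,
    limit: float,
    setpoint: float,
    recommend: Callable[[float, float], float],
    guardrail: Guardrail,
) -> List[Tuple[str, float]]:
    """Return the actions taken at a given tier when a reading breaches its limit."""
    if reading <= limit:
        return []
    actions = [("alert", reading)]  # tier 1: know about it straight away
    if tier >= 2:
        proposed = recommend(reading, setpoint)
        if tier == 2:
            actions.append(("advise", proposed))  # tier 2: a human decides
        else:
            # tier 3: act automatically, but only within the guardrails
            actions.append(("apply", clamp_to_guardrail(setpoint, proposed, guardrail)))
    return actions


def toy_model(reading: float, setpoint: float) -> float:
    """Stand-in for an ML model: lower the setpoint by the size of the overshoot."""
    return setpoint - (reading - 190.0)


rail = Guardrail(min_setpoint=150.0, max_setpoint=200.0, max_step=5.0)
print(respond(2, 210.0, 190.0, 180.0, toy_model, rail))
print(respond(3, 210.0, 190.0, 180.0, toy_model, rail))
```

In this sketch the model asks for a 20-unit drop, tier 2 passes that on as advice, and tier 3 applies only the 5 units the guardrail permits.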


Arjun Narayan notes that Hadoop and NoSQL were a mistake for companies that weren't at Google or Amazon scale. Michael Stonebraker was one of the few who called it out back then.

https://overcast.fm/+w94UAJW7M/16:07


(Here's the FUD-y stuff) "#apacheKafka and #apacheFlink are hard to use, you have to use Java, you can't use SQL."

Ehm… Flink SQL?

"You don't have the operational ease"

Ehm… managed services?

https://overcast.fm/+w94UAJW7M/24:21

(To be clear: cloud DWs with SQL are easy, yes. But casting Flink and Kafka today as they were several years ago is simply ill-informed at best and duplicitous at worst.)


"If you're trying to advance human understanding, batch is fine" (strategic analytics)

If a human has to make a decision (e.g. on the plant floor), or we're trying to automate it, then use streaming.

https://overcast.fm/+w94UAJW7M/26:52

(I don't disagree with this; it's a useful framing for prioritising adoption and identifying early candidates for streaming projects. That said, once streaming really is as easy as batch, batch becomes redundant.)


Back to the FUD-y stuff:

Bespoke streaming pipelines are really complicated and take huge resources. If you're not Uber or Netflix then, except for tier 1 use-cases like fraud detection, you'll not be able to do it (unless you use Materialize, obvs).

https://overcast.fm/+w94UAJW7M/28:53


"The fact we're talking about streaming is a bug, it's an implementation detail."

"You should talk about streaming as often as you talk about B-Tree indexes"

https://overcast.fm/+w94UAJW7M/30:46


Decomposing business calculations into streaming operators is painful.

The people who write the stream operators are distant from the business domain. How do we get the technical tooling (e.g. SQL) to the folk on the plant floor?

https://overcast.fm/+w94UAJW7M/32:34
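As a rough illustration of that pain (my own sketch, not an example from the podcast): "average value per device per second" is a one-line `GROUP BY` in SQL, but written by hand as a streaming operator it needs explicit window and state management. The event shape and the `tumbling_avg` function below are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator, Tuple

Event = Tuple[int, str, float]  # (timestamp_ms, device, value), a made-up shape


def tumbling_avg(events: Iterable[Event], window_ms: int) -> Iterator[Tuple[int, str, float]]:
    """Yield (window_start_ms, device, average) each time a tumbling window closes.

    Assumes events arrive in timestamp order; late data is not handled here.
    """
    current_window = None
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, device, value in events:
        window = ts - ts % window_ms
        if current_window is not None and window != current_window:
            # The window has closed: emit one row per device, then reset state.
            for d in sorted(sums):
                yield current_window, d, sums[d] / counts[d]
            sums.clear()
            counts.clear()
        current_window = window
        sums[device] += value
        counts[device] += 1
    # Flush the final, still-open window at end of input.
    if current_window is not None:
        for d in sorted(sums):
            yield current_window, d, sums[d] / counts[d]


events = [
    (0, "a", 10.0),
    (400, "a", 20.0),
    (600, "b", 5.0),
    (1000, "a", 30.0),
    (1500, "b", 7.0),
]
results = list(tumbling_avg(events, window_ms=1000))
```

Even this toy version has to decide when a window closes and what to do at end of input, which is exactly the kind of detail someone on the plant floor shouldn't need to think about.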


So, overall this was an interesting podcast to listen to, as it helps illustrate where streaming fits, some of the challenges, and ways to reason about approaching it.

What I didn't enjoy so much was the vendor-heavy pitching against Flink in particular. Simply illustrating how difficult streaming has historically been to adopt and use would have been fine without dunking inaccurately on other projects.

But hey, I am probably biased given that I work for Decodable, who provide a managed #ApacheFlink and #Debezium service that gives you stream processing with SQL and no Java code

(although if you wanna bring your Java code, we can run that too)

#dataEngineering #streamProcessing

If you're interested in how real-time fits in manufacturing, you might like my talk at Open Source Data Summit next week. The use case is anomaly detection (with an ML model) of robots in a factory. Furthermore, I sidestep the FUD argument by solving it with Kafka and Python running in a managed service.
