Data Pipelines, The Heart of AIoT

Greetings from West Michigan! We’ve been dealing with an unseasonable heat wave around the Great Lakes, to the point where kids are staying home from school for “heat days”. That’s a far cry from the typical snow days we’ll be getting in January and February.

And it’s not just the air that’s warm. The big lakes are mighty warm, too. Here’s a screenshot of the water temperatures around the Great Lakes, courtesy of Seagull. The temperatures in red are in the mid-to-upper 70s °F. That’s the equivalent of hot tub weather for these big bodies of water.

Seagull is a data platform that aggregates real-time and historical data, operated by the Great Lakes Observing System, otherwise known as GLOS. SpinDance helped build Seagull a few years ago, and it’s one of the team’s favorite projects. The GLOS team was a joy to work with, and it’s not often you get to work on such an impactful project that serves your backyard.

Like all IoT-enabled data platforms, at the heart of Seagull is a robust data pipeline. It is the technology that captures and transforms data into actionable information. And it’s the focus of this issue of The Intelligent Device.

Data Pipelines: The Heart of AIoT

Data is central to Artificial Intelligence. We use data to train AI models. And we feed data through those models to make predictions. These predictions, in turn, are where the value of an AI model comes from, in the form of analysis and decision-making support. (For a recap, check out our previous issue about the CADA framework).

In a production setting, we use pipelines to manage our data at scale. In its simplest form, a data pipeline is composed of four steps (sketched in code after the list):

  1. Ingest: The data is ingested into the pipeline, either through a hardware sensor or an Application Programming Interface (API).
  2. Process: The data is processed. This typically involves validating the data against quality standards, as well as enriching it with additional data points. For example, we might add an identifier and timestamp to a reading. In this step, we might also downsample the data through aggregation and summarization.
  3. Store: The processed data is stored for later use. So-called “cold” data might be stored for days, weeks, months, or years. Real-time “hot” data might only be kept for seconds, minutes, or hours.
  4. Deliver: Finally, the data is delivered to an upstream consumer. A consumer might be a human, or another digital system.
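To make the four steps concrete, here’s a minimal sketch in Python. Everything in it is illustrative, not taken from any real platform: the sensor read is stubbed with a constant, the validation range is made up, and delivery is just a print statement. What matters is the shape of the ingest → process → store → deliver flow.

```python
import time
import uuid
from collections import deque

# In-memory "hot" store: keeps only the most recent processed readings.
hot_store = deque(maxlen=1000)


def ingest() -> dict:
    """Ingest: pull a raw reading from a sensor or API (stubbed here)."""
    return {"temperature_f": 72.4}  # stand-in for a real sensor driver


def process(raw: dict):
    """Process: validate against quality standards, then enrich."""
    temp = raw.get("temperature_f")
    if temp is None or not (-40.0 <= temp <= 150.0):
        return None  # reject readings that fail validation
    return {
        "id": str(uuid.uuid4()),   # enrich with an identifier...
        "timestamp": time.time(),  # ...and a timestamp
        "temperature_f": temp,
    }


def store(reading: dict) -> None:
    """Store: keep processed data for later use (in memory here)."""
    hot_store.append(reading)


def deliver(reading: dict) -> None:
    """Deliver: hand the data to an upstream consumer (stubbed as print)."""
    print(f"delivering {reading['id']}: {reading['temperature_f']} °F")


# One pass through the pipeline.
reading = process(ingest())
if reading is not None:
    store(reading)
    deliver(reading)
```

In a real system, each of these functions would be a separate service or process connected by queues or streams, but the division of responsibilities is the same.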

In the Internet of Things, there are typically multiple pipelines working in concert. For example, each device might act as a mini pipeline, delivering its results to the cloud or an on-prem data center.

“Edge” computing can add yet another layer: each edge node accepts data from nearby devices and delivers refined data to the cloud.
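Here’s a rough sketch of how that edge layer might look. The EdgeNode class and the send_to_cloud stub are hypothetical names for illustration, not any particular product’s API; each device’s mini pipeline calls accept(), and the edge node forwards only a compact summary upstream.

```python
from statistics import mean


def send_to_cloud(summary: dict) -> None:
    """Stand-in for a real cloud uplink (e.g., an HTTPS or MQTT call)."""
    print(f"to cloud: {summary}")


class EdgeNode:
    """Hypothetical edge layer: accepts readings from many devices,
    refines them, and forwards a smaller summary to the cloud."""

    def __init__(self, batch_size: int = 100):
        self.batch_size = batch_size
        self.buffer: list[float] = []

    def accept(self, device_reading: float) -> None:
        """Called by each device's mini pipeline with a processed reading."""
        self.buffer.append(device_reading)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        """Deliver refined (summarized) data upstream to the cloud."""
        summary = {
            "count": len(self.buffer),
            "mean": round(mean(self.buffer), 2),
            "min": min(self.buffer),
            "max": max(self.buffer),
        }
        send_to_cloud(summary)  # one message instead of batch_size messages
        self.buffer.clear()
```

The payoff is that one summary message replaces batch_size raw ones, which is exactly the kind of reduction the next section puts a dollar figure on.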

Well-Architected Pipelines Save Money

At first, this architecture might seem silly: why not just send all the source data directly to the final destination? Aren’t we just increasing the overall cost of the system by adding so many intermediate steps?

The answer is a resounding “no”. Seemingly simple systems can actually be more expensive. Here’s a real-world example to explain why:

About a decade ago, SpinDance inherited a first-generation IoT product that collected large amounts of environmental data. The devices were very simple: they collected temperature, humidity, and other data points every 20 seconds and sent them immediately to the cloud.

With about 120,000 devices in the field, each sending three readings a minute, this worked out to around 360,000 readings a minute.

The thing was, the underlying data didn’t change that fast, and therefore didn’t necessitate sending readings so frequently. A few times an hour would have been fine. And the cost of sending all that data was immense: a colossal waste of dollars in bandwidth and compute.

We redesigned the second-generation devices to be much more intelligent. Each device buffered its readings for 15 minutes, then sent an average on to the cloud. This reduced the number of messages sent to around 15,000 per minute, a 96% reduction in bandwidth and compute costs. In short order, these cost savings dwarfed the expense of the slightly more advanced devices.
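In pseudocode terms, that second-generation logic is a buffer-and-average loop. The sketch below is in Python for readability rather than real firmware code, and read_sensor / send_to_cloud are hypothetical stand-ins for the actual driver and uplink:

```python
import time
from statistics import mean

SAMPLE_SECONDS = 20        # still sample locally every 20 seconds
WINDOW_SECONDS = 15 * 60   # but only report once per 15 minutes


def run_device(read_sensor, send_to_cloud):
    """Buffer readings locally and send only a windowed average."""
    buffer = []
    window_start = time.monotonic()
    while True:
        buffer.append(read_sensor())
        if time.monotonic() - window_start >= WINDOW_SECONDS:
            # One message replaces ~45 raw readings (900 s / 20 s).
            send_to_cloud({"avg": mean(buffer), "count": len(buffer)})
            buffer.clear()
            window_start = time.monotonic()
        time.sleep(SAMPLE_SECONDS)
```

At a 20-second sample rate, each 15-minute window collapses roughly 45 raw readings into a single message, which is where the bandwidth and compute savings come from.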

Coming up: Connecting IoT Data to AI

In the long run, the true value of an AIoT system isn’t just in the devices or the data, but in how effectively that data is managed and utilized. Efficient pipelines not only cut costs but also enable faster, more accurate decision-making. Investing in robust pipeline design today is an investment in your product’s success tomorrow. In our next issue, we’ll connect an IoT data pipeline to AI models. If pipelines are the heart of IoT, models are its brain.

And don’t forget: early September is a great time to visit us in Michigan, and take a dip in the Big Lake. It’s salt-free heaven!

