Integrating IoT with Snowflake for Time Series Anomaly Detection
Jan-Niklas Pahl
Driving AI and ML Innovation | Leading Advanced Analytics with Azure, Snowflake, Grafana | Proven Track Record in Digital Manufacturing Excellence and Generative AI Solutions
"We aim to test anomaly detection in time series data, using vibrations monitored by IoT sensors on the shop floor. This practical demonstration will show a streamlined method to integrate IoT data into Snowflake, meeting stringent business requirements with efficiency and without the need for extensive team involvement."
1. Data Capturing
The first step is selecting an appropriate tool to capture IoT data on a local host. There are numerous established tools available, some of them open source:
- Node-RED - (https://nodered.org)
- Crosser - (https://crosser.io/)
- OPC Router - (https://www.opc-router.de)
- DataFEED OPC Suite Extended - (https://industrial.softing.com/datafeed/opc)
- ACCON-EasyLog - (https://www.deltalogic.de/ACCON-EasyLog/130001WUX)
- Cogent DataHub - (https://cogentdatahub.com/features/integration/database-write/)
- Kepware by PTC - (https://www.ptc.com/de/products/kepware)
These tools enable effortless connections to your IoT devices for recording necessary data tags or events. In our example, data is stored in CSV files, with a new file generated every 10 minutes.
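For illustration only, each CSV file might contain rows of the following shape; the column names ts and measurement_vibration are assumptions chosen here to match the model definition further down, and the actual layout depends on your capture tool:

ts,measurement_vibration
2024-05-06 08:00:00,0.42
2024-05-06 08:00:01,0.45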
2. Data Movement
The data is then transferred to Azure Blob Storage, a scalable, secure, and cost-effective Microsoft Azure service.
Data is moved from on-premises to the cloud on an hourly basis using AzCopy, a Microsoft command-line tool for efficient transfers to Azure Blob Storage.
For an introduction to AzCopy, visit [this guide](https://talibilat.medium.com/a-step-by-step-guide-to-using-azcopy-59a50d2bcfba).
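As a rough sketch, the hourly transfer can be a single AzCopy invocation scheduled with Windows Task Scheduler or cron; the local path, storage account, container, and SAS token below are placeholders:

azcopy copy "C:\iot-export\*.csv" "https://<storageaccount>.blob.core.windows.net/<container>?<sas-token>"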
3. Data Integration
The collected data is inserted into a Snowflake table using the COPY INTO command, which can handle large volumes of files from Azure Blob Storage and tracks already-loaded files so they are not ingested twice.
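A minimal sketch of this step, assuming an external stage over the Blob Storage container and a landing table named VIBRATION_RAW (stage, file format, and table names are assumptions rather than the original setup; the view used for inference below would be defined on top of such a table):

Code Example for Loading Data from Azure Blob Storage:
CREATE OR REPLACE STAGE PRED_MAINT.RAW_STREAM.VIBRATION_STAGE
    URL = 'azure://<storageaccount>.blob.core.windows.net/<container>'
    CREDENTIALS = (AZURE_SAS_TOKEN = '<sas-token>')
    FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);

COPY INTO PRED_MAINT.RAW_STREAM.VIBRATION_RAW
    FROM @PRED_MAINT.RAW_STREAM.VIBRATION_STAGE;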
4. Anomaly Detection
Now for the exciting part: we use Snowflake's built-in anomaly detection ML functions to identify unusual patterns in our time series data. The table structure and example code for creating and applying the model are as follows:
Code Example for Training an Anomaly Detection Model:
CREATE OR REPLACE SNOWFLAKE.ML.ANOMALY_DETECTION PRED_MAINT.HARMONIZED.VIBRATION_ANOMALY_DETECTOR(
    INPUT_DATA => SYSTEM$REFERENCE('VIEW', 'PRED_MAINT.PREPARATION.VIBRATION'),
    TIMESTAMP_COLNAME => 'ts',
    TARGET_COLNAME => 'measurement_vibration'
);
Code Example for Model Inference:
CALL PRED_MAINT.HARMONIZED.VIBRATION_ANOMALY_DETECTOR!DETECT_ANOMALIES(
    INPUT_DATA => SYSTEM$REFERENCE('VIEW', 'PRED_MAINT.RAW_STREAM.VIBRATION'),
    TIMESTAMP_COLNAME => 'ts',
    TARGET_COLNAME => 'measurement_vibration',
    CONFIG_OBJECT => {'prediction_interval': 0.80}
);
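DETECT_ANOMALIES returns its output as a query result set; one way to persist it for dashboards or alerts is to capture the result immediately after the call in the same session (the target table name is an assumption):

CREATE OR REPLACE TABLE PRED_MAINT.HARMONIZED.VIBRATION_ANOMALIES AS
    SELECT * FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()));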
Below is a visualization showing abnormal areas detected in the vibration data.
FAQs:
Q: Why not write directly into Snowflake via ODBC/JDBC drivers?
A1: Cost Efficiency: Utilizing a Snowflake Warehouse for continuous operations can be expensive, especially if it needs to run 24/7. Our chosen method reduces costs by minimizing the time the warehouse needs to be active.
A2: Reliability: Direct connections are susceptible to outages from proxies, internet providers, or Snowflake itself. Using batch processes for data transfer to cloud storage is generally more reliable and ensures data isn’t lost during outages.
A3: Storage Optimization: Snowflake stores data in immutable micro-partitions, and each INSERT creates new micro-partitions. Frequent small inserts therefore produce many tiny micro-partitions, which is less efficient than batching data into fewer, larger loads. Note that Snowflake's newer hybrid tables are designed to handle such row-level operations differently.
Q: Can this setup handle real-time data processing?
- A: While the current setup is optimized for batch processing and avoiding data loss, Snowflake and Azure both support real-time data processing. Real-time data streams can be managed with additional tools such as Azure Stream Analytics or Snowflake's Snowpipe, which allows near-real-time loading of streaming data; a sketch of such a pipe is shown below.
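For illustration, an auto-ingest Snowpipe over the hypothetical stage and landing table from the integration example could look roughly like this; the pipe name and the Azure notification integration are placeholders, not part of the setup described above:

CREATE OR REPLACE PIPE PRED_MAINT.RAW_STREAM.VIBRATION_PIPE
    AUTO_INGEST = TRUE
    INTEGRATION = 'VIBRATION_NOTIFICATION_INT'
AS
    COPY INTO PRED_MAINT.RAW_STREAM.VIBRATION_RAW
    FROM @PRED_MAINT.RAW_STREAM.VIBRATION_STAGE;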
Q: What are the scalability limits of this IoT integration with Snowflake?
- A: Snowflake's architecture allows for near-infinite scalability, both vertically and horizontally, without downtime. You can scale up computing resources during high-demand periods and scale down during quieter periods, which is ideal for handling varying loads of IoT data.
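For example, the warehouse used for loading and inference can be resized on demand and set to suspend automatically when idle; the warehouse name here is a placeholder:

ALTER WAREHOUSE IOT_WH SET WAREHOUSE_SIZE = 'LARGE';
ALTER WAREHOUSE IOT_WH SET AUTO_SUSPEND = 60 AUTO_RESUME = TRUE;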
Q: How does anomaly detection in Snowflake help improve operational efficiency?
- A: Anomaly detection models in Snowflake can identify unexpected patterns and potential issues early, reducing downtime and maintenance costs. This proactive approach allows businesses to optimize operations and prevent costly disruptions by addressing issues before they escalate.
Q: Are there any specific types of IoT data that are more challenging to integrate?
- A: High-velocity and high-volume data streams, such as those from sensors sampling at high frequencies, can pose integration challenges, since they may call for smaller batch windows or streaming ingestion rather than the hourly file transfer used here.
Q: Can the data integration process be automated?
- A: Yes, the data integration process can be fully automated using scripts for AzCopy and scheduling features in Snowflake. Automation tools can be set up to handle data transfer, loading, and processing routines without manual intervention, ensuring a seamless and consistent data flow.
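As an illustrative sketch of the Snowflake side, a scheduled task could run the hourly load against the hypothetical stage from the integration example; the task name, warehouse, and schedule are placeholders:

CREATE OR REPLACE TASK PRED_MAINT.RAW_STREAM.LOAD_VIBRATION_TASK
    WAREHOUSE = IOT_WH
    SCHEDULE = 'USING CRON 0 * * * * UTC'
AS
    COPY INTO PRED_MAINT.RAW_STREAM.VIBRATION_RAW
    FROM @PRED_MAINT.RAW_STREAM.VIBRATION_STAGE;

ALTER TASK PRED_MAINT.RAW_STREAM.LOAD_VIBRATION_TASK RESUME;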
Q: What types of analytics can be performed on the IoT data once it's in Snowflake?
- A: Once data is in Snowflake, you can perform a wide range of analytics, from basic descriptive analytics to more complex machine learning models. Snowflake supports various data science and machine learning tools and frameworks, allowing you to derive actionable insights and forecasts from your IoT data.