A codeless and production-ready IoT platform on AWS
Introduction
Advanced analytics and machine learning techniques can help answer questions across many business domains, provided we have the right set of data. In this scenario, data collection and data organization play a crucial role. Internet of Things (IoT) technology, now widely adopted, makes it possible to collect data from many sources with high throughput. Many business challenges can be addressed with IoT technologies, including predictive maintenance, remote patient monitoring, collision detection for vehicles, and more.
In this article, I will examine an approach to building an end-to-end, codeless, and production-ready IoT platform leveraging AWS technologies.
Data acquisition from devices
The first service I will talk about is AWS IoT Core, which represents the first point of contact between IoT devices and our cloud platform. AWS IoT Core allows you to configure secure communication between the edge (devices) and the cloud (AWS). It supports several communication protocols, including HTTP and MQTT, and it is worth noting that many device vendors support AWS IoT Core natively.
To start collecting data, the first task is to configure a new thing in AWS IoT Core: give it a name and generate a certificate (directly from the AWS IoT Core console) to enable secure communication between your device (or gateway) and the platform. The screen below shows the device creation page:
Once you have created your thing, you must configure the connection from your physical IoT device to AWS IoT Core. You do this by setting the AWS IoT Core endpoint in your device's configuration. The endpoint can be found on the AWS IoT Core settings page.
To verify the data flow, assuming you are using the MQTT protocol to connect your devices, you can use the MQTT test client: subscribe to the topic your devices publish to, and real-time data will appear in the AWS IoT console. This is useful simply to check the connection between the devices and the IoT platform.
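As a sketch, a device payload published on such a topic might look like the JSON below; the field names (timestamp, devicename, temperature) are assumptions chosen to match the table schema used later in this article:

```json
{
  "timestamp": 1672531200000,
  "devicename": "sensor-01",
  "temperature": 21.5
}
```

Here timestamp is an epoch value in milliseconds, a common convention for IoT telemetry.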
Since, as discussed in the introduction, much of the value of an IoT platform lies in the data it collects, the next crucial step is forwarding the data to a persistence service, in our case Amazon S3.
Data forwarding is a feature of AWS IoT Core that allows you to send real-time data to persistence services. This is a fundamental step because here you can start to manipulate your data, preparing it for the storage layer. In real IoT use cases, it is important to organize data in partitions (for example, year/month/day) to optimize queries and other analyses. Consider the case where your input data contains only a field named "timestamp": you can split this field into three values (year, month, and day) to simplify partitioning on Amazon S3. This can be done in an AWS IoT Core rule using a SQL statement, as explained below:
Enter a SQL statement using the following: SELECT <Attribute> FROM <Topic Filter> WHERE <Condition>. For example: SELECT temperature FROM 'iot/topic' WHERE temperature > 50
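As a concrete sketch of the timestamp-splitting idea, the rule below assumes the payload carries an epoch-millisecond field named timestamp and is published on a hypothetical topic 'iot/topic'; it uses the parse_time function of the AWS IoT SQL dialect to derive year, month, and day attributes:

```sql
SELECT *,
       parse_time("yyyy", timestamp) AS year,
       parse_time("MM", timestamp) AS month,
       parse_time("dd", timestamp) AS day
FROM 'iot/topic'
```

The derived year, month, and day fields can then be used downstream to build the S3 partition layout.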
An AWS IoT forwarding rule can include actions that specify where and how your filtered data must be forwarded.
Data storage
We are now ready to go deeper into the data persistence part of our reference architecture, shown in the image below:
Before data lands in Amazon S3, the architecture includes two components of the Amazon Kinesis family: Kinesis Data Streams and Kinesis Data Firehose delivery streams. In short, a Kinesis data stream ingests and buffers real-time records for a configurable retention period, while a Firehose delivery stream reads from a source, optionally transforms the data, and delivers it to a destination such as Amazon S3.
In the described architecture, the IoT forwarding rule must be configured with an action that forwards filtered data to our Kinesis data stream. Each record is forwarded to the Kinesis data stream and persisted there for the configured retention time.
At this point, the last layer before Amazon S3 is the Kinesis Data Firehose delivery stream. This layer allows you to specify a source, an optional transformation, and a destination. In our architecture, the source is the Kinesis data stream, we apply no transformation, and the destination is Amazon S3. This step explains why we introduce these layers in our codeless IoT architecture: the destination settings can include dynamic partitioning and data aggregation configuration.
Using dynamic partitioning, you can define how your real-time data are organized in the Amazon S3 bucket, for example splitting them into year, month, and day "folders". It is also important to aggregate data into blocks of up to 128 MB before writing them to S3, with the objective of reducing Amazon S3 write costs and optimizing data querying operations.
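As an illustration (the key names are assumptions, matching the year/month/day fields produced by the forwarding rule), a delivery stream with dynamic partitioning enabled can extract partition keys from each record with a JQ expression and reference them in the S3 bucket prefix:

```
JQ expression for inline parsing:
  {year: .year, month: .month, day: .day}

S3 bucket prefix:
  year=!{partitionKeyFromQuery:year}/month=!{partitionKeyFromQuery:month}/day=!{partitionKeyFromQuery:day}/
```

This produces Hive-style key=value prefixes in S3, which analytics services can later exploit for partition pruning.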
After you complete these configurations, your data will arrive in Amazon S3, partitioned and aggregated.
Data analysis and visualization
One common task is to analyze the collected data using SQL. AWS allows you to query data in an Amazon S3 bucket using the Athena service. To do this, you need to create a table on top of your Amazon S3 data. Before creating the table, you can create a new database using the AWS Glue Databases feature.
Once you have your database, you can create your table directly from the Athena service, after selecting your new database. Below is an example of a create table query:
CREATE EXTERNAL TABLE myTable (
  `timestamp` timestamp,
  devicename STRING,
  temperature FLOAT
)
PARTITIONED BY (year STRING, month STRING, day STRING)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3://{YOUR_BUCKET_HERE}';
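Because the table is partitioned, Athena must be told about new partitions before they are queryable. As a sketch, assuming the Hive-style year=/month=/day= prefixes produced by dynamic partitioning and the table defined above, you could load partitions and then query a single day:

```sql
-- Register newly written Hive-style partitions (year=.../month=.../day=...)
MSCK REPAIR TABLE myTable;

-- Query one day only; the WHERE clause prunes all other partitions,
-- so Athena scans just that day's S3 objects
SELECT devicename, avg(temperature) AS avg_temperature
FROM myTable
WHERE year = '2023' AND month = '01' AND day = '15'
GROUP BY devicename;
```

Filtering on the partition columns in this way keeps both query latency and Athena's scanned-bytes cost low.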