Prepare your Dataset for ParleIoT
ParleIoT is our groundbreaking data compression product designed to revolutionise how we connect and communicate in the Internet of Things era. This article guides you through the steps required to test your own dataset with ParleIoT data compression, using Python and Protocol Buffers (Protobuf).
Prepare
For this guide, we have selected a public dataset from an air quality sensor deployed in an Italian city.
(Released under CC BY 4.0, credit to Saverio Vito for publishing it)
Install requirements
In your terminal, install the Python dependencies for this demo:
pip install protobuf pandas ucimlrepo
Protobuf
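For an efficient and robust encoding, we demonstrate how Protobuf can be used to create constant-length outputs from measurement data. You first define your message format and then compile it to code in your chosen language, Python in our case. In the definition below, the numeric fields use the fixed-width types float and sfixed32, and every field is marked optional so that it is serialised even when its value is zero. Together, these choices keep every frame the same length. ParleIoT does not require Protobuf; any other method that produces constant-length outputs also works.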
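As an illustration of such an alternative, here is a minimal sketch that packs one measurement into a fixed-length record with Python's standard struct module. The record layout, the name pack_row and the sample values are our own assumptions for illustration; they are not part of ParleIoT.

```python
import struct

# Hypothetical fixed layout (an assumption for illustration):
# 10-byte date, 8-byte time, then the 13 numeric fields in the order
# co, pt08_s1, nmhc, c6h6, pt08_s2_nmhc, nox, pt08_s3_nox, no2,
# pt08_s4_no2, pt08_s5_o3, t, rh, ah.
# "<" selects little-endian byte order without padding,
# "f" is a 4-byte float and "i" is a 4-byte signed integer.
RECORD = struct.Struct("<10s8sfiifiiiiiifff")


def pack_row(date, time, values):
    """Pack one measurement into a frame of exactly RECORD.size bytes."""
    return RECORD.pack(date.encode("ascii"), time.encode("ascii"), *values)


# Illustrative sample values, not taken from a real upload.
frame = pack_row(
    "03/10/2004",
    "18:00:00",
    (2.6, 1360, 150, 11.9, 1046, 166, 1056, 113, 1692, 1268, 13.6, 48.9, 0.7578),
)
print(len(frame), frame.hex())
```

Every frame produced this way has the same length (RECORD.size), whatever the measured values are, which is the property that matters here.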
Save the following code snippet to AirQuality.proto:
syntax = "proto3";

message AirQuality {
  optional string date = 1;
  optional string time = 2;
  optional float co = 3;
  optional sfixed32 pt08_s1 = 4;
  optional sfixed32 nmhc = 5;
  optional float c6h6 = 6;
  optional sfixed32 pt08_s2_nmhc = 7;
  optional sfixed32 nox = 8;
  optional sfixed32 pt08_s3_nox = 9;
  optional sfixed32 no2 = 10;
  optional sfixed32 pt08_s4_no2 = 11;
  optional sfixed32 pt08_s5_o3 = 12;
  optional float t = 13;
  optional float rh = 14;
  optional float ah = 15;
}
If you do not know Protobuf, head over to ChatGPT, provide it with the headings and some lines of data from your CSV file and ask it to come up with a Protobuf definition. Some changes might be required, but this will get you started quickly. We can also assist if you have unique challenges or need additional support.
Using your terminal, convert the Protobuf file to Python code:
export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python
protoc AirQuality.proto --python_out=. --experimental_allow_proto3_optional
This will create a new Python file called AirQuality_pb2.py in your source directory.
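The constant-length property can also be checked on paper from the Protobuf wire format. The following is a rough sketch, assuming that every field of AirQuality.proto is set, that the date is a 10-character string and that the time is an 8-character string. The helper name expected_frame_bytes is ours.

```python
TAG_BYTES = 1      # field numbers 1 to 15 are encoded in a one-byte tag
FIXED32_BYTES = 4  # float and sfixed32 always occupy 4 bytes
LENGTH_PREFIX = 1  # strings shorter than 128 bytes use a one-byte length


def expected_frame_bytes(n_fixed32_fields, string_lengths):
    """Estimate the serialised size of a message with only these fields."""
    numeric = n_fixed32_fields * (TAG_BYTES + FIXED32_BYTES)
    strings = sum(TAG_BYTES + LENGTH_PREFIX + n for n in string_lengths)
    return numeric + strings


# 13 numeric fields (5 float, 8 sfixed32) plus the date and time strings.
size = expected_frame_bytes(13, [10, 8])
print(size, size * 2)  # 87 bytes, i.e. 174 hex characters
```

Under these assumptions, each serialised frame is 87 bytes long, which corresponds to 174 characters once written as hex.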
Convert dataset
In this step, save the following Python script as dataset.py:
from datetime import datetime

from ucimlrepo import fetch_ucirepo

import AirQuality_pb2

# The website accepts at most 2000 frames per upload.
MAX_FRAMES = 2000

# Fetch the UCI Air Quality dataset (id 360) as a pandas DataFrame.
df = fetch_ucirepo(id=360)['data']['original']

with open('dataset.txt', 'w', encoding="utf-8") as file:
    for _, row in df.head(MAX_FRAMES).iterrows():
        data = AirQuality_pb2.AirQuality()
        # Zero-pad date and time so both strings always have the same length.
        data.date = datetime.strptime(row['Date'], "%m/%d/%Y").strftime("%m/%d/%Y")
        data.time = row['Time'].rjust(8, '0')
        data.co = float(row['CO(GT)'])
        data.pt08_s1 = int(row['PT08.S1(CO)'])
        data.nmhc = int(row['NMHC(GT)'])
        data.c6h6 = float(row['C6H6(GT)'])
        data.pt08_s2_nmhc = int(row['PT08.S2(NMHC)'])
        data.nox = int(row['NOx(GT)'])
        data.pt08_s3_nox = int(row['PT08.S3(NOx)'])
        data.no2 = int(row['NO2(GT)'])
        data.pt08_s4_no2 = int(row['PT08.S4(NO2)'])
        data.pt08_s5_o3 = int(row['PT08.S5(O3)'])
        data.t = float(row['T'])
        data.rh = float(row['RH'])
        data.ah = float(row['AH'])
        # Write one hex-encoded frame per line.
        file.write(data.SerializeToString().hex() + '\n')
This script serialises the measurement data with Protobuf and writes one hex-encoded frame per line to the output file dataset.txt. We stop after 2000 frames because our website enforces a limit to prevent service degradation from excessive use.
In a terminal, run:
python3 dataset.py
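Before uploading, it can be worth confirming that the file has the expected shape: one hex-encoded frame per line, all frames of equal length, and no more than 2000 frames. The following is a minimal sketch of such a check. The function name check_frames and its max_frames parameter are our own and not part of any ParleIoT tooling.

```python
def check_frames(path, max_frames=2000):
    """Return (frame_count, frame_bytes); raise ValueError on malformed input."""
    lengths = set()
    count = 0
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            text = line.strip()
            if not text:
                continue  # ignore blank lines
            try:
                frame = bytes.fromhex(text)
            except ValueError:
                raise ValueError(f"line {line_no} is not valid hex") from None
            lengths.add(len(frame))
            count += 1
    if len(lengths) > 1:
        raise ValueError(f"frames differ in length: {sorted(lengths)}")
    if count > max_frames:
        raise ValueError(f"{count} frames exceed the limit of {max_frames}")
    return count, (lengths.pop() if lengths else 0)


# Usage, once dataset.txt exists:
#   count, size = check_frames("dataset.txt")
#   print(f"{count} frames of {size} bytes each")
```

If the check raises an error about differing lengths, a likely cause is a string field that is not padded to a fixed width.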
Upload
Head over to ParleIoT and upload your text file: https://parleiot.com/
If you want to test with a larger dataset, please get in touch with us directly, and we will make that possible.
Tune
You can change the minimum required robustness level and the tracking period. The robustness level makes the decompression more stable in case of packet loss, and the tracking period can be tuned to optimise the compression performance. Stay tuned for more options to come.
Don't hesitate to contact us if you need help finding the optimal parameters for your application.
Evaluate
You can now see the compression results in the graph and start interacting with the parameter sliders to find the best performance for your use case.
The default setting for the tracking period is 10, but you can now experiment with different values and increase the minimum required robustness if needed.
With an increased robustness level and a larger tracking period, the compression performance decreases, but the frame length fluctuates less from one frame to the next.
Now it is your turn! Convert your dataset and try out how much you can save!
Early Adopters
We are actively seeking early adopters to experience the power of ParleIoT firsthand.
Sign up here (https://lnkd.in/d-EztwH7), comment "Interested", or send us a direct message ([email protected]).