Building an IoT sensor to detect conditions for mold growth using Raspberry Pi Pico W & AWS IoT Core
Overview
My partner and I have purchased a new house and we're anxious to move in. The house in question is a newly built property that a few months ago was just a plot of ground. We recently had our final inspection with the builders and were told that for the first 1-2 years there is a risk of mold growth on the lower floors on account of it being a new build. Basically, it takes time for the brick and mortar to fully dry out after being built, and while it dries the moisture can seep through the brickwork. This isn't too much of a challenge, though: we've just been told to vent the windows often to prevent it from happening.
I saw an opportunity for a project however, one where I could showcase some data engineering skills and build some services that monitored the temperature and humidity levels in the rooms of our house. Having a 'smart home' is nothing new to us and we already have smart lights which work with Alexa voice commands.
I've been playing around with Raspberry Pis since 2013, when I got my first ever Raspberry Pi (with my first pay check from my first job). More recently, Raspberry Pi have released a WiFi-enabled version of their microcontroller board, the Raspberry Pi Pico W. The Pico W is an ideal low-cost board (about £10) for small IoT projects.
Throughout the rest of this article I will walk through how I created this project: setting up and programming the Raspberry Pi Pico W, creating an MQTT broker in AWS, creating a database within AWS and writing Python scripts to log data to it, sending push notifications to my phone when humidity and temperature levels are high enough to promote the growth of mold spores, and creating a web-based dashboard in Node-RED. The overall architecture of this project looks like this.
Initialising the Raspberry Pi Pico W
Whilst I normally use Visual Studio Code as my preferred IDE, I decided to use Thonny for programming the Pico W. It is a lot simpler to transfer files between my computer and the Pico W using Thonny, and it made the whole process a lot easier for me. I will put snippets of the code used in this article; if you want to check out the full repository, the link to my GitHub is at the bottom of this article.
Firstly I install MicroPython for this project and then import all the necessary packages. This can be done using the Thonny package manager, which searches PyPI for packages; however, some of those you can see in the list needed to be installed using mip (because this is MicroPython and it doesn't support pip). The DHT22 script is a local module (which I downloaded from GitHub) written specifically for the sensor being used, a DHT22 (AM2302). This sensor detects humidity and temperature with ±0.5 °C accuracy. The downside is that it can only be sampled at 2 second intervals, so it is not the fastest. However, that won't matter for this project.
For the MQTT libraries I go into the Thonny terminal and first type mip.install("micropython-umqtt.robust"). When finished I then do the same for mip.install("micropython-umqtt.simple").
After the installs I program a wait time of 5 seconds. The reason for this is that I had some issues getting the program to run when not connected to my computer. While developing, the Pico W needs to be plugged into my USB port so I can send program data directly to it and run the code from the Thonny IDE. In application, however, I won't want a computer tethered to each sensor; I'll just want the code to run automatically when the board receives power. This only happens if the program you want to run automatically is called 'main.py'. I noticed that I had some issues getting the board to run when connected to a mains power source. Some helpful answers on Stack Overflow suggested that it's good practice to put a small pause in the code to allow the microcontroller to properly initialise (as it would normally have this time when connected to the computer), so I added a 5 second delay and haven't had any problems since.
I then initialise the inbuilt LED, which is particularly useful when there is no screen to show what the board is doing. I put a for loop together for a quick flash after the board turns on to tell me that it is going to try and connect to the WiFi.
I pass my WiFi SSID and password into the variables used with the network library; they're later passed to wlan.connect(). The device then attempts to connect to the WiFi. We can handle a connection error using if/else logic and raise a runtime error if the WiFi connection failed. Again, without an IDE it is difficult to know what is happening, so in the else statement I trigger led.on(): if the device is connected to the internet, the LED will be on.
Next we need to connect to our MQTT broker (which hasn't been made yet). However, with AWS we need to use SSL (Secure Sockets Layer) authentication instead of a username and password. Basically, within the AWS console we can download three certificate and key files which are used to authorise access to the broker. We download these and store them on the Pico W for authentication. We will need to convert the .pem files to .der files.
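The start-up sequence described above can be sketched roughly as follows. This is a minimal sketch rather than the code from my repository: the SSID and password are placeholders, the 10 second timeout is an assumption, and the helper names blink and wait_for_wifi are made up for illustration. The hardware objects are passed in as arguments so the helpers can also be tried out off the board.

```python
import time


def blink(led, times=3, interval_s=0.2):
    """Flash the LED a few times to signal that a WiFi connection is starting."""
    for _ in range(times):
        led.on()
        time.sleep(interval_s)
        led.off()
        time.sleep(interval_s)


def wait_for_wifi(wlan, ssid, password, timeout_s=10):
    """Connect to WiFi, polling once per second until connected or timed out."""
    wlan.active(True)
    wlan.connect(ssid, password)
    waited = 0
    while not wlan.isconnected() and waited < timeout_s:
        time.sleep(1)
        waited += 1
    if not wlan.isconnected():
        raise RuntimeError("WiFi connection failed")
    return wlan


def main():
    # These modules only exist in MicroPython on the Pico W.
    import network
    from machine import Pin

    time.sleep(5)  # pause so the board can initialise on mains power
    led = Pin("LED", Pin.OUT)
    blink(led)
    wait_for_wifi(network.WLAN(network.STA_IF), "MY_SSID", "MY_PASSWORD")
    led.on()  # a solid LED means the board is online
```

On the board, main.py would finish by calling main().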
Next we need to connect to our MQTT broker.
Creating an MQTT Broker with AWS IoT Core
To correctly configure the SSL parameters mentioned above we need to create a new policy within the AWS console. We go into our AWS IoT Core module and select Policies from the Security section. From there we click 'Create policy'. What we are doing here is essentially configuring the 'permissions' that our IoT connected device will have in relation to the broker. We give our policy a name; I called mine pico_iot.
After we have created this policy we go into the certificates menu.
We click 'Add Certificate' and are presented with the following menu.
We need to download the first three of these for our Pico W but I will download all of them, as the root CA files will be needed in a later part of the project. Now we are ready to create our IoT 'things' within the console.
We go to Manage and then Things, and then click on 'Create things'.
To create a new device (or 'Thing') we click on the 'Create things' button. We have the option to create a single 'thing' or 'many things'; for this I chose a single thing, no device shadow, and to skip generating a certificate. I named my device 'Living_Room_Pico', which you can see in the above image.
Now we go to the certificates menu and select the certificate we just created. In this menu we click on 'attach policies' and then we attach the policy we just created (pico_iot) like so.
Next we move to the 'Things' tab in the certificates menu and click 'Attach to things'. This shows a drop down menu where we select the thing we want to attach our certificate to, mine is shown here as Living_Room_Pico.
We now need to use the openssl command in a terminal to convert our .pem certificates to .der as mentioned earlier.
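For reference, the same conversion can also be done in Python, because for these unencrypted files the .der content is simply the Base64-decoded body of the .pem file. The sketch below is an alternative to the openssl command, not what I ran myself, and the function names are made up for illustration.

```python
import base64


def pem_to_der(pem_text):
    """Return the DER bytes of the first PEM block found in pem_text."""
    body = []
    inside = False
    for line in pem_text.splitlines():
        line = line.strip()
        if line.startswith("-----BEGIN"):
            inside = True
        elif line.startswith("-----END"):
            break
        elif inside and line:
            body.append(line)
    if not body:
        raise ValueError("no PEM block found")
    return base64.b64decode("".join(body))


def convert_file(pem_path, der_path):
    """Read a .pem file and write the matching .der file."""
    with open(pem_path, "r") as src:
        der = pem_to_der(src.read())
    with open(der_path, "wb") as dst:
        dst.write(der)
```

You would call convert_file once for the device certificate and once for the private key, using whatever file names AWS gave your downloads.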
We now put our .der files onto our Pico W for authentication.
Now let's head back to our Thonny IDE to finish writing the MicroPython code for the sensor.
We create a function for connecting to our new MQTT broker. We specify the client ID as "Living_Room_Pico" but we'll need to change this when we add more than one device to the system. Our server is the endpoint which we can find in our AWS console. We set SSL to True and call our earlier defined function. This function now uses the .der certificates we created to authorise access to the MQTT broker.
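As a rough illustration of that connection function, here is a minimal sketch using umqtt.simple. The file paths and the endpoint are placeholders, the helper names are hypothetical, and passing the key and certificate through ssl_params is an assumption that depends on your MicroPython version (newer releases may expect an SSL context instead).

```python
def read_der(path):
    """Load a DER-encoded key or certificate from the Pico's filesystem."""
    with open(path, "rb") as f:
        return f.read()


def build_ssl_params(key_der, cert_der):
    """Bundle the private key and device certificate for the TLS handshake."""
    return {"key": key_der, "cert": cert_der, "server_side": False}


def connect_mqtt(client_id, endpoint, key_path, cert_path):
    # umqtt.simple is MicroPython-only, so it is imported where it is used.
    from umqtt.simple import MQTTClient

    client = MQTTClient(
        client_id,
        endpoint,      # the endpoint shown in your AWS IoT Core console
        port=8883,     # MQTT over TLS
        keepalive=60,
        ssl=True,
        ssl_params=build_ssl_params(read_der(key_path), read_der(cert_path)),
    )
    client.connect()
    return client
```

A call would look something like connect_mqtt("Living_Room_Pico", "<your-endpoint>", "private.der", "certificate.der").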
Next we need to set up our publish function to allow us to publish messages to the MQTT broker. The print statements are to help us with debugging, but essentially all we are doing is using the publish function of the client from our MQTTClient library and passing it a topic and a string value.
Now we just need to configure our hardware on the Raspberry Pi Pico W. Using the DHT22 library, we initialise the pin on the board that our sensor's data line is physically connected to.
Now for the final part of our code, we create a while loop. Within it we initialise our temperature (T) and humidity (H) variables as read from the DHT22 sensor.
We then prepare the message value that we publish via our client using our new publish function. Importantly we specify here our MQTT topic (namely 'picow/LivingRoom/TempHumidity') which is specific to each sensor. Lastly, we publish the measurements variable which is the actual data we need.
After each publish, I set a quick LED blink to let me know when a new message has been published; again, this helps when the device isn't connected to an IDE.
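Putting the last few steps together, the read-and-publish loop might look something like the sketch below. Note the assumptions: it uses the interface of MicroPython's built-in dht module (measure(), temperature(), humidity()) rather than the local DHT22 module I downloaded, the JSON key names and the 10 second interval are illustrative, and the function names are made up.

```python
import time


def format_payload(temperature, humidity):
    """Build a JSON string with both readings fixed to two decimal places."""
    return '{{"Humidity": {:.2f}, "Temperature": {:.2f}}}'.format(humidity, temperature)


def publish_reading(client, sensor, led, topic):
    """Take one reading, publish it, then blink the LED as confirmation."""
    sensor.measure()
    payload = format_payload(sensor.temperature(), sensor.humidity())
    client.publish(topic, payload)
    led.off()        # the LED is normally on, so blink by switching it off briefly
    time.sleep(0.1)
    led.on()
    return payload


def run(client, sensor, led, topic="picow/LivingRoom/TempHumidity", interval_s=10):
    """Publish forever; the DHT22 needs at least 2 seconds between reads."""
    while True:
        publish_reading(client, sensor, led, topic)
        time.sleep(interval_s)
```

On the board the sensor could be created with dht.DHT22(machine.Pin(n)), where n is whichever GPIO pin the data line is wired to.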
Lastly we can test that we are receiving data packets in the AWS IoT Core console. Using the topic which we specified here we go to MQTT Test and type in our MQTT topic. We can now see that the broker is receiving our data.
Logging MQTT Data to a SQL Database & Setting up an AWS RDS instance
We've successfully started receiving data at our MQTT broker in AWS. However, I want to actually be able to do something with this data, so I need to save the values that have been sent. I won't need this for the dashboard, as it can just give a real time update of the data as it is received, but for other outputs of the project I need to be able to look at a larger dataset and make some queries. So what I need to look at now is creating a SQL database server, and again I'll be hosting this with AWS.
Within the AWS console I'll search for 'RDS' which is Amazon's relational database service. It allows me to create a database in the cloud that I can then later send data to with Python.
Within the RDS console I click 'Create database' and then select 'Easy create'. I now have the choice of what flavour of SQL I would like to create. For this one I will choose PostgreSQL.
Next I'll make sure I choose the cheapest version of the service that is within the AWS free tier. I'll also set an identifier for the database I am about to create and create the login detail to access the database later.
Finally I click on create database to start the process.
Now I can go into RDS > Databases and find the database I just created with its unique identifier 'mqtt_read'. When I click on the identifier I can get all the information I need to connect my database tool (in my case DBeaver) to this SQL instance.
Now, within DBeaver I can go and connect to this database using the endpoint, port and other login details.
Our database is now set up and ready to write data to. So we can go ahead and create our next Python script. I called this one DataLogger.py
Like with any script, the first thing I need to do is import the necessary Python libraries. I'm using paho to subscribe to our MQTT topic, pandas for data wrangling and logging data to our new SQL database, and SQLAlchemy to connect to that database. Passwords is a local Python module I created in which I store, as variables, the login details for everything used in this script. The reason for this is that I can add this file to my .gitignore to prevent me from putting any passwords in a public GitHub repository.
The first part of the script connects to our new SQL database using SQLAlchemy. The most important thing here is that you pass the RDS endpoint, as a string, as the host variable. You will also need to pip install pg8000 (I use pipenv) to get the driver needed to log data to the database.
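To make the connection step concrete, here is a minimal sketch of building the SQLAlchemy connection string for PostgreSQL with the pg8000 driver. The helper names and the default database name used in the example are assumptions, and in practice the user, password and host would come from the Passwords module.

```python
from urllib.parse import quote_plus


def database_url(user, password, host, database, port=5432):
    """Build a SQLAlchemy URL for PostgreSQL using the pg8000 driver."""
    return "postgresql+pg8000://{}:{}@{}:{}/{}".format(
        quote_plus(user), quote_plus(password), host, port, database
    )


def make_engine(url):
    """Create the engine that is later passed to pandas as the connection."""
    # SQLAlchemy is third-party, so it is only imported when an engine is needed.
    from sqlalchemy import create_engine

    return create_engine(url)
```

Quoting the user name and password means special characters such as @ don't break the URL.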
We create a function called on_connect() which subscribes to our MQTT broker. We call the client.subscribe function and pass it the topic string which we set inside our Pico W. You will notice however that in the Pico W our topic string was 'picow/LivingRoom/TempHumidity' and here we are using the topic string 'picow/+/TempHumidity'. The difference is that inside this code we are using an MQTT wildcard (the +). This means it will subscribe to any topic of the same form regardless of what that middle value is. This is really helpful as when I later add sensors for other rooms such as Kitchen and Bedroom this python script will be able to accept all kinds of message traffic.
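To show what the + wildcard does, here is a small stand-alone sketch of the matching rule. In the real system the broker does this matching for us, so this function is purely illustrative, and it is simplified: it only handles + and a trailing # and ignores the special rules for topics starting with $.

```python
def topic_matches(subscription, topic):
    """Return True if an MQTT topic matches a subscription filter."""
    sub_levels = subscription.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(sub_levels):
        if level == "#":
            return True  # multi-level wildcard: matches everything that remains
        if i >= len(topic_levels):
            return False  # the topic has fewer levels than the filter
        if level != "+" and level != topic_levels[i]:
            return False  # a literal level that does not match
    # Without '#', both must have exactly the same number of levels.
    return len(sub_levels) == len(topic_levels)


for room in ("LivingRoom", "Kitchen", "Bedroom"):
    print(room, topic_matches("picow/+/TempHumidity", "picow/" + room + "/TempHumidity"))
```

Every room prints True here, because + stands in for exactly one level of the topic.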
Next we set up the on_message function. This defines the behaviour of the script when a message is received. The messages come in as a JSON object; however, we can treat the payload as a string to make things easier. As the data is always sent accurate to two decimal places, I can use Python's string slice notation to cut the string exactly where I want it. So to retrieve the value of humidity I slice from the 13th position to the 17th position, which returns 5 characters (for example 54.32); we can do the same for temperature. Getting the room is a little different, as the room is carried in the topic rather than in the payload. Instead of reading msg.payload we access msg.topic, which has the form we specified earlier, 'picow/+/TempHumidity'. Using the Python split function we specify how we want to split it; I choose the '/' symbol. This returns a list of length 3 in which each element is a string. I can then get the room of the received data packet with topicSplit[1], which returns the second element: the room that the data has been sent from.
Now with these string values I create a dictionary, which I need in order to create a pandas dataframe. I create my first field as 'LoggedTime' and call datetime.now() to give me the current time at which the data is logged to the database. I set all the other key-value pairs to the previously created variables before passing this dictionary into the pandas DataFrame function.
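As an aside, a slightly more forgiving alternative to slicing is to parse the payload with json.loads. This is not what my script does; it is a sketch of a swapped-in technique, and the key names "Temperature" and "Humidity" are assumptions about the payload format.

```python
import json


def parse_message(topic, payload):
    """Extract the room, temperature and humidity from one MQTT message."""
    room = topic.split("/")[1]   # "picow/LivingRoom/TempHumidity" -> "LivingRoom"
    data = json.loads(payload)   # accepts the raw bytes from msg.payload or a str
    return {
        "Room": room,
        "Temperature": float(data["Temperature"]),
        "Humidity": float(data["Humidity"]),
    }


print(parse_message("picow/Kitchen/TempHumidity", b'{"Humidity": 9.87, "Temperature": 21.50}'))
```

The benefit is that readings which aren't exactly five characters wide, such as 9.87 or 100.00, still parse correctly.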
Before we send this data to SQL however we need to cast the string values from our dataframe as floats which we do by selecting the column of our dataframe (which is named according to how we specified it in the dictionary) and then using .astype(float).
Finally we can call the pandas .to_sql command. We first pass in the name of the table we want to send the data to, then specify the connection by passing the con variable we created at the beginning of the script. Importantly, we need to set if_exists='append' so that each new row is added to the existing table; the default ('fail') raises an error once the table exists, and 'replace' would overwrite the old data with the new. We set index to False as the dataframe's own index isn't needed; the fields we want are already specified in our dictionary.
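The dictionary, dataframe and .to_sql steps can be sketched as below. The column names other than 'LoggedTime' are assumptions, the function names are made up, and con is whatever SQLAlchemy connection was created at the top of the script.

```python
from datetime import datetime


def build_row(room, temperature, humidity, now=None):
    """Build the one-row dictionary that pandas turns into a dataframe."""
    return {
        "LoggedTime": [now or datetime.now()],
        "Room": [room],
        "Temperature": [temperature],
        "Humidity": [humidity],
    }


def log_row(row, con, table="MQTTRead"):
    """Cast the readings to floats and append the row to the SQL table."""
    # pandas is third-party, so it is imported only when a row is written.
    import pandas as pd

    df = pd.DataFrame(row)
    df["Temperature"] = df["Temperature"].astype(float)
    df["Humidity"] = df["Humidity"].astype(float)
    df.to_sql(table, con=con, if_exists="append", index=False)
    return df
```

Inside on_message this would be called as log_row(build_row(room, temperature, humidity), con).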
Everything we've done in the script thus far is just creating functions, though we haven't called any of those functions yet so our code won't do anything. The next step is connecting to our MQTT broker.
What we are doing here is essentially just telling our script where the MQTT broker is (in client.connect()) and passing it the necessary security certificates to allow it to subscribe. Recall that we downloaded our certificates and keys earlier; we now need the root_ca file, the certfile and the keyfile. These are all saved in a folder called certs (which I've also added to my .gitignore). We call client.loop_forever() to make the code loop and wait for incoming data packets. If we run the code now and check DBeaver we will see that we are receiving data into our database.
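A minimal sketch of this connection step is shown below. It assumes paho-mqtt 2.x (the callback signatures differ in 1.x), the certificate file names and the "DataLogger" client ID are placeholders, and on_message just prints where the real script logs to the database.

```python
import os


def cert_paths(cert_dir="certs"):
    """Paths to the AWS files; the file names here are placeholders."""
    return {
        "ca_certs": os.path.join(cert_dir, "AmazonRootCA1.pem"),
        "certfile": os.path.join(cert_dir, "certificate.pem.crt"),
        "keyfile": os.path.join(cert_dir, "private.pem.key"),
    }


def on_connect(client, userdata, flags, reason_code, properties):
    """Subscribe to every room's topic once the connection is up."""
    client.subscribe("picow/+/TempHumidity")


def on_message(client, userdata, msg):
    """Placeholder handler; the real script writes the data to SQL here."""
    print(msg.topic, msg.payload)


def run_logger(endpoint, client_id="DataLogger"):
    # paho-mqtt is third-party, so it is imported only when the logger runs.
    import paho.mqtt.client as mqtt

    client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2, client_id=client_id)
    client.on_connect = on_connect
    client.on_message = on_message
    client.tls_set(**cert_paths())
    client.connect(endpoint, port=8883, keepalive=60)
    client.loop_forever()  # blocks and waits for incoming data packets
```

The endpoint passed to run_logger is the same AWS IoT Core endpoint used on the Pico W.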
Sending push notification warnings using Webhooks
To recap, we have a sensor publishing temperature and humidity data to an MQTT broker hosted by AWS. We are now receiving those packets of data as a JSON and parsing them into a pandas dataframe and then logging that data to an Amazon RDS PostgreSQL database.
When I started this project there were three end to end solutions that I had envisioned, the first was a simple dashboard that would show me what the temperature and humidity in each room was, pretty simple. The second was a service that would send a push notification to my phone which would tell me that the humidity and temperature levels are at a level that is conducive to promoting mold growth. Hypothetically when I receive this notification on my phone it alerts me to open a window or turn on a dehumidifier. The last service I imagined was one that recorded all the historic data and produced a report that could be emailed to myself periodically with a trend analysis. Such as letting me know that the most humid times of day are between certain hours which could prompt me to vent the house a bit more at certain times of day. This latter solution however requires me to build up a record of data to actually produce the report from, so I will save that output for a later project and focus on implementing the first two solutions.
Sending a push notification to my phone can be a tricky task and I didn't want to create my own iOS app to do it. Fortunately there is a simpler way using IFTTT (https://ifttt.com/ ). This is a service that allows you to connect smart devices and build useful applets. I've used webhooks in the past to trigger our smart lights to turn on when walking past a PIR sensor. I will use webhooks again for this project; all it requires is that I have the IFTTT app installed on my phone.
On ifttt.com I type webhooks into explore and select the correct tile. I then click on create.
I'm presented with the following logic, which allows me to specify an if condition and then, if that is triggered, specify what will happen as a result. So first I click on if this > add. We type in webhooks and select it again. We'll then select the 'receive a web request' tile.
Then we specify the name for our trigger. I've called this one 'Notify'. Now we need to add what happens when our event is triggered. I typed in and chose the service 'Notifications'.
Within the notifications menu I chose send a simple notification using the IFTTT app. I can then set the message I want to receive on my phone as follows. The {{Value1}} in this notification will be later assigned to our room variable so we will get a trigger telling us which room needs attention.
The webhook is now complete and ready to use. If I go into the webhooks documentation I can get a curl command that I can later call as an output in my new Python script.
Now to get started with my push notification script. As before I import all the necessary libraries that I need and connect to the MQTTRead database we set up.
Next I initialise a while loop and use the pandas .read_sql function. This is a really useful function that allows me to use standard SQL syntax to query a database and return that data as a dataframe. Within the query I am taking the average temperature and average humidity in the MQTTRead database from the past hour and returning a dataframe. Importantly I am using GROUP BY to group by rooms, we will later be iterating over the dataframe so grouping in this way means we get both averages for each room that we have.
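The shape of that query is shown in the sketch below. So that it runs anywhere, it uses an in-memory SQLite database as a stand-in for the PostgreSQL instance and a plain cursor instead of pandas, and the column names and sample readings are made up for illustration. In PostgreSQL the time filter would be written with something like NOW() - INTERVAL '1 hour', and the query string would be passed to pandas .read_sql.

```python
import sqlite3
from datetime import datetime, timedelta

QUERY = """
SELECT Room, AVG(Temperature) AS AvgTemp, AVG(Humidity) AS AvgHumidity
FROM MQTTRead
WHERE LoggedTime >= ?
GROUP BY Room
ORDER BY Room
"""


def hourly_averages(con, now):
    """Average temperature and humidity per room over the past hour."""
    cutoff = (now - timedelta(hours=1)).isoformat()
    return con.execute(QUERY, (cutoff,)).fetchall()


def demo():
    """Load a few made-up readings and run the query against them."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE MQTTRead (LoggedTime TEXT, Room TEXT, Temperature REAL, Humidity REAL)"
    )
    now = datetime(2023, 1, 1, 12, 0, 0)
    rows = [
        ((now - timedelta(minutes=10)).isoformat(), "Kitchen", 20.0, 60.0),
        ((now - timedelta(minutes=20)).isoformat(), "Kitchen", 22.0, 70.0),
        ((now - timedelta(minutes=5)).isoformat(), "LivingRoom", 19.0, 55.0),
        ((now - timedelta(hours=3)).isoformat(), "LivingRoom", 30.0, 90.0),  # too old
    ]
    con.executemany("INSERT INTO MQTTRead VALUES (?, ?, ?, ?)", rows)
    return hourly_averages(con, now)


for room, temp, humidity in demo():
    print(room, temp, humidity)
```

Each returned row corresponds to one room, which is what makes the later for loop over rooms possible.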
Next I call my for loop and iterate over each row in the dataframe. I then assign temp, humidity and room variables to each returned aggregate value. This means that because we are iterating over the rows of the dataframe, if I had three sensors (say for living room, kitchen and bedroom) then I would be returned with average humidity and temperature values for each room.
Next we will call our conditional logic in a series of elif statements. Now I am aware that you can have a hot room with a low humidity and this won't promote mold growth, and I'm also aware you can have a humid room with a low temperature and this won't necessarily promote mold either. I did however find a useful guide online for telling me at what temperature and humidity levels mold growth is likely to occur.
In the above chart I converted the temperatures to centigrade and then took the line underneath the green line as the threshold humidity needed at each temperature to promote mold growth. I then converted this into a series of elif statements as follows.
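One way to express this kind of threshold logic is as a small lookup table instead of a chain of elif statements. The sketch below is a swapped-in technique, not my actual code, and the numbers in it are placeholders for illustration only: they are not the values I read off the chart, so substitute your own.

```python
# Each entry is (upper temperature bound in °C, humidity % at or above which to warn).
# PLACEHOLDER values for illustration only; they are NOT taken from the chart.
EXAMPLE_THRESHOLDS = (
    (10.0, 90.0),
    (20.0, 80.0),
    (30.0, 75.0),
)


def needs_attention(temp_c, humidity, thresholds=EXAMPLE_THRESHOLDS):
    """Return True if the humidity is at or above the limit for this temperature band."""
    for max_temp, min_humidity in thresholds:
        if temp_c <= max_temp:
            return humidity >= min_humidity
    # Hotter than the last band: reuse that band's humidity limit.
    return humidity >= thresholds[-1][1]
```

Keeping the bands in one table makes it easier to adjust them later without rewriting the conditional logic.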
So now, if any of these conditions are met, my webhook is triggered and I get a notification on my phone telling me which room needs attention (notice that I used an f-string to replace the dictionary value for the value1 key with the room variable).
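For illustration, triggering the webhook from Python could look like the sketch below. It uses the standard library's urllib rather than any particular HTTP package, the key is a placeholder for the one shown in your own IFTTT webhooks documentation, and the function names are made up.

```python
import json
import urllib.request


def webhook_url(event, key):
    """URL of the IFTTT webhook for a given event name and account key."""
    return "https://maker.ifttt.com/trigger/{}/with/key/{}".format(event, key)


def build_request(room, event="Notify", key="YOUR_IFTTT_KEY"):
    """Prepare the POST request; the room fills {{Value1}} in the notification."""
    body = json.dumps({"value1": room}).encode("utf-8")
    return urllib.request.Request(
        webhook_url(event, key),
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def notify(room, key):
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(build_request(room, key=key), timeout=10) as resp:
        return resp.status
```

Calling notify("Kitchen", key) would then produce a notification naming the kitchen.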
Cleaning up the MQTT Table and Aggregating data to another table
Now we have a complete end to end solution with our data. But there are a couple of problems. Firstly, I want to be able to analyse data trends in a future project, so I want to store the average temperatures and humidities over regular intervals in a separate table. Secondly, the MQTTRead table fills up pretty quickly. Once data has been added to the database, if a webhook has not been triggered and the data isn't going to be used again (because, say, we're only looking at the past hour's worth of data), then it no longer serves a purpose and we don't want the database to fill up unnecessarily. For this reason I created the DataAggregator.py script, which takes average data readings over the past 30 minute interval and stores them in a new table. Then we use SQLAlchemy to delete data from the MQTTRead table that is older than one day, which helps to keep the table from filling up.
Like the other scripts we first load in the necessary packages and then connect to our database using SQLAlchemy.
Next we initialise a while loop to keep the service running. Like in our previous script we use the pandas .read_sql function to run a SQL query and return a dataframe. This time we're creating a query that returns summary statistics: the count, min, max, average and standard deviation of both humidity and temperature measurements for each room within the past 30 minute time period, again using GROUP BY room. Like before we use the .to_sql function to write this to our new table, which I've called 'TrendAnalysis'.
This returns the following data to our database.
Now all we need to do is clean up the old data from our MQTTRead table. This was a little tricky to learn how to do, as our pandas functions so far have only read or appended data, which is a much simpler operation. However, after reading through a lot of Stack Overflow answers I found the following method worked well: using the SQLAlchemy text function to run a SQL DELETE FROM together with a RETURNING clause, as follows.
This command deletes data older than one day from our MQTTRead table and then sleeps for 30 minutes (1800 seconds).
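The clean-up logic itself is simple, as the sketch below shows. Again it uses an in-memory SQLite database as a stand-in so that it runs without a PostgreSQL instance, it counts the deleted rows with the cursor's rowcount instead of a RETURNING clause, and the sample rows and column names are made up. With SQLAlchemy the equivalent statement would be wrapped in text() and executed on a connection.

```python
import sqlite3
from datetime import datetime, timedelta


def delete_older_than(con, now, max_age=timedelta(days=1)):
    """Delete readings older than max_age and return how many were removed."""
    cutoff = (now - max_age).isoformat()
    cur = con.execute("DELETE FROM MQTTRead WHERE LoggedTime < ?", (cutoff,))
    con.commit()
    return cur.rowcount


def demo():
    """Insert three made-up readings, two of them more than a day old."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE MQTTRead (LoggedTime TEXT, Room TEXT)")
    now = datetime(2023, 1, 2, 12, 0, 0)
    con.executemany(
        "INSERT INTO MQTTRead VALUES (?, ?)",
        [
            ((now - timedelta(hours=2)).isoformat(), "Kitchen"),
            ((now - timedelta(days=2)).isoformat(), "Kitchen"),
            ((now - timedelta(days=3)).isoformat(), "Bedroom"),
        ],
    )
    deleted = delete_older_than(con, now)
    remaining = con.execute("SELECT COUNT(*) FROM MQTTRead").fetchone()[0]
    return deleted, remaining
```

In the real service this delete runs at the end of each pass through the while loop, just before the 30 minute sleep.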
Creating a dashboard using Node-Red
The final part of our project now (apart from hosting it with AWS) is to create a dashboard. I'm a fan of gauge type dashboards and I wanted to take the opportunity to familiarise myself more with node-red which I've only used once or twice before. Node-red is a drag and drop no-code platform that can run in a web browser. The good news is we can use it with simply the incoming MQTT data we subscribe to, so it is completely independent of our databases.
I add an MQTT input node into the flow, and configure it using the previously downloaded certificates from AWS (including the CA one), and set it to subscribe as previously to 'picow/+/TempHumidity'. This ensures all our messages are being captured in this flow.
Next, I add a function block to clean up the data and combine the original payload with the room from the topic into a single JSON object. With this I can now add a switch statement that splits the messages into 3 output nodes, for the living room, kitchen and bedroom.
The last block of nodes uses the gauge from the node-red-dashboard (which needs to be installed separately from the main menu, then going to the palette manager). For each of the switch outputs I add a gauge for the temperature and the humidity values. Within the gauge, I assign each to a group per room and also set suitable limits (0-50 for temperature and 0-100 for humidity) as well as warning colour ranges.
This is then repeated for humidity and then for each other room. At the time of writing I don't have the other sensors to hand (they've been ordered) but as soon as I configure them in Thonny they will appear on the dashboard. The completed flow looks as follows.
We can now deploy the dashboard. This hosts a local website which can be accessed at the localhost address in the browser. If I add /ui to the end of the URL I get the designed dashboard, which looks like this.
Creating a Docker Image for the Python Services
I've now finished creating all my services: we have a dashboard, a push notification service and a data aggregator. However, I'm not going to keep VS Code or a terminal running to keep these services active. No, I want to deploy them in AWS and have them run continuously.
A convenient way to deploy services is to bundle them up into a Docker image, which includes the runtime binaries (Python) as well as any necessary data (scripts, certificates, passwords etc.).
I create a Dockerfile for each service (DataLogger, DataAggregator and PushNotifications), all of which look identical except for the last CMD command. I use a multi-stage image, starting with a python v3.11 image as base.
In the first stage, I install any build dependencies (pipenv), copy the Pipfile and the Pipfile.lock and install a virtual environment within the Docker image.
In the second stage, I copy over the virtual environment from the previous stage and add it to the path so it can be used. I also add a new user to prevent root access to the image, keeping it secured. Finally, I add any necessary files, for simplicity I copy in both the scripts and the certificates here.
The last statement tells the image which command to use when running a new Docker container.
The dashboard service is slightly different and a lot shorter. In this case, I simply use the node-red official base image and copy over the node-red project data into the /data folder, which is where the image expects to find it. This includes any flows and uploaded certificates from when I created the node-red dashboard by hand using the web interface.
Lastly for all the Dockerfiles I build the images and push them to the AWS container registry, so that I can use them later when deploying them.
Creating an AWS EC2 instance for running the services
Now that I have all my services bundled into separate Docker images, I need a virtual machine on the cloud to deploy them. For this I choose AWS EC2. EC2 is Amazon's elastic compute cloud. It basically deploys a virtual machine in the cloud for us.
From the EC2 page, I navigate to instances and click the Launch instances button.
I enter a name for the new instance 'mold-detector' and for simplicity choose the quick start workflow. I choose the Amazon Linux OS for this project.
To keep costs low I choose the micro size instance, which has an eligible free tier. Since the project workload is small, this will be plenty to run all the services.
In the network settings I make sure that SSH traffic is allowed (so that I can connect to it, install packages and run the services) and also http traffic is allowed (so that I can connect to the node-red web interface).
With this, the EC2 instance is configured and after a few minutes it is ready to use. Since I want to use Docker to run the services, first I install it following the steps in the official documentation https://docs.docker.com/engine/install/ .
Once Docker is ready, it is much easier to orchestrate the deployment of multiple services using a docker-compose.yaml file. Here, I make an entry for each service, where the most important thing is to specify the image. These images are built from the previous Dockerfiles. I also specify that the service should restart if it fails, unless I stop it. For the data-logger I specify the client ID in the environment, so that it doesn't clash with any other connections to MQTT. For the dashboard I also map the http port (80) to the container so that the web interface is available from the instance.
Finally, I can start all the services by running: docker-compose up -d
Our services are now deployed! The dashboard is accessible via a bookmarked url and I'll get a push notification if the humidity levels get high enough to risk mold growth in our new house.
Conclusion
Thank you for taking the time to read about this project. It took a lot of work to put together but I really enjoy building these kinds of data engineering style projects. Overall I spent about a week getting everything together and finally deployed it yesterday. I learnt a lot about AWS doing this project and I really like the hands-on learning approach. In the not too distant future I plan on doing some more work on this project with the TrendAnalysis table. In this project I:
I hope you enjoyed reading about this project, for more please check out my personal portfolio below.
Many thanks
James