59 Machine learning v curiosity
Martin Osborne
Water industry strategic advisor, asset planner and drainage expert. Winner of the 2023 WaPUG Prize for contributions to the development of urban drainage practice.
The image above was used by Ofwat to represent innovation in the UK water industry. If you look carefully you will notice that none of the gears mesh together properly, so the machine won’t work. Maybe not what they intended.
I am finally getting round to reading the book “A Strategic Digital Transformation for the Water Industry”, published last year by the International Water Association. It is available at this link. Well done to the editors Oliver Grievson, Timothy Holloway and Bruce Johnson for putting together such a comprehensive review of what is and should be being done to transform the water industry. If anything it is too comprehensive at 119 pages and I am still ploughing my way through it. However, one figure jumped out at me and linked in with some work that I did recently, hence this blog. There may be more blogs on other ideas from the book in the future.
Analysing treatment works flow records
Back in Episode 22 of the blog I talked about analysing compliance with required Flow to Full Treatment (FFT) and referenced a good article in The Guardian on a citizen science approach to this. The IWA book showed another example (see the figure below).
The book states: “Figure 2.5 shows four clear peaks which grow over a period of three years showing the gradually worsening infiltration into the sewer environment.”
This is true, but there is actually a lot of non-artificial intelligence in extracting that information from the graph, and even then we are making some significant assumptions to state that the cause is worsening infiltration. Even if it is, we do not know if it is due to deteriorating condition of the sewers or climate change increasing the level of the water table.
To be fair to the editors, they do say that this is a simple example of the use of data rather than an example of digital transformation, and they qualify the assumptions: “With additional information, such as rainfall, geology, and the performance of the system, this can be used to predict where the source of the problems within the sewer network is.”
We could perhaps use machine learning or artificial intelligence to provide those insights for us. Or can we just use human curiosity?
I recently did some analysis of treatment works flow records that provided some additional insight into what is going on, but using some simple data hacks rather than complex data science.
The data
I started with five years of daily flow data, similar to that shown above. I also had daily rainfall depths for the same period. It was a bit less obvious what the patterns were, as you can see from the graph below.
I found that there was too much noise in the daily data to easily make sense of it, so I did my investigations by aggregating to monthly data.
Baseflow
The first thing that I looked at was the average dry weather flow through the treatment works. The official definition of this in the UK is the daily flow that is exceeded for 80% of the time. That is, we ignore the 20% lowest values as they may be due to monitor error or operational issues, but we take the lowest of the rest of the values as the dry weather flow. This is a really useful definition as it doesn’t actually involve proving that it is dry weather by matching up rainfall data.
This calculation gave a DWF of 93.5.
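As a minimal sketch of that calculation, the flow exceeded for 80% of the time can be taken as the 20th percentile of the daily flows. The function name and the sample values below are made up for illustration, and the choice of interpolation method is an assumption rather than part of the official definition.

```python
from statistics import quantiles

def dry_weather_flow(daily_flows):
    """Flow exceeded on 80% of days, i.e. the 20th percentile of daily flow."""
    # n=5 splits the record into fifths; the first cut point is the 20th percentile.
    # method="inclusive" treats the record as the whole population (an assumption).
    return quantiles(daily_flows, n=5, method="inclusive")[0]

# Made-up daily flows, not the data used in this blog.
flows = [88, 90, 92, 93, 94, 95, 97, 101, 110, 130, 160]
dwf = dry_weather_flow(flows)
print(dwf)
```

Note that no rainfall data is needed, which is the attraction of the percentile definition.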
Seasonal variation
A trick that I have used for a long time to help understand infiltration into sewerage systems is to do this same calculation for each month rather than over the whole year. This shows if there is a seasonal variation in baseflow. If there is no variation then the average of the monthly values is the same as the annual value. If there is seasonal variation then the average of the monthly values is higher than the annual value.
The results for this analysis are shown in the graph below compared to the straight line value from the annual analysis. This shows a large seasonal difference in baseflow, suggesting that the catchment suffers from a lot of infiltration in the wet months of the year.
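The monthly version of the calculation might look something like the sketch below. It assumes the record is a list of (date, flow) pairs and groups by year and month; grouping by calendar month across all years would be an equally reasonable reading. The flows are synthetic, with a made-up winter uplift.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import quantiles

def monthly_baseflow(records):
    """records: iterable of (date, flow). Returns {(year, month): 20th percentile flow}."""
    by_month = defaultdict(list)
    for day, flow in records:
        by_month[(day.year, day.month)].append(flow)
    return {
        month: quantiles(flows, n=5, method="inclusive")[0]
        for month, flows in sorted(by_month.items())
        if len(flows) >= 2  # skip months with too few values for a percentile
    }

# Synthetic year of data: higher baseflow in the winter months (made-up values).
start = date(2023, 1, 1)
records = []
for i in range(365):
    day = start + timedelta(days=i)
    flow = 140.0 if day.month in (12, 1, 2) else 95.0
    records.append((day, flow))

baseflow = monthly_baseflow(records)
for month, value in baseflow.items():
    print(month, value)
```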
I then assume that the lowest monthly value of baseflow is the true base wastewater flow and that the rest is infiltration. This gives the graph below.
Beware that this assumption that the lowest baseflow value is the true wastewater flow may not be correct. There could be an operational issue affecting the month with the lowest value, although in this example there are other months with very similar values, so that is unlikely. Also, those dry months could be exhibiting exfiltration, with some of the wastewater leaking out of the sewers (see Episode 55). I will ignore that issue for now as it is almost impossible to take it into account.
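The split into foul flow and infiltration is simple arithmetic, sketched below under the same assumption that the lowest monthly baseflow is the true wastewater flow. The month labels and values are invented for illustration.

```python
def split_infiltration(monthly_baseflow):
    """monthly_baseflow: {month_label: baseflow}.
    Returns (foul_flow, {month_label: infiltration})."""
    # Assumption from the text: the lowest monthly baseflow is all wastewater.
    foul_flow = min(monthly_baseflow.values())
    infiltration = {m: q - foul_flow for m, q in monthly_baseflow.items()}
    return foul_flow, infiltration

# Made-up monthly baseflows.
example = {"Jan": 140.0, "Apr": 110.0, "Jul": 90.0, "Oct": 95.0}
foul, infiltration = split_infiltration(example)
print(foul, infiltration)
```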
Peak flow – slow response
The official definition of the peak flow that a wastewater treatment works has to be able to process is based on a completely different definition of flow in dry weather. This is the maximum recorded flow on a dry day (less than 0.25 mm of rainfall) following another dry day. (It would be so much easier if this also used a percentile definition but with a much higher percentile than for the DWF.)
As I do have the rainfall data for this location I can calculate this maximum dry day flow and add it to the baseflow. This is shown as green in the graph below. This is not direct runoff as it occurs on a dry day, but it is not base infiltration as it is not included in the baseflow. This is slow response to rainfall on previous days.
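A minimal sketch of the maximum dry day flow is given below. It assumes the record is a list of (flow, rainfall in mm) pairs in date order with no missing days; the names and values are made up.

```python
DRY_LIMIT_MM = 0.25  # a day with less rainfall than this counts as dry

def max_dry_day_flow(daily):
    """daily: list of (flow, rainfall_mm) in date order with no missing days.
    Returns the highest flow on a dry day that follows another dry day."""
    dry_flows = [
        flow
        for (_, prev_rain), (flow, rain) in zip(daily, daily[1:])
        if prev_rain < DRY_LIMIT_MM and rain < DRY_LIMIT_MM
    ]
    return max(dry_flows) if dry_flows else None

# Made-up week: one wet day followed by a slow recession.
daily = [(95, 0.0), (96, 0.0), (180, 12.0), (150, 0.1), (120, 0.0), (100, 0.0)]
peak_dry = max_dry_day_flow(daily)
print(peak_dry)
```

In this example the 150 recorded the day after the rain is excluded because the previous day was wet, so the result is the 120 from the following day. The slow response is then this value minus the baseflow.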
Peak flow – fast runoff
The final component is to show the maximum flow through the works including wet days. This gives the pink area in the graph below. This does not add very much to the peak flow. This is partly because this catchment is notionally largely separate with limited areas contributing runoff, and also perhaps because there may be overflows and storm tanks lopping off the short term peak response to rainfall.
The overall results give a snapshot of the response of the system.

Component               Share   Comment
Foul flow               35%
Seasonal infiltration   35%     Significant issue
Slow response           15%     Less of a problem than infiltration
Fast response           15%     Largely separate system
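One way that shares like these might be derived is sketched below. This is an assumption about the arithmetic, not a record of how the figures above were produced: each month is treated as a stack of foul flow, baseflow, maximum dry day flow and maximum flow, and the four layers are summed over the record. The dictionary keys and the numbers are invented.

```python
def component_shares(months):
    """months: list of dicts with keys foul, baseflow, dry_peak, wet_peak.
    Returns each component as a percentage of the summed peak flow."""
    totals = {"Foul flow": 0.0, "Seasonal infiltration": 0.0,
              "Slow response": 0.0, "Fast response": 0.0}
    for m in months:
        totals["Foul flow"] += m["foul"]
        totals["Seasonal infiltration"] += m["baseflow"] - m["foul"]
        totals["Slow response"] += m["dry_peak"] - m["baseflow"]
        totals["Fast response"] += m["wet_peak"] - m["dry_peak"]
    grand_total = sum(totals.values())
    return {name: 100.0 * value / grand_total for name, value in totals.items()}

# Two made-up months, one wet and one dry.
months = [
    {"foul": 50, "baseflow": 120, "dry_peak": 150, "wet_peak": 180},
    {"foul": 50, "baseflow": 80, "dry_peak": 100, "wet_peak": 120},
]
shares = component_shares(months)
for name, pct in shares.items():
    print(f"{name}: {pct:.1f}%")
```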
Conclusions
The results of the citizen science project reported by The Guardian and the analysis that I have set out here show that there is a lot that we can understand about the operation of our drainage and wastewater systems by relatively simple analysis of existing data.
Neither of these used machine learning or artificial intelligence, just non-artificial curiosity. Can we give water company staff the time and the incentives to be more curious?
Associate Director at AtkinsRéalis
1 year ago: George Walter Clapp - this may be of interest
turning data into actionable insight
1 year ago: Domain-specific knowledge is hard won and invaluable.
Co-owner and developer at Meteor Communications (Europe) Ltd
1 year ago: Just for context, the latest Deep Learning Neural Network that I have built (on my budget) has 123 million 'Neurons'; the human brain is thought to have 86 billion Neurons, plus Neuroscience is still rapidly developing and feeding into the development of Data Science. Curious humans will be required for a long time yet, at least until Quantum computers become mainstream.
Trade effluent consultancy
1 year ago: As someone who has a long history with "AI" in the water industry I can confirm that there's a lot of guff talked about the subject. Martin's blog hits the nail on the head. When I first started using neural networks (1993), I was told that 75% of the effort in developing them was basic data analysis, 15% was setting up the software (it was much more laborious then), 5% developing the models and 5% testing them. That's still more or less true. Many times I go to develop a model and find the answer in the basic data analysis. The term "AI" first fell out of fashion in the 1960s when everyone realised the models simply had no intelligence. That killed research into neural networks for many, many years. We're going through the same hype cycle now but this time the models are more complex and the hype will probably (hopefully) regulate the "AI" world rather than kill it. There's always been a tendency to throw a machine learning model at some data and hope for the best but the really important bit is that last 5% - testing the models to make sure they can generalise accurately. If they can't then you have a useless model, which brings us back to making sure the basic data analysis is correct in the first place.