
59 Machine learning v curiosity

Martin Osborne

Water industry strategic advisor, asset planner and drainage expert. Winner of the 2023 WaPUG Prize for contributions to the development of urban drainage practice.

Published: 4 July 2023

The image above was used by Ofwat to represent innovation in the UK water industry. If you look carefully you will notice that none of the gears mesh together properly, so the machine won’t work. Maybe not what they intended.

I am finally getting round to reading the book “A Strategic Digital Transformation for the Water Industry”, published last year by the International Water Association. It is available at this link. Well done to the editors Oliver Grievson, Timothy Holloway and Bruce Johnson for putting together such a comprehensive review of what is being done, and what should be done, to transform the water industry. If anything it is too comprehensive at 119 pages and I am still ploughing my way through it. However, one figure jumped out at me and linked in with some work that I did recently, hence this blog. There may be more blogs on other ideas from the book in the future.

Analysing treatment works flow records

Back in Episode 22 of the blog I talked about analysing compliance with required Flow to Full Treatment (FFT) and referenced a good article in The Guardian on a citizen science approach to this. The IWA book showed another example (see the figure below).

The book states: “Figure 2.5 shows four clear peaks which grow over a period of three years showing the gradually worsening infiltration into the sewer environment.”

This is true, but there is actually a lot of non-artificial intelligence in extracting that information from the graph, and even then we are making some significant assumptions to state that the cause is worsening infiltration. Even if it is, we do not know whether it is due to the deteriorating condition of the sewers or to climate change raising the level of the water table.

To be fair to the editors, they do say that this is a simple example of the use of data rather than an example of digital transformation, and they qualify the assumptions as follows: “With additional information, such as rainfall, geology, and the performance of the system, this can be used to predict where the source of the problems within the sewer network is.”

We could perhaps use machine learning or artificial intelligence to provide those insights for us. Or can we just use human curiosity?

I recently did some analysis of treatment works flow records that provided some additional insight into what is going on, but using some simple data hacks rather than complex data science.

The data

I started with five years of daily flow data, similar to that shown above. I also had daily rainfall depths for the same period. It was a bit less obvious what the patterns were, as you can see from the graph below.

I found that there was too much noise in the daily data to easily make sense of it, so I did my investigations by aggregating to monthly data.

Baseflow

The first thing that I looked at was the average dry weather flow (DWF) through the treatment works. The official definition of this in the UK is the daily flow that is exceeded for 80% of the time. That is, we ignore the lowest 20% of values as they may be due to monitor error or operational issues, and we take the lowest of the remaining values as the dry weather flow. This is a really useful definition as it doesn’t actually involve proving that it is dry weather by matching up rainfall data.

This calculation gave a DWF of 93.5.
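As a rough illustration of that percentile definition, here is a minimal Python sketch. The function name and the sample flows are made up for the example, and the rule used (drop the lowest 20% of values, take the lowest of the rest) is just one simple way of reading the definition above; other percentile conventions would give slightly different answers on short records.

```python
def dry_weather_flow(daily_flows):
    """Return the daily flow exceeded on roughly 80% of days.

    Follows the description in the text: discard the lowest 20% of
    values, then take the lowest of what remains.
    """
    if not daily_flows:
        raise ValueError("need at least one daily flow value")
    ordered = sorted(daily_flows)
    n_discard = len(ordered) * 20 // 100  # how many of the lowest values to ignore
    return ordered[n_discard]


# Hypothetical daily flows, in whatever units the flow monitor records.
flows = [88, 91, 95, 102, 97, 93, 150, 210, 99, 94]
print(dry_weather_flow(flows))  # prints 93
```

Note that no rainfall data is needed: the wet days simply sit in the upper part of the sorted list and never get picked.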

Seasonal variation

A trick that I have used for a long time to help understand infiltration into sewerage systems is to do this same calculation for each month rather than over the whole year. This shows if there is a seasonal variation in baseflow. If there is no variation then the average of the monthly values is the same as the annual value. If there is seasonal variation then the average of the monthly values is higher than the annual value.

The results for this analysis are shown in the graph below compared to the straight line value from the annual analysis. This shows a large seasonal difference in baseflow, suggesting that the catchment suffers from a lot of infiltration in the wet months of the year.
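The monthly version of the calculation can be sketched in a few lines of Python. This is only an illustration: the helper names and the synthetic year of data are invented, and pooling the days by calendar month is an assumption about how the grouping might be done.

```python
from collections import defaultdict
from datetime import date, timedelta


def q80(values):
    """Flow exceeded on roughly 80% of days: drop the lowest 20%, take the next value."""
    ordered = sorted(values)
    return ordered[len(ordered) * 20 // 100]


def monthly_baseflow(records):
    """records: iterable of (date, flow). Returns {month number: Q80 flow}."""
    by_month = defaultdict(list)
    for day, flow in records:
        by_month[day.month].append(flow)
    return {month: q80(flows) for month, flows in sorted(by_month.items())}


# Hypothetical year: 100 units per day from April to September,
# 160 units per day in the wetter months.
start = date(2022, 1, 1)
records = []
for offset in range(365):
    day = start + timedelta(days=offset)
    records.append((day, 100 if 4 <= day.month <= 9 else 160))

monthly = monthly_baseflow(records)
annual = q80([flow for _, flow in records])
monthly_mean = sum(monthly.values()) / len(monthly)
print(annual, monthly_mean)  # prints 100 130.0
```

In this made-up series the annual figure sits at the summer level, while the average of the monthly figures is pulled up by the winter months, which is the signature of seasonal variation described above.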


I then assume that the lowest monthly value of baseflow is the true base wastewater flow and that the rest is infiltration. This gives the graph below.

Beware that the assumption that the lowest baseflow value is the true wastewater flow may not be correct. There could be an operational issue affecting the month with the lowest value, although in this example there are other months with very similar values, so that is unlikely. Also, those dry months could be exhibiting exfiltration, with some of the wastewater leaking out of the sewers (see Episode 55). I will ignore that issue for now as it is almost impossible to take it into account.
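The split between wastewater and infiltration is simple arithmetic, sketched below in Python. The function name and the monthly values are hypothetical, and the sketch inherits the same assumption flagged above: the lowest monthly baseflow is treated as the true base wastewater flow.

```python
def split_baseflow(monthly_baseflow):
    """Split each month's baseflow into wastewater and infiltration.

    Assumes the lowest monthly baseflow is the true base wastewater flow,
    so everything above it in any month is counted as infiltration.
    """
    wastewater = min(monthly_baseflow.values())
    return {
        month: {"wastewater": wastewater, "infiltration": flow - wastewater}
        for month, flow in monthly_baseflow.items()
    }


# Hypothetical monthly baseflows (same units as the flow record).
monthly = {"Jan": 160, "Apr": 120, "Jul": 95, "Oct": 130}
parts = split_baseflow(monthly)
for month, values in parts.items():
    print(month, values["wastewater"], values["infiltration"])
```

If exfiltration or an operational problem depresses the lowest month, every infiltration figure from this calculation is overstated by the same amount.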

Peak flow – slow response

The official definition of the peak flow that a wastewater treatment works has to be able to process is based on a completely different definition of flow in dry weather. This is the maximum recorded flow on a dry day (less than 0.25 mm of rainfall) following another dry day. (It would be so much easier if this also used a percentile definition but with a much higher percentile than for the DWF.)

As I do have the rainfall data for this location I can calculate this maximum dry day flow and add it to the baseflow. This is shown as green in the graph below. This is not direct runoff as it occurs on a dry day, but it is not base infiltration as it is not included in the baseflow. It is the slow response to rainfall on previous days.
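Picking out the maximum dry day flow is a simple filter on the paired flow and rainfall records. The Python sketch below is illustrative only: the function name and sample values are invented, and it assumes the two lists are aligned day by day with no gaps.

```python
DRY_THRESHOLD_MM = 0.25  # a day with less rainfall than this counts as dry


def max_dry_day_flow(flows, rainfall_mm):
    """Largest flow on a dry day that follows another dry day.

    Returns None if the record contains no such day.
    """
    if len(flows) != len(rainfall_mm):
        raise ValueError("flow and rainfall records must be the same length")
    candidates = [
        flows[i]
        for i in range(1, len(flows))
        if rainfall_mm[i] < DRY_THRESHOLD_MM and rainfall_mm[i - 1] < DRY_THRESHOLD_MM
    ]
    return max(candidates) if candidates else None


# Hypothetical week: a wet day (12 mm) followed by a slow recession.
flows = [100, 105, 180, 150, 130, 110]
rain = [0.0, 0.0, 12.0, 0.0, 0.0, 0.1]
print(max_dry_day_flow(flows, rain))  # prints 130
```

In this example the flow of 150 on the day straight after the rain is excluded because the previous day was wet, while the 130 recorded two days after the rain qualifies and is well above the pre-storm flow. That excess is the slow response.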

Peak flow – fast runoff

The final component is the maximum flow through the works including wet days. This gives the pink area in the graph below. This does not add very much to the peak flow. This is partly because this catchment is notionally largely separate, with limited areas contributing runoff, and perhaps also because there may be overflows and storm tanks lopping off the short-term peak response to rainfall.

The overall results give a snapshot of the response of the system.

Foul flow               35%
Seasonal infiltration   35%   Significant issue
Slow response           15%   Less of a problem than infiltration
Fast response           15%   Largely separate system
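The post does not say exactly how these percentages were derived. One plausible reading, sketched below in Python, is that each component is the gap between successive flow levels expressed as a share of the overall peak flow. The function name is invented, and the four input levels are index values chosen only so that the arithmetic reproduces the split above; they are not the recorded flows.

```python
def flow_components(wastewater, peak_baseflow, max_dry_day, max_flow):
    """Express each stacked layer as a percentage of the overall peak flow."""
    levels = [0, wastewater, peak_baseflow, max_dry_day, max_flow]
    if max_flow <= 0:
        raise ValueError("peak flow must be positive")
    if any(upper < lower for lower, upper in zip(levels, levels[1:])):
        raise ValueError("flow levels must not decrease")
    names = ["foul flow", "seasonal infiltration", "slow response", "fast response"]
    return {
        name: 100 * (upper - lower) / max_flow
        for name, lower, upper in zip(names, levels, levels[1:])
    }


# Hypothetical index levels (peak flow = 100), not measured data.
shares = flow_components(wastewater=35, peak_baseflow=70, max_dry_day=85, max_flow=100)
for name, percent in shares.items():
    print(f"{name}: {percent:.0f}%")
```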

Conclusions

The results of the citizen science project reported by The Guardian and the analysis that I have set out here show that there is a lot that we can understand about the operation of our drainage and wastewater systems by relatively simple analysis of existing data.

Neither of these used machine learning or artificial intelligence, just non-artificial curiosity. Can we give water company staff the time and the incentives to be more curious?

The DWMP blog


Anthony Fernihough

Associate Director at AtkinsRéalis

1 year ago

George Walter Clapp - this may be of interest


Leo Kiernan

turning data into actionable insight

1 year ago

Domain specific knowledge is hard won and invaluable.

Andrew Scott

Co-owner and developer at Meteor Communications (Europe) Ltd

1 year ago

Just for context, the latest Deep Learning Neural Network that I have built (on my budget) has 123 million 'Neurons'. The human brain is thought to have 86 billion Neurons, and Neuroscience is still rapidly developing and feeding into the development of Data Science. Curious humans will be required for a long time yet, at least until Quantum computers become mainstream.


David Brydon

Trade effluent consultancy

1 year ago

As someone who has a long history with "AI" in the water industry I can confirm that there's a lot of guff talked about the subject. Martin's blog hits the nail on the head. When I first started using neural networks (1993), I was told that 75% of the effort in developing them was basic data analysis, 15% was setting up the software (it was much more laborious then), 5% developing the models and 5% testing them. That's still more or less true. Many times I go to develop a model and find the answer in the basic data analysis. The term "AI" first fell out of fashion in the 1960s when everyone realised the models simply had no intelligence. That killed research into neural networks for many, many years. We're going through the same hype cycle now but this time the models are more complex and the hype will probably (hopefully) regulate the "AI" world rather than kill it. There's always been a tendency to throw a machine learning model at some data and hope for the best but the really important bit is that last 5% - testing the models to make sure they can generalise accurately. If they can't then you have a useless model, which brings us back to making sure the basic data analysis is correct in the first place.

