CSL, the new machine learning

Rob van Putten

senior specialist flood protection and geotechnics | developer | trainer | innovator

Published: January 24, 2022

So I have been working on and off with machine learning for the last 5 years or so.. I think I must have seen at least 20 courses and finished 5 of them ;-) The options felt overwhelming.. like a kid in a candy store is what I wrote to someone just a couple of minutes ago.

I also wrote some code using machine learning: a traffic sign classifier based on images, a converter from cone penetration tests to soil layers, a way to try and find important levee points on a 2D cross-section.. etc. Now I definitely do not consider myself to be a machine learning (ML from now on) expert.. I know how to use the models, I even know the mathematics behind most of them (I had to, to get my machine learning Nanodegree at Udacity) but if I look at Kaggle I feel like I have only scratched the surface, and with a busy job and a lot of hobbies I simply lack the time to really dive into all the possibilities.

However.. over the last few months I have come to favour the CSL model.. It feels good, it feels natural and surprisingly.. it is often up to the task without writing all the code and collecting all the data you need for an ML model. So let me tell you about CSL and why I often favour it over ML.

CSL

So what is CSL you might ask.. a new kernel with advanced stochastic gradient descent? Maybe a new TensorFlow module? Sorry to disappoint you.. CSL stands for Common Sense Learning and I think I am the inventor.

Now that is out of the way, let me tell you why I think CSL can outperform ML, at least in most of my cases. I will show you some examples from my own work.

Characteristic points

I am a geotechnical engineer and I love to work on water safety problems, so I like to find ways to automate the input for, for example, levee stability calculations. Once upon a time, in a land far far away, we had (actually still have) DAM.. dike analysis method by Deltares. And this is the input they required for the cross-sections:

Mmmm.. imagine.. we have a levee of 1000 meters and we like to check this levee every 25 m.. so we have 40 cross-sections and we need 16 points each.. that would mean entering 640 points. Actually I wrote software with a friend of mine to 'click' these points, so we tried to make this task easy for people, but was it still annoying.. oh yes! Especially if you know that we tried to do this for 550 km with a cross-section every 10 meters (yeah.. ambitious times).
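To put numbers on that, here is a quick back-of-the-envelope sketch. The function name is made up for illustration; the lengths, spacings and the 16 points per cross-section are the ones mentioned above.

```python
def points_to_click(levee_length_m, spacing_m, points_per_section=16):
    """Number of characteristic points to enter by hand for one levee."""
    sections = levee_length_m // spacing_m
    return sections * points_per_section


print(points_to_click(1_000, 25))    # 640
print(points_to_click(550_000, 10))  # 880000
```

So the ambitious version comes down to 55,000 cross-sections and 880,000 clicks.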

Then came my ML period.. and I thought.. ML can help! So I asked a lot of colleagues from Dutch waterboards if they used this tool and could send me the output they had generated, and I got a lot of 'clicked' points. I spent days collecting the data and getting it ready for easy input, and again days, maybe even weeks, building and testing models.. the results were awful.. Yes, some points were ok but most of them were rubbish.. even worse, the model seemed to generalize towards a specific kind of levee (which we call a secondary levee, for the smaller rivers) and not work at all for the primary levees (like those along the large rivers).. so I had to split up the data, create two models.. etc. etc. In the end the result was not usable and I was a little frustrated.

A couple of years later I was less hyped about ML and came back to the same problem, this time using CSL. The first thing that came to mind was.. why do I need all those points? Well, DAM needs them, but do I need DAM? Turned out, I didn't. So then I thought, what are the points that I really, really need? A lot fewer..

I needed the yellow ones, and the blue underlined ones would be nice. Now CSL came to the rescue.. first of all the really OCSL (Obvious common sense learning): if you generate a cross-section then the first point (left to right) will be 'maaiveld buitenwaarts' (ground level on the outward side) and the last one 'maaiveld binnenwaarts' (ground level on the inward side).. oh wow.. already a 100% secure method ;-)

Looking at the source data we also found that we had raster files (kind of a big matrix with x,y locations and whatever data you attach to those locations) with the land height, the bottom of the levee and the bottom of the ditches. CSL to the rescue.. simply mark each point with the raster it came from, because then it is really easy to find the source of a point and it will be easy to find the ditch points (the ones with 'sloot' in their name in the image). Imagine you have consecutive points marked 'ditch': then you can mark the first one as the levee side of the ditch ('insteek sloot dijkzijde') and the last one as the polder side ('insteek sloot polderzijde'). So a simple script using GIS data and CSL fixed those points.

Now up to a little ACSL (Advanced common sense learning.. and this is the last time I make 'funny' abbreviations ;-) what about the really, really important yellow marked points? For ACSL you best think out of the box.. this was in the time that I thought I could make money using automated bitcoin trading (but that's another bedtime story).. and one of the techniques to find sudden changes in prices is the use of Bollinger Bands: a moving average of the price with an upper and a lower band, typically two standard deviations above and below it. Here's a picture:
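The two rules above fit in a few lines. This is only a minimal sketch, not the actual script: it assumes each point of a cross-section already carries a tag saying which raster it came from, and the tag names (`"surface"`, `"levee"`, `"ditch"`) and the function name are made up for illustration.

```python
from itertools import groupby


def label_points(sources):
    """Label characteristic points from per-point source tags.

    `sources` holds one tag per point, ordered left to right
    (hypothetical tags: "surface", "levee", "ditch").
    Returns a dict mapping point index -> characteristic point name.
    """
    labels = {}
    if not sources:
        return labels

    # Ditch rule: the first and last point of every run of
    # consecutive "ditch" points are the two edges of that ditch.
    index = 0
    for tag, run in groupby(sources):
        run_length = len(list(run))
        if tag == "ditch" and run_length >= 2:
            labels[index] = "insteek sloot dijkzijde"
            labels[index + run_length - 1] = "insteek sloot polderzijde"
        index += run_length

    # Obvious rule: the outermost points are always the ground level
    # on either side (applied last, so it wins at the two ends).
    labels[0] = "maaiveld buitenwaarts"
    labels[len(sources) - 1] = "maaiveld binnenwaarts"
    return labels


tags = ["surface", "surface", "levee", "levee", "surface",
        "ditch", "ditch", "ditch", "surface"]
print(label_points(tags))
```

For the example tags this marks points 0 and 8 as the two ground level points and points 5 and 7 as the ditch edges. No training data needed.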

So my ACSL kicked in and I thought.. what if the surface of the levee is the stock price / bitcoin price and I look at the intersection of the upper band with this line.. turned out this was quite a nice way to find the points I needed with good accuracy.. way better than ML!

Of course I had to tweak the algorithm a little but the results were excellent. I can now automagically generate all the characteristic points I need. This makes it possible to generate a complete 2D model of the levees that I am working with.. nice!
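Here is a minimal sketch of that idea, not the tweaked algorithm I ended up with. It assumes the surface heights are sampled at evenly spaced points along the cross-section, and the function name, `window` and `k` are illustrative choices. It reports every point where the surface breaks out above the upper band.

```python
import numpy as np


def upper_band_breakouts(z, window=20, k=2.0):
    """Indices where the surface line rises above the upper Bollinger band.

    z      : surface heights at evenly spaced points, left to right
    window : number of points in the moving window (includes the current point)
    k      : band width in standard deviations
    """
    z = np.asarray(z, dtype=float)
    breakouts = []
    was_above = False
    for i in range(window - 1, len(z)):
        recent = z[i - window + 1 : i + 1]
        upper = recent.mean() + k * recent.std()
        is_above = z[i] > upper
        # Only record the moment the line crosses the band upwards.
        if is_above and not was_above:
            breakouts.append(i)
        was_above = is_above
    return breakouts


# Flat polder, a slope, then a flat crest.
surface = [0, 0, 0, 0, 0, 0, 1, 2, 3, 3, 3, 3, 3, 3]
print(upper_band_breakouts(surface, window=5, k=1.5))  # [6]
```

In the toy profile the breakout lands on the first rising point, right at the toe of the slope. One thing to keep in mind when tweaking: within a window of n points no value can lie more than sqrt(n - 1) standard deviations from the window mean, so a short window needs a `k` below that bound or nothing will ever cross the band.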

CPT interpretation

Another example. A CPT is a cone penetration test, which we as geotechnical engineers use to try and understand the soil beneath our feet. Now that's all the detail I am gonna give you or else I would be lecturing geomechanics, which is not the purpose of this article. Suffice to say that a CPT generates a lot of data and we want to translate that data to soil names like 'clay' or 'peat' or 'sand' etc.

In my ML phase I collected loads of data where CPTs were manually interpreted, so I could find a correlation between the raw values of the CPT and the soil names. I wrote some articles about it because I was really happy with the process, but it turned out that there were always some errors that made the algorithms unusable.

This was also due to a lack of ML knowledge, I am the first to admit it, and later Ritchie Vink wrote an excellent article about using advanced ML where it definitely did work. But still I found some things that I had to improve to be able to actually apply his algorithm.. and it was also behind an API and it always itches if I can't do it myself.

There are loads of well known correlations for CPTs and I wrote code for some of them, but I also found code online for a well known correlation. The problem with this correlation was that it simply did not find a very important layer (peat, which is really the nightmare of levee assessments in The Netherlands). Now the ML algorithm did find that layer most of the time, but again.. that was behind an API and not code I could adjust.

CSL kicked in.. there is a very simple rule that is true most of the time.. a CPT has a special value called friction, and using a simple formula you can generate another value, and if that is above a certain threshold it means that you are probably dealing with peat. So CSL told me.. just use the (non ML) algorithm that already came with access to the code and pick up the result, but add my own rule that everything above a certain threshold will be marked as peat.. and.. it works! (for the geotechnical engineers among us.. I could never find the lower peat layers)
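A minimal sketch of that override rule. I am assuming here that the derived value is the friction ratio Rf (sleeve friction divided by cone resistance, as a percentage); the function names are made up, and the threshold of 8% is only a placeholder to make the example run, not a recommended value.

```python
def friction_ratio(qc, fs):
    """Friction ratio Rf in percent; qc and fs must use the same unit."""
    return 100.0 * fs / qc if qc > 0 else 0.0


def apply_peat_rule(soil_names, qc, fs, rf_threshold=8.0):
    """Keep the base classification, but mark every reading whose
    friction ratio exceeds the threshold as peat.

    soil_names   : result of the existing (non ML) classification
    qc, fs       : cone resistance and sleeve friction per reading
    rf_threshold : placeholder value in percent, tune it on your own data
    """
    return [
        "peat" if friction_ratio(q, f) > rf_threshold else name
        for name, q, f in zip(soil_names, qc, fs)
    ]


base = ["clay", "clay", "sand"]   # what the existing correlation returned
qc = [0.5, 0.4, 10.0]             # cone resistance in MPa
fs = [0.01, 0.04, 0.05]           # sleeve friction in MPa
print(apply_peat_rule(base, qc, fs))  # ['clay', 'peat', 'sand']
```

The existing correlation stays untouched and the rule only overrides its output, which keeps everything in code I can adjust myself.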

So is ML bad?

Oh deary no.. that's definitely not the point of this article. ML is a great tool, it has so many interesting options and if you have the right problem it might be the best way to progress.. way better than CSL. But my point is that people might get carried away and try to find ML solutions for problems that can easily be solved with CSL. Just don't believe the next sales pitch that ML will solve all your problems.. It won't.. trust me, your data is most likely not even ready for normal statistics (been there, done that!). So if you feel hyped by ML, do yourself a favor and think CSL first. Think out of the box, be creative and if you still think ML is the way to go.. then it probably is.

Have fun!

Rob

Mark van der Krogt

Senior Researcher and Advisor Geotechnics, reliability and risk at Deltares

3 years ago

Enlightening view Rob. Seems that you are describing the difference between Data Science and Machine Learning ;-)
