Overcoming the Machine-Learning concept barrier
Dr. Carlos Ortega
Passionate Researcher in Heterogeneous Catalysis and Catalytic Reaction Engineering | Data Science & Programming Enthusiast | Statistics Advocate
(reading time: 2 min)
Nowadays, hearing the words Machine Learning is not uncommon. In my case, the #MachineLearning topic pops up in my feeds, in fields that range from Netflix to biology, astronomy and, of course, #HeterogeneousCatalysis. Recently, Prof. Dr. Stefan Palkovits (@PalkovitsLab) published a paper titled “A primer about Machine Learning in Catalysis - A tutorial with code” that truly triggered my curiosity (the paper is open access and you can find it here). I saw this as an opportunity to learn and gain tools that could be relevant in this data-driven era.
To be honest, I didn't know where to start. Without any prior knowledge, the Machine Learning concept seemed rather abstract, and one can be overwhelmed by the amount of information available online. After reading the syllabi and watching a few introductory videos from courses on #edX, I decided to pursue the Machine Learning with Python: A Practical Introduction (IBM - ML0101EN) course from #IBM. It was the right choice. The course content is presented in a simple and intuitive way with clear and relatable examples. In addition, the course includes #PythonLabs, which are a good complement for understanding and practicing the concepts explained in the videos.
What are my takeaways?
- Machine learning consists of developing mathematical models based on data, which contrasts with the “traditional” approach that relies on physical principles to develop the model and then fits a set of parameters.
- Now I have tools (at least I know that they exist and where they are) that supplement my interest in mathematical modeling within the context of catalysis and catalytic reaction engineering.
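As a rough illustration of the contrast in the first takeaway, here is a minimal Python sketch using only NumPy. Everything in it is an assumption made for the example: the rate constants are synthetic (generated from made-up Arrhenius parameters with a little noise), the Arrhenius law stands in for "physical principles", and a quadratic polynomial stands in for a data-driven model. It is not taken from the course or from the paper mentioned above.

```python
import numpy as np

R = 8.314  # universal gas constant, J/(mol K)

# Synthetic, made-up rate constants (NOT real measurements).
rng = np.random.default_rng(0)
T = np.linspace(500.0, 600.0, 11)       # temperature, K
A_true, Ea_true = 1.0e7, 8.0e4          # hypothetical pre-factor and activation energy (J/mol)
k_obs = A_true * np.exp(-Ea_true / (R * T))
k_obs = k_obs * np.exp(rng.normal(0.0, 0.02, T.size))  # about 2 % multiplicative noise

# "Traditional" approach: assume the Arrhenius law and fit its two parameters.
# ln k = ln A - Ea / (R T) is a straight line in 1/T.
slope, intercept = np.polyfit(1.0 / T, np.log(k_obs), 1)
Ea_fit = -slope * R
A_fit = np.exp(intercept)

# Data-driven approach: no physical law is assumed; a flexible function of T
# (here a quadratic polynomial on standardized temperature) is fitted to the same data.
T_mean, T_std = T.mean(), T.std()
coeffs = np.polyfit((T - T_mean) / T_std, k_obs, 2)


def k_mechanistic(temp):
    """Rate constant predicted by the fitted Arrhenius model."""
    return A_fit * np.exp(-Ea_fit / (R * temp))


def k_data_driven(temp):
    """Rate constant predicted by the fitted polynomial."""
    return np.polyval(coeffs, (temp - T_mean) / T_std)


print(f"Fitted Ea: {Ea_fit / 1000:.1f} kJ/mol (data were generated with 80.0 kJ/mol)")
for temp in (550.0, 700.0):  # 550 K lies inside the fitted range, 700 K outside it
    print(f"T = {temp:.0f} K  mechanistic: {k_mechanistic(temp):.3g}  data-driven: {k_data_driven(temp):.3g}")
```

The mechanistic fit returns parameters with a physical meaning (an activation energy), while the polynomial only returns coefficients. Comparing the two predictions inside and outside the fitted temperature range is a simple way to see how differently the two kinds of model can behave when extrapolating.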
This is not the end of my #MachineLearning journey. While studying this topic, I came across an online course from professors Trevor Hastie and Rob Tibshirani, from Stanford University, which I have found quite insightful. The course content and book are freely available here. The content is rich and the concepts are explained with clear and interesting examples solved in #R. I have taken on a personal challenge to solve the Labs in #Python, rather than #R, because why not, so wish me success!
I would like to close this post with a quote from Marie Curie:
“Nothing in life is to be feared, it is only to be understood. Now is the time to understand more, so that we may fear less”.
What would you like to understand more and fear less? And what are your thoughts about Machine Learning?
Scientist and Chemical Engineer
4 years ago
A data-driven approach alone might lead to wrong conclusions when applied to chemical systems; domain knowledge is critical to establishing the right models. Often data, for example spectra, need to be pre-processed; however, pre-processing without understanding the underlying analytical model, or at least how the data are generated, might lead to including irrelevant data in the modelling. Scattering, thermal effects, solvent effects and wrong variable selection are examples of things that might ruin the modelling if not handled. To the best of my knowledge, most courses on machine learning deal with discrete data (easier to handle), not correlated data such as spectra. Chemometrics is the field that combines both disciplines and, on that note, I would recommend that you check out principal component analysis (PCA) as the best exploratory method and partial least squares (PLS) as the best regression method. The basics of PCA and PLS can later be used to establish classification models (PLS-DA). During my PhD I applied chemometrics to establish analytical models for investigating lignin valorization reactions. Domain knowledge was critical to building such models.
R&D Manager at SaXcell
4 years ago
Indeed, these concepts will be part of the knowledge package for scientists in the near future. Very interesting, Carlos, thanks for sharing.