The Making of a Data Scientist: How I Became One
At the end of last year I decided I was going to become a machine learning expert. Before that, I had created this positive vibe that made me feel like I could accomplish anything, and I had put in place the habits I knew I needed if I was going to be a high-performance individual. So I had no doubt that I was going to hack AI and would soon be consulting on very big projects; well, I am still hoping. What I did next was register for Jose Portilla's ML course on Udemy and start on my daily routine of waking up at 5 a.m. to put in one hour of focused study.
In the beginning, it was all a breeze. Installing Python, running the first app, installing the libraries using pip: this was a walk in the park, something I could basically do in my sleep. Then came the more interesting parts. First up was exploring the libraries and the basics of data science and machine learning. I worked with NumPy and Pandas, and learned how and why we do exploratory data analysis. I found that my experience with Excel helped me get the hang of Pandas really quickly.
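For anyone curious what that first pass looks like in practice, here is a minimal sketch of the kind of exploratory steps I mean; the file name and its columns are placeholders, not data from the course.

```python
import pandas as pd

# Load a CSV into a DataFrame (the file name here is just a placeholder)
df = pd.read_csv("sales.csv")

# First-pass exploration: shape, column types, missing values, summary statistics
print(df.shape)
print(df.info())
print(df.isnull().sum())
print(df.describe())

# A quick look at the first few rows, much like scanning the top of an Excel sheet
print(df.head())
```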
After that came data visualisation using Matplotlib, Seaborn, Plotly, and Cufflinks. It is absolutely amazing to see the kind of graphs you can create with one line of code in Python using these libraries. The graphs really do help you make better sense of the data at hand, which then allows you to decide what to do with it. Sometimes, the relationships that emerge during exploratory data analysis and visualisation can prove invaluable, both to the data scientist and to the corporation.
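To give a flavour of those one-line graphs, here is a rough sketch using Seaborn and Matplotlib. It leans on Seaborn's bundled "tips" dataset purely so the example is self-contained; it is not from the course material.

```python
import seaborn as sns
import matplotlib.pyplot as plt

# Seaborn ships with small example datasets, so this runs as-is
tips = sns.load_dataset("tips")

# One line each: a distribution plot, a scatter plot with a fitted regression line,
# and a grid of every numeric column plotted against every other
sns.histplot(tips["total_bill"])
plt.show()

sns.regplot(x="total_bill", y="tip", data=tips)
plt.show()

sns.pairplot(tips)
plt.show()
```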
At this point, I was dying for some actual machine learning action. You know, I wanted to see something think for itself, predict when I am going to make my first million from data science, or something like that! It took discipline to stick with the course outline and laboriously work through the geographical plotting section. I struggled with choropleth maps for two days and found them highly abstracted: what I was doing in code seemed to have no relation to the magic that showed up on screen when I hit run. Anyway, if need be, I can now create one with some help from our old friends Google and Stack Overflow.
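For completeness, here is roughly what a basic choropleth looks like with Plotly; the country codes and values below are made up purely for illustration.

```python
import plotly.graph_objects as go

# Made-up values keyed by ISO-3 country codes, purely for illustration
trace = dict(
    type="choropleth",
    locations=["UGA", "KEN", "TZA"],
    locationmode="ISO-3",
    z=[10, 20, 30],
    colorbar={"title": "Some metric"},
)

layout = dict(
    title="A minimal choropleth sketch",
    geo={"scope": "africa"},
)

fig = go.Figure(data=[trace], layout=layout)
fig.show()
```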
Eventually, after about four weeks of resisting the urge to skip the introduction and jump straight into machine learning models, I was gently introduced to linear regression. You cannot imagine my shock on realising that this was stuff I already knew from my college statistics course. I actually took the time to go through the companion textbook for the course, An Introduction to Statistical Learning with Applications in R (ISLR), and confirmed that it was indeed the same old regression. Take some known input vector (X1) and its known output vector (Y1), and find a curve of best fit that generalises the relationship between X1 and Y1. Bam! That curve of best fit is your first machine learning model: it can predict the value of an unknown Y2 for some other input X2!
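In scikit-learn terms, that X1/Y1 to X2/Y2 story is only a few lines. The synthetic data below is just to keep the sketch self-contained; the course uses its own datasets.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Known inputs X1 and known outputs Y1 (synthetic, roughly y = 3x + noise)
rng = np.random.default_rng(42)
X1 = rng.uniform(0, 10, size=(100, 1))
Y1 = 3 * X1.ravel() + rng.normal(0, 1, size=100)

# Fit the "curve of best fit" -- the first machine learning model
model = LinearRegression()
model.fit(X1, Y1)

# Predict the unknown Y2 for some new inputs X2
X2 = np.array([[2.5], [7.0]])
Y2 = model.predict(X2)
print(Y2)
```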
But then things started to happen too fast. I learned about the bias-variance tradeoff and the need for cross-validation. There was logistic regression, the KNN algorithm, Decision Trees and Random Forests, and the related concepts of bagging and boosting. I relied very heavily on the textbook to understand the inner workings of these algorithms and concepts before trying them out practically with scikit-learn. As such, progress was rather slow at this stage of my learning. And even though I can work with SVMs, K-Means clustering, or TensorFlow, I know there is a lot of depth still unknown to me.
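As a rough illustration of how those pieces fit together in scikit-learn, the sketch below cross-validates a random forest on one of the library's built-in datasets; nothing here is specific to the course exercises.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

# A built-in dataset keeps the example self-contained
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# A random forest is essentially bagging over decision trees, with extra feature randomness
clf = RandomForestClassifier(n_estimators=200, random_state=42)

# Cross-validation gives a less optimistic estimate of generalisation error than a
# single train/test split -- the practical face of the bias-variance tradeoff
scores = cross_val_score(clf, X_train, y_train, cv=5)
print(scores.mean(), scores.std())

# Fit on the training split and check performance on held-out data
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
```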
In my next article, I will write about the lessons I learned while trying to deploy my first ML model on a CentOS server using Flask and a Python virtual environment. Someone also requested that I write about the work ethic that allows me to explore this much while holding down an 8-to-5 job. Elico Sifuma, Nelson Mwangala, I, and others are supporting and developing for over ten banks that are integrated, or integrating, into the agency banking system in Uganda, which is the first of its kind globally. Maybe someday I should also write about the lessons we have learnt so far.
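As a small preview of that deployment, here is a bare-bones sketch of the kind of Flask prediction endpoint I mean; the model file name and the expected feature layout are assumptions for illustration only, and the full story belongs in the next article.

```python
# app.py -- a bare-bones prediction endpoint; "model.joblib" and the
# expected feature layout are assumptions for illustration only
import joblib
import numpy as np
from flask import Flask, jsonify, request

app = Flask(__name__)
model = joblib.load("model.joblib")

@app.route("/predict", methods=["POST"])
def predict():
    # Expect a JSON body like {"features": [1.2, 3.4, 5.6]}
    features = request.get_json()["features"]
    prediction = model.predict(np.array(features).reshape(1, -1))
    return jsonify({"prediction": prediction.tolist()})

if __name__ == "__main__":
    # In production this sits behind uWSGI and Nginx rather than the dev server
    app.run(host="0.0.0.0", port=5000)
```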
Nevertheless, at this point I feel confident enough to start working on data science and machine learning projects. I have the necessary skills to do exploratory data analysis and visualisation. I can work with a number of machine learning models and deploy them using Nginx, Flask, and uWSGI. Further, I now know what to Google to find the answers I do not have, or to fix the inevitable bug or two. But, most importantly, I can communicate deep insights in various ways to help non-technical people understand and use the secrets that the data is telling us.