Machine Learning Programs are far simpler than Computer Graphics Programming
Currently, I am preparing an Analytics course that covers the following content:
- Analysis/Synthesis Thinking Model
- Pragmatic Statistics
- Machine Learning and its associated Lingo
- Software tools, platforms and vendors
- Case Studies from different domains
While "researching" the material for Machine Learning, I was a bit disturbed by the fact that whichever Machine Learning topic I approached through R or scikit-learn, it reduced to some statistical method I had come across in the past. I skimmed through all the books I have on the subject, searched the Web, downloaded course materials and so on; the saga was the same.
I took a step back and began to look at the topic without the "framing" people subject you to, such as "Machine Learning is the domain of Stat/Math grads from Ivy League colleges", "You cannot become a Data Scientist if you are a Programmer", "Machine Learning....uhhh.......you need to be a math geek", "How can a programmer become a Machine Learning expert?" etc. The epiphany I had was that, at its core, Machine Learning is intermediate statistics and mathematics, which most of us have learned if we have had a 10+2 education.
Now, my thesis is:
- You can use Machine Learning tools if you know 10th standard math/stats
- You can write tools if you have 11th standard math maturity
- If you have 12th standard math competence, you can modify standard algorithms
The only place where you require heady math is in inventing new algorithms or proving the mathematical basis of your algorithm. As an engineer, you can ignore it. That is, to become a researcher, you need an understanding of Mathematics, or more specifically Mathematical Statistics. Andrew Ng's course on Machine Learning caters to ML researchers, and for doing Machine Learning at the workplace, it is the worst course one can take. I know a lot of people who have gone through the ordeal of Andrew's course and feel incompetent when it comes to Machine Learning. Things like Partial Derivatives, Gradient Ascent/Descent, Linear Combinations etc. give most programmers nightmares.
A functional approach to Machine Learning through GNU R, scikit-learn, Apache Mahout/Spark MLlib etc. is a pragmatic way to get into the subject. Once you have played with lots of data and these tools, you will have enough competence to understand some of the hairy math behind these algorithms. Refer to the mathematical literature only after you have familiarized yourself with the lingo.
A list of some Machine Learning techniques and their underlying principles is given below:
- Rule-Based Classification - A Data Set with a known output (class) parameter + if/else logic
- Decision Tree Induction - Splitting the Data Set based on Attributes until you reach so-called pure nodes, where all the records have the same target parameter
- Linear Regression - An algebraic formula in the case of Simple Linear Regression and basic matrix math operations for Multiple Linear Regression
- Naive Bayes - Elementary Conditional Probability and the notion of Independence. We all learn it during our intermediate studies
- Logistic Regression - Elementary Probability/Odds calculation and exponentials/logarithms
- Clustering - The notion of Metric/distance computation, Centroids and Hierarchical Decomposition
- Apriori/Frequent Patterns - Subset computation, Correlation and Frequency Analysis
- Recommendation - Some familiarity with Eigenvalue Decomposition (EVD) of matrices and a cursory familiarity with Singular Value Decomposition (NumPy/SciPy can help you here)
- Dimensionality Reduction - EVD and SVD
- Neural Networks - Linear Combination, Error Minimization, Feedback loop, Iterative methods, Activation functions, some ideas about Decision trees and graphs.
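To illustrate how elementary the math behind several of these items is, here is a minimal, dependency-free Python sketch of four of them: the closed-form fit for simple linear regression, entropy and information gain as a split criterion for decision tree induction, the logistic function that turns log-odds into a probability, and a counting-based Naive Bayes classifier for categorical features. All function names are hypothetical, entropy is only one common split criterion (Gini impurity is another), and the Naive Bayes version deliberately omits smoothing. Treat it as a teaching sketch, not a replacement for R or scikit-learn.

```python
import math
from collections import Counter, defaultdict


def simple_linear_regression(xs, ys):
    """Fit y = a + b*x with the closed-form least-squares formula."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope b = sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x  # the fitted line passes through (mean_x, mean_y)
    return a, b


def entropy(labels):
    """Shannon entropy in bits of a list of class labels (0 for a pure node)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(parent, splits):
    """Entropy of the parent minus the size-weighted entropy of its child subsets."""
    weighted = sum(len(s) / len(parent) * entropy(s) for s in splits)
    return entropy(parent) - weighted


def logistic(z):
    """Turn a linear score interpreted as log-odds into a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def naive_bayes_train(rows, labels):
    """Count class frequencies and, per class, how often each feature value occurs."""
    priors = Counter(labels)
    counts = defaultdict(Counter)  # key: (feature_index, label) -> Counter of values
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            counts[(i, label)][value] += 1
    return priors, counts


def naive_bayes_predict(priors, counts, row):
    """Return the label maximising P(label) * product of P(x_i | label); no smoothing."""
    total = sum(priors.values())
    best_label, best_score = None, -1.0
    for label, n_label in priors.items():
        score = n_label / total
        for i, value in enumerate(row):
            # independence assumption: multiply the per-feature conditional probabilities
            score *= counts[(i, label)][value] / n_label
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    # y = 1 + 2x fits these points exactly
    print(simple_linear_regression([1, 2, 3, 4], [3, 5, 7, 9]))  # (1.0, 2.0)
    # a split that separates the two classes perfectly gains 1 bit
    print(information_gain(["y", "y", "n", "n"], [["y", "y"], ["n", "n"]]))  # 1.0
    print(logistic(0.0))  # 0.5: log-odds of 0 means even odds
    rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool")]
    labels = ["no", "no", "yes", "yes"]
    priors, counts = naive_bayes_train(rows, labels)
    print(naive_bayes_predict(priors, counts, ("rainy", "mild")))  # yes
```

Each function is only a handful of arithmetic operations. Production libraries typically add smoothing, numerical safeguards (for example summing log-probabilities instead of multiplying small numbers) and vectorized matrix code for the multiple-regression case.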
I studied Computer Graphics Programming extensively in my younger days, and the math I encountered there is enough to have a deep understanding of these topics. Even then, it took me a lot of time to understand that Machine Learning is far simpler than that, if you step back a bit. The outside "pressure" keeps us from staying sane.
All of these tools can be classified into:
- Hilbert Space Methods (most metric-based algorithms fall into this category)
- Statistical Methods
- Deep Learning (Neural Networks)
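To make the "metric-based" category concrete, here is a minimal pure-Python sketch of two algorithms that need nothing beyond an inner product and the distance it induces, d(u, v) = sqrt(<u - v, u - v>): k-nearest-neighbours classification and k-means clustering (Lloyd's iteration), the latter matching the centroid idea in the Clustering entry of the earlier list. Euclidean distance, the choice of k-NN as an example, and all function names are assumptions made for illustration.

```python
import math
from collections import Counter


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def distance(u, v):
    """Euclidean distance: the norm induced by the inner product, sqrt(<u-v, u-v>)."""
    diff = [a - b for a, b in zip(u, v)]
    return math.sqrt(dot(diff, diff))


def knn_predict(points, labels, query, k=3):
    """Majority label among the k training points closest to `query`."""
    ranked = sorted(zip(points, labels), key=lambda pl: distance(pl[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def kmeans_step(points, centroids):
    """One Lloyd iteration: assign points to the nearest centroid, then recompute centroids."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda j: distance(p, centroids[j]))
        clusters[nearest].append(p)
    new_centroids = []
    for j, members in enumerate(clusters):
        if members:
            # centroid = coordinate-wise mean of the cluster members
            new_centroids.append([sum(c) / len(members) for c in zip(*members)])
        else:
            new_centroids.append(list(centroids[j]))  # keep an empty cluster's centroid
    return new_centroids, clusters


def kmeans(points, centroids, max_iter=100):
    """Repeat Lloyd iterations until the centroids stop moving (or max_iter is reached)."""
    clusters = []
    for _ in range(max_iter):
        new_centroids, clusters = kmeans_step(points, centroids)
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


if __name__ == "__main__":
    points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
    labels = ["a", "a", "a", "b", "b", "b"]
    print(knn_predict(points, labels, [1, 1]))  # a
    centroids, clusters = kmeans(points, [[0, 0], [10, 10]])
    print([len(c) for c in clusters])  # [3, 3]
```

Swapping `distance` for another metric, such as Manhattan distance, leaves both algorithms otherwise unchanged, which is the sense in which they are metric-based.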
The personal path I took was "topsy-turvy", as I thought Machine Learning was a difficult subject. I was "framed" into believing so by the people who introduced me to the subject. IMHO, I can teach the above material in a week's time, if you and I are inside a room for that amount of time. Of course, we need to apply these models to gain deep expertise. Familiarity with data and tools is the key ingredient of your success.
I call the above method the "Josh Kaufman" method (Twenty-Hour Method), whereas most people suggest the "Malcolm Gladwell" (10,000-hour) method. Think about it. Contact me if you want more details.