Best programming language for machine learning

Many people have asked me which programming language is best to start with for machine learning. To keep it simple, I recommend knowing both Python and R, but you can start with Python.

Let's first talk about why you need both. I like to divide machine learning into three big sections: 1) statistical methods, 2) traditional learning algorithms and 3) deep learning. The boundary between statistical methods and traditional learning algorithms is actually quite blurry, because machine learning is an engineering branch derived from statistics. If you look at book titles, some refer to it as 'statistical learning' (the statistics term), while others call it 'pattern recognition' (the engineering term).

If you know a little of the history of R, you will know that R was built for statistical analysis. R is therefore the language to use if you're dealing with statistical methods such as generalised linear models, or with traditional learning algorithms such as support vector machines (a very hot topic before deep learning took off). Basically, if a method is mature (or old) enough, you can always find good packages for it in R. Python has equivalent libraries for these as well, but in my personal experience the libraries in R are simply more robust and have been revised over a longer period.

You have probably guessed that Python is good for deep learning. This is mainly because most deep learning tools are built around the Python ecosystem. TensorFlow from Google and the Keras interface allow you to prototype new network structures with a few lines of code. One thing you should be clear about is that deep learning is a branch of machine learning that uses deep neural networks, so for shallow neural networks you can still use R. The benefits of Python become significant when it comes to large-scale deep models. Having said that, TensorFlow is already available in R, but personally I found the function calls very strange. So simply stick to Python for deep learning and you won't regret it.
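To make the first category above a little more concrete, here is a minimal sketch of a generalised linear model (logistic regression) fitted by gradient descent in plain Python. The toy data, the function names (`fit_logistic`, `predict`), the learning rate and the number of epochs are all my own illustrative assumptions, not a recommended setup. In practice you would reach for an established package in R or Python rather than writing this by hand.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit P(y = 1 | x) = sigmoid(w * x + b) by batch gradient descent.

    lr and epochs are arbitrary illustrative values, not tuned settings.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the average log loss with respect to w and b.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return the predicted class (0 or 1) for a single input x."""
    return 1 if sigmoid(w * x + b) >= 0.5 else 0


# Made-up toy data: hours studied versus pass (1) or fail (0).
hours = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
passed = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = fit_logistic(hours, passed)
print("slope:", w, "intercept:", b)
print("prediction for 1 hour:", predict(w, b, 1.0))
print("prediction for 4 hours:", predict(w, b, 4.0))
```

The whole model is two numbers, a slope and an intercept, which is typical of the mature statistical methods described above. Deep learning frameworks earn their keep when the model has far more parameters than anyone would want to differentiate by hand.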

Let's move on and talk about which is best for beginners. I have taught people both R and Python. Both are scripting languages, and they are quite similar in some respects. But I would recommend Python for beginners because its syntax is concise and consistent. There's usually one obvious way to solve a problem elegantly, whereas R is quite flexible (which is a good thing for experienced users) and often relies on package-specific knowledge.

Sidebar: the tool you are using now may simply be a legacy of your company or institute. Financial companies tend to use commercial software such as SAS and SPSS (you can call technical support if there's an issue). University students and researchers tend to use MATLAB, since the company offers educational licenses to these institutions. Apart from that, many people use Excel, which is great for analysing data spanning several columns. The point I want to make is that you don't need to learn a programming language if what you're using serves you well. You don't need to learn a musical instrument in order to enjoy music; you can simply play a CD or an MP3. Admiring someone who can play the piano is not a sustainable reason to learn to play the piano. Looking for a way to express your feelings is. So only when you want a flexibility that these commercial data analysis products don't offer should you start learning Python or R, because they can offer what you currently have and much more.

In summary, it is best to know more than one language so you can cope with different machine learning tasks. If you want to start with just one, choose Python. This is a very crude guide for beginners. Let me know your comments, and we can dive into more detailed examples if you're interested.

