Episode 7: Programming languages for Data Science
Favio Vazquez
Lead AI Scientist | LinkedIn Top Voice | AI & ML Evangelist | Drummer
Hello! And welcome to a new edition of the Data Science Now newsletter. In this session, I talked about the most important programming languages for data science. You can hear the podcast version here:
And if you prefer, you can watch the video recording here:
Remember that we will be live (almost) every Wednesday here on LinkedIn, 8 PM CST :).
Here's a short recap of what I covered in the session:
Why programming in Data Science (DS)? I've talked about several things in these sessions so far. Programming is important, but it's only a tool; the problem, the business, and adding value for the company matter more. But remember that we need to create data products, and that means building a system or a model.
Programming is important in DS because you need to understand languages to transform ideas into solutions (software solutions). You need to spend time learning to program: how to code, what the learning curve looks like, and the basic structure of a language.
The most used programming languages for DS are Python and R, but which one is better? We'll leave that for a different session.
When you need to decide where to start, learn Python; it's a full, general-purpose language. We have other sessions recorded about that:
You will find thousands of resources on Python and R on the web, so I'm going to skip them for now. I want to talk about other languages that are also widely used (or that are helpful for data science).
The list of relevant programming languages for Data Science:
- Python
- R
- SQL
- C++
- Go
- Julia
- Scala
- Java
- JavaScript
If you’re ambitious you can learn at least 5 of them in 2 or 3 years.
Note: SQL is not exactly a general-purpose programming language; it's a query language for working with data in relational databases, so you don't normally write whole programs in it. It's still important, and a lot of companies use it. If you want to succeed as a data scientist, you need to learn SQL: when you join a company, you have to go to the database and understand the data, and that's what SQL is for. A great course on SQL is the one by my friend Kristen:
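To make the SQL note concrete, here is a minimal sketch of what "going to the database to understand the data" can look like from Python, using the standard library's sqlite3 module. The `orders` table, its columns, the rows, and the `total_per_customer` helper are all made up for illustration; a real company database will look different.

```python
import sqlite3

# In-memory SQLite database with a hypothetical "orders" table (made-up data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ana", 10.0), ("ana", 5.0), ("luis", 7.5)],
)


def total_per_customer(connection):
    """Return (customer, total) rows, largest total first."""
    query = """
        SELECT customer, SUM(amount) AS total
        FROM orders
        GROUP BY customer
        ORDER BY total DESC
    """
    return connection.execute(query).fetchall()


totals = total_per_customer(conn)
print(totals)
```

The SQL does the data work (grouping, summing, sorting), while Python only sends the query and collects the rows. That split is roughly why SQL is worth learning next to a general-purpose language.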
For the languages below, I'll include a section with the name of the language, a short description, and some resources to learn it. If you want more context on the languages, please watch or listen to the episode.
C++
C++ is nearly everything C is, and more. It's not new, either, and it has itself influenced many languages that came after it, such as Java and C#. It adds features like classes and templates that make it a step up from C.
Free Books on C++:
- https://www.lmpt.univ-tours.fr/~volkov/C++.pdf
- https://fac.ksu.edu.sa/sites/default/files/ObjectOrientedProgramminginC4thEdition.pdf
Free courses on C++:
- https://www.edx.org/course/introduction-to-c-3
- https://www.coursera.org/learn/c-plus-plus-a
- https://www.coursera.org/learn/cs-fundamentals-1
Julia
Another one that's important for Data Science is Julia. It's relatively new, only a few years old, and it's growing a lot in scientific programming. It has an ecosystem for Data Science: DataFrames, Queryverse, JuliaGraphs, and connections to the Hadoop ecosystem. You also have libraries for ML like Flux, plus automatic differentiation, GPU support, and deep learning algorithms. It's funded by great companies and schools, and it's going to be important in the future.
Free Books on Julia:
- https://people.smp.uq.edu.au/YoniNazarathy/julia-stats/StatisticsWithJulia.pdf
- https://www.sas.upenn.edu/~jesusfv/Chapter_HPC_8_Julia.pdf
Free courses on Julia:
- https://juliaacademy.com
- https://www.coursera.org/learn/julia-programming
- https://exercism.io/tracks/julia
Go
Another one is Go. If you know C++, it's very simple to pick up, and you can create objects. It's a very good alternative to C++ when you need speed, and it has an ecosystem for Data Science as well.
Free books on Go:
- https://github.com/KeKe-Li/book/blob/master/Go/The.Go.Programming.Language.pdf
- https://miek.nl/downloads/2015/go.pdf
- https://www.openmymind.net/assets/go/go.pdf
Free courses on Go:
Scala
If you ask me for my preferences, one of my favorite languages is Scala. It was the language that was going to be the future, and everyone was talking about it, but that didn't happen. Spark is written in Scala, and it has great libraries for ML. We thought the community would go there. It's still important: if you have to choose between Java and Scala, and you are a data scientist, choose Scala. It's not that easy, but it's good.
Free books on Scala:
- https://people.cs.ksu.edu/~schmidt/705a/Scala/Programming-in-Scala.pdf
- https://www.scala-lang.org/old/sites/default/files/linuxsoft_archives/docu/files/ScalaByExample.pdf
- https://fileadmin.cs.lth.se/scala/scala-impatient.pdf
Free courses on Scala:
- https://www.coursera.org/specializations/scala
- https://cognitiveclass.ai/courses/introduction-to-scala
As for other languages like JavaScript: it's great for the web, but it's weird. Don't learn programming with it, because you'll pick up a lot of bad practices. It's still important for creating dashboards, though.
How much time should you spend learning these languages?
One of my favorite articles, Teach Yourself Programming in Ten Years by Peter Norvig, explains that this will take you years. If you want to learn something like SQL, Python, or R, you can spend 6 months in a course and then practice to master the language. For more complicated languages, it will take you 3 or 4 years to really understand and use them correctly. I'm not saying you need to wait that long to start; in a few months you can write code. The complicated part is mastering it. If you can enroll in a computer science degree, do it!
Always Remember:
There's no easy path. You have to practice and study, and if you want to know where you're going, you need to understand where you come from. Then you will rule the world.
Thanks for reading this, please subscribe and share this with your network, it would help us a lot :)
With love by the Closter Team:
Gabriel Erives, Héizel Vázquez, Eilén Vázquez, Favio Vázquez.