Episode 2: The History of Data Science
Illustration by Héizel Vázquez

Hello! And welcome to a new edition of the Data Science Now newsletter. In this session, I talked about the history of data science. You can hear the podcast version here:

And if you prefer you can watch the video recording here:

Remember that we will be live every Wednesday here on LinkedIn at 9 PM CST :)

In the episode, I reviewed a timeline that my sister and I created a while ago:

You can find the source in this article:

Thanks to L. Van. W., we also have a version of the timeline with links, where you can click on a point in the history and it will take you to the related article, paper, blog, or topic:

The idea of creating this timeline came from several people asking me what the future of data science was. When I thought about where to start, I wondered: do I really know what data science is? What do we do? After I was quite satisfied with the answer, which I wrote about in several articles like:

and

I realized that I needed to draw on many fields. Our timeline was going to be complicated but impressive. Here's a brief explanation of what I said in the episode (remember you can find links to all of the articles and moments on the webpage above):

The timeline begins in 1943 with an article about neural nets, when computers were at a very early stage. The next point is the paper from Shannon, a fantastic piece for understanding how a computer works: what information is and how to transmit it. It is still a relevant article.

Then there's a 1950 article by Turing, one of the fathers of computation. He started to think about what the limits of computers are, how the field was going to expand, and whether we would be able to tell a computer from a human. He knew it was going to change the world.

Of course, this timeline can't contain everything. It covers the essential things from 1943 to 2019. I'll be adding the new advances from 2020 soon, so if you have an idea of what to add, please comment here :).

In 1962 we have an article from Tukey, The Future of Data Analysis. He started thinking about the difference between doing statistics with older methods and with computers, and about what we would do in the future with all this power. He realized the field should move in the direction of machine learning and algorithms, and that statistics should move toward data analysis.

I think this is the first time someone started to think about data science, now that we had the computing power. He has a quote where he says he had long thought of himself as a statistician, but had come to care more about the data, so maybe he was a data analyst. I talked about that in my article On Data and Science:

In 1974 we have backpropagation, one of the fundamental algorithms of the last century. It is about propagating errors backward through a network so the network can learn to do better, and it was super important in the development of neural nets.

In 1977 we have the book by Tukey about exploratory data analysis: how to understand data from the beginning. Getting data is getting easier, but you need to know what you're doing. You need to understand the data and run statistical tests, because they're useful.

We also have the founding of the International Association for Statistical Computing, which brought computers and statistics together. It was the beginning of a field, and now we have a system to study all of that.

Going forward to 1996, we have the great Gregory Piatetsky-Shapiro founding one of the most important conferences about data science and thinking about how to get useful information from databases.

After that, there was a bold talk by Jeff Wu in which he asks: Statistics = Data Science?

We didn't have a science yet. He thought about statistics becoming one, and about why that matters: if we have a science, we can do more things. We can create a system to study it and care about having hypotheses, observations, tests, reproducibility, and more.

I've been talking about data science as a science for a while; you can find that in my article On Data and Science.

In 1998 the creators of Google published a paper about indexing the web, extracting the data, and putting it into a system to find useful information. That led to the creation of Google and to new ways of analyzing data on a big scale.

In 2001, Cleveland proposed a system to study data science. I had no idea about this when I was studying, but it was a significant step toward creating the field.

Also in 2001, Leo Breiman, a pioneer of decision trees, wrote about the two statistical cultures. He described two types of people, old school and new school, with the new school thinking in terms of algorithms. I believe he was one of the first people to see data science as a serious field.

In 2002 we have the founding of the Data Science Journal, a new beginning for publications and for implementing algorithms. After that, some people at Google created MapReduce, a way to deal with data on a big scale. It was only a theory and a private implementation until the people at Yahoo! created Hadoop as a free implementation. This was the start of a new world, where we could see the advantage of big data for understanding data and changing the world.

Even though Java was the leading programming language in the world in 2007, some people created scikit-learn that year. It took a while, but it is now one of the most important libraries for Python, and it has inspired many design details used by other libraries.

We had all this computing power, and the hype of the moment was big data. The years 2007 to 2010 were about big data and the Apache world. But at the same time, people were trying to focus on machine learning. We saw the rise of data science, and we got a taxonomy of data science.

In 2010 we have the data science Venn diagram. I'm not a super fan of the diagram, but I think it helped people understand what data science was about.

In 2012 data science was named the "sexiest job of the century." I think that was right: the interest has increased in academia and industry, and schools are opening programs. But I don't believe this is going to be it. There's much more happening that we still have no idea about.

Starting in 2009 we began thinking about linked data, knowledge graphs, and the data fabric, which I believe will be the future of the field. Beyond SQL and relational databases, we'll start understanding what the data is.

In 2014 we have AlphaGo, the subject of a movie on how Google won a match of Go using machine learning. This was the first time I heard about the power of machine learning. Without question, it was an important moment of the decade.

From 2014 we've seen many applications and discovered how creativity is a crucial point in Data Science. A few months ago, I wrote an article on how to separate tracks in music, but the algorithms used for that were created for medicine. You can read it here:

Never let someone tell you you’re not creative. Math and science need creativity; you can change the world with them.

I think now is the best time to start learning data science. I’m launching courses with Closter, so make sure to follow us and subscribe to the newsletter for more info.

Remember: There's no easy path, you have to practice, study, and if you want to know where you're going, you need to understand where you come from.

Thanks for reading this, please share this with your network, it would help us a lot :)

With love by the Closter Team:

Gabriel Erives, Héizel Vázquez, Eilén Vázquez, Favio Vázquez.
