
Let’s not fall in love with our tools

Eduardo Barbaro

Head of Security Analytics ING | Visiting Researcher TUDelft | OpenGroup Thought Leader Data Scientist

Published: May 30, 2016

In an age where computers are getting more powerful and more “super”, it is easy to praise how fast we can dedupe, manipulate, transform, or access our (always!) “big data” sets. I am struck by the number of data-science discussions that remain annoyingly superficial, revolving around “how many tools I know” when those tools can, in fact, do the same thing. These talks often end in a tedious showing-off competition of “actually, if you use A and B instead, you can do that much faster”.

Don’t get me wrong; most of the tools out there are significant and fundamentally important. How could we do our jobs without mastering Python or R, or how could our databases serve us if we didn’t know SQL or Mongo? In an age where Machine Learning is the new black, it is easy to fall in love with Clouds and Clusters, Hadoops and Sparks, Hives and Pigs (I am sure you get the idea by now). They are fancy, there are a lot of trendy discussions around them, and you get to talk in encrypted “data-scientist language” (the more unintelligible, the better). Most importantly, however: you can skip the hard work and stay swimming at the shore of tooling.

Yes, most of us (data scientists) forget that these are just tools! I find it remarkable that I cannot remember many discussions about integration methods, differential equations, or highly skewed distributions. People tend to forget (ignore?) that behind the curtains of that fancy library there is a lot of Math/Stats we must understand, or at least spend much more time thinking about. I know, you may be thinking: “Mathematics and Statistics are also just tools”. Well, true, they are there to serve us, but at a much more fundamental level. And by the way, I am sure Math and Stats will remain with us for more than a year or two, well after the next ultimate tool pops up.

Let’s spend (much) more time thinking about what we can do with our data: how we can generate insights from our datasets, or help our clients answer their important questions. Let’s use our time to design more accurate numerical experiments and to think of sharper research questions and hypotheses.

And finally, let’s focus on learning the abstraction instead of praising some tool.

I am sorry, but I have to go now. I just found out that I can read my big data in R much faster as data.table instead of data.frame. Can’t wait to try...

Raza Sheikh (TOGAF and CDMP)

Helping Startups with Business, Data, App, & Tech.

1 year ago

Thank you for sharing, Eduardo!

Serge BLANC

Project Manager Data Analytics EMEA at Altair Engineering

8 years ago

Very true, Eduardo. The best tool is always... whatever works! The answer lies in the data, not in the tools. And we know that, with the same data, many tools will yield similar results. If you spend your time arguing about tools, you are doing more Science than Data and not focusing on solving real problems.


JUAN ZHENG

Product Development & Solution Consultant

8 years ago

Very true! Understanding our data better and finding out what we can do with it is more important than the systems or tools themselves.
