DATA DRIVEN SCIENCE
source: https://content.sciendo.com/view/journals/jdis/4/2/article-p79.xml


Clustering algorithms, among the most popular techniques in unsupervised machine learning today, are ubiquitous across scientific fields. With plenty of color and configuration options, they can give scientists a wealth of insight from a single chart, enriching presentations and white papers. The charts can be considered masterpieces in themselves.

Multi-colored, 3-D charts are produced by algorithms that were once the state of the art in complexity, yet today they can be written in a few lines of code. Know the right packages to import and a handful of functions, and voilà: your slide is ready to surprise your audience.
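As a sketch of just how little code such a chart now takes, here is a minimal clustering example. The dataset, parameters, and library choice (scikit-learn) are illustrative assumptions, not a prescription:

```python
# A few lines now suffice for a clustering result that once
# required state-of-the-art algorithmic work.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Illustrative data: three well-separated Gaussian blobs in 2-D.
points = np.vstack([
    rng.normal(loc=center, scale=0.3, size=(50, 2))
    for center in [(0, 0), (3, 3), (0, 3)]
])

# Fit k-means and label every point with its cluster.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(points)

# Coloring a scatter plot by `labels` gives the familiar
# multi-colored chart, e.g. plt.scatter(*points.T, c=labels).
print(sorted(np.bincount(labels)))
```

On data this cleanly separated, the three clusters are recovered with one function call; the scientist never touches the algorithm's internals.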

But things go far beyond that.

We can say that machine learning and deep learning algorithms and structures, born from data science, have made this branch of computer science an intrinsic part of every quantitative science, besides being a science in itself. There is much we could say about data science, even as critics deny its scientific status. But here I would like to focus on an interesting phenomenon: data, and the science built on them, have changed the inner core of the other sciences.

Laboratory experiments and testing, which lead to measurement, have always been part of this core. We can even say science was born from experiment. And between experiment and theory, measurement and results have apparently created a science of their own. Here we saw statistics, probability, numerical analysis and other disciplines develop almost from zero; today all of them, beyond being core topics themselves, act like a compass that can steer the core itself.

I mean that data – algorithms, structures, packages, and all the related insights – have been changing the way science itself happens and is done. More and more measurements depend on huge datasets that can only be handled with equally powerful data tools. An experiment with a few small datasets is worth far less than one with a massive data ensemble that can only be processed with the right machine learning package and tool. This means that if you are a scientist – an astronomer, an applied mathematician, an environmental biologist, an organic chemistry engineer, a game theory practitioner – you must also be a little (or not so little) of a data scientist.

Of course, ML packages were originally created and tested by computer scientists. But this has changed: inspect the source code of the most recent developments and you will find physicists, medical doctors, and molecular specialists among the code authors. No, they are not abandoning their core areas: they are developing machine learning code as part of their science. Code that makes their results faster, more robust, and safer – and, yes, code that is made publicly available to be used, reused, and eventually updated to evolve and improve.

When the free software concept exploded back in the 1990s as a way for computer science to evolve free of the powerful software corporations, nobody imagined it would reach this far. Proprietary developers versus free software developers; computer scientists earning high corporate salaries against idealists coding for free at home when they should have been sleeping. Good products were created, but not as many as the movement desired. Then data science arrived and told the geologist that she could create her own custom software without mastering the details of algorithm development. And, best of all: software she could build without having to explain the core concepts of her science to a computer expert, with all the risk of misunderstanding that entails.

So we can say that today the applied and quantitative sciences have inverted the flow: in the beginning, concepts flowed from science to computer science, and software flowed the opposite way. Today, scientists generate software that is used by other scientists and even by computer experts.

This has changed the way their science happens: the timing of results, the proofs of concept, the experimental evaluation. Practice and theory have always fed each other, but data have evolved and gained the status of a third leg. This resembles grid computing, where many computers ran some data-intensive software as one giant multiprocessing machine; today you have many scientists producing small (or not so small) pieces of code for specific purposes, but always publicly available to all.

And the role of computer scientists remains important, too. They usually evolve those pieces of code, making them better, faster, and scalable to face the growing amount and size of data. Brothers in science, we can say, helping each other – to everyone's benefit.
