You have to fall in love with the Insights not with the Models (or with Coding)

Diego Vallarino, PhD (he/him)

Global AI & Data Strategy Leader | Innovator in ML/AI-Driven Business Solutions | Buy-Side Quant Finance Expert | Ex-Executive at Coface, Scotiabank & Equifax | Board Member | PhD, MSc, MBA | EB1A Green Card Holder

Published: December 5, 2022

"It is essential to remember that when it comes to data science, the goal should not be to fall in love with the models or coding, but instead to fall in love with the insights that can be gained from the data. Models and coding are simply tools that allow us to gain those insights, so it is important to focus on the end goal of uncovering useful information and knowledge from the data." This was written entirely by AI, with openai.com.

As in all data analysis, context is important: a weekend in Madrid, tapas, cañas, nerds, and talk about how data science is changing. The interesting thing is that there was a cross-section of people: engineers, statisticians, economists, managers, human resources people, and data science outliers. The geographic mix was also quite balanced, at least between Latin Americans and Europeans. So we could draw some conclusions with statistical weight, at least within this set of friends.

So here are some conclusions I drew, which really worried me:

There are people who have a lot of love for coding and not so much for the problem they want to solve.
There are people who want to use the latest algorithm discovered at MIT, Stanford, or Google in a 5,000-year-old sector, at a company with a 20th-century culture.
There are some people who love the model more than solving a decision-making problem.
Some people ask you what library you use before asking what you want to solve.
There are some people who prefer XGBoost because they read the latest technical post, over another model that may be a little less accurate but can be deployed much faster.
There are people who code when the goal is for that code to do something similar to (and generally worse than) Excel, SAS, or SPSS.
There are people who pay a lot of attention to the technological infrastructure and forget that business results are needed. If there is no revenue, it is difficult to sustain rising costs.
There are people who are more afraid of low-code than of Freddy Krueger.
There are people who launch products as if they were unique. Take QuantumBlack (McKinsey AI) with its CausalNex (here), based on Google's CausalImpact library (here). About 90% of what they do is the same.

Obviously, the previous comments are biased. It wasn't all bad news, but it does worry me. Data science has been around for a long time and has always been about finding patterns and giving decision makers facts. In fact, some decisions are so simple and routine that they could be systematized with prescriptive analytics, reserving decision-making time for those that are ad hoc or require more creativity.

Coding is not the point. Comparing models is not the key to data science. People ask you: What model do you use? Have you used XGBoost for modelling? What libraries do you use? I think it's a huge waste of time, and it is not a predictor of anything. What if I tell you that I use SPSS Modeler and that, once I load the database and define the target, it automatically recommends all possible models? I press run and it gives me a report with the performance of each model. Is that a data scientist? What if I do the same thing with Python and the result is the same: am I a better data scientist?
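To make the "press run and get a report" workflow concrete, here is a minimal sketch in plain Python. It is not what SPSS Modeler does internally; the three toy models, the `leaderboard` helper and the data are all hypothetical, chosen only to show that fitting every candidate and ranking them is a mechanical loop, whatever the tool.

```python
from collections import Counter

# Hypothetical toy data: two numeric features and a binary target.
X_train = [(1.0, 0.2), (2.0, 0.1), (3.0, 0.4), (7.0, 0.9),
           (8.0, 0.8), (9.0, 0.7), (1.5, 0.3)]
y_train = [0, 0, 0, 1, 1, 1, 0]
X_test = [(2.5, 0.3), (8.5, 0.6), (1.2, 0.1), (7.5, 0.9)]
y_test = [0, 1, 0, 1]


def fit_majority(X, y):
    """Baseline: always predict the most common training label."""
    label = Counter(y).most_common(1)[0][0]
    return lambda row: label


def fit_threshold(X, y):
    """One-feature rule: predict 1 when the first feature exceeds its training mean."""
    cut = sum(row[0] for row in X) / len(X)
    return lambda row: 1 if row[0] > cut else 0


def fit_nearest_neighbour(X, y):
    """1-nearest-neighbour using squared Euclidean distance."""
    def predict(row):
        dists = [sum((a - b) ** 2 for a, b in zip(row, other)) for other in X]
        return y[dists.index(min(dists))]
    return predict


def accuracy(model, X, y):
    """Share of rows where the model's prediction matches the true label."""
    return sum(model(row) == target for row, target in zip(X, y)) / len(y)


def leaderboard(candidates, X_tr, y_tr, X_te, y_te):
    """Fit every candidate on the training set and rank by hold-out accuracy."""
    scores = [(name, accuracy(fit(X_tr, y_tr), X_te, y_te))
              for name, fit in candidates.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


candidates = {
    "majority class": fit_majority,
    "threshold rule": fit_threshold,
    "1-nearest neighbour": fit_nearest_neighbour,
}
board = leaderboard(candidates, X_train, y_train, X_test, y_test)
for name, score in board:
    print(f"{name}: {score:.2f}")
```

The loop is the easy part. Deciding what the target should be, and what decision the winning model will feed, is not in the code at all.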

Look at the models that are used depending on where you work (industry, academia or research).

[Image: models used in industry, academia and research. Source: KDnuggets & Forrester]

The problem is that many people who work in industry want to use the models that people in academia use. I'm not saying that people in industry don't innovate in data science (or in models); what I'm saying is that innovations in industry are in the 4Ds that I raised in this post (here). Design the problem in an innovative way (churn, default, etc. are not innovative). Define what Data you are going to use: are you going to use differentiated, alternative, complementary data, or will you use the (biased) company data and add data from the yellow pages?

Regarding Development: how are you going to develop the algorithms? Today in industry everything is more or less within a fairly small margin. Believe me, in the last 3 or 4 years I have developed a number of models and deployed them, all based on code (mainly in R, because I needed more statistical power).

At an academic level, for my MSc thesis in Statistics I developed 5 different survival models to see how they performed. For my PhD thesis I used a difference-in-differences (Dif in Dif) model to analyze the impact of (tax) incentives on investment decisions. I used an autoregressive moving average model to understand the behavior of Covid in Uruguay. I used ANN, RNN and CNN to develop an Income Predictor for the entire population of Uruguay. I used ANN and MLR to understand the propensity of the clients of a financial institution (+200k clients). I used MLR to infer the price of a head of cattle in auction processes. I used CausalImpact to find out if the UK government change in October had a major impact on the pound (here). And believe me, I can go on.
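As an illustration of how little code one of these methods needs in its simplest form, here is a minimal sketch of the 2x2 difference-in-differences estimator. It is not the model from the thesis: the function name and the investment figures are hypothetical, and the estimate is only meaningful under the usual parallel-trends assumption (both groups would have moved alike without the incentive).

```python
from statistics import mean


def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """Change in the treated group minus change in the control group."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical investment levels before and after a tax incentive.
treated_pre = [10.0, 12.0, 11.0]    # firms that later receive the incentive
treated_post = [15.0, 17.0, 16.0]
control_pre = [9.0, 10.0, 11.0]     # firms that never receive it
control_post = [11.0, 12.0, 13.0]

effect = diff_in_diff(treated_pre, treated_post, control_pre, control_post)
print(f"Estimated effect of the incentive: {effect:.1f}")
```

The arithmetic is trivial. The hard part is arguing that the control group is a credible counterfactual, which is a business and economics question rather than a coding one.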

In fact, I am sharing a comparison of models for predicting whether a stock was going to pay dividends or not. Here it is, all the code. It's free. Use it. No company is going to generate competitive advantage based on this code, but if you are an SME and need my help, send me a DM and I'll help you for free.

Innovations in data science are not in how many libraries you use, or whether you use Python, or whether you code or use low-code, or whether you code or use SAS, SPSS, or Excel. They are somewhere else.

It is in understanding that you have problems if you draw conclusions without considering ergodicity; without knowing what moral hazard and adverse selection are; without understanding that you cannot model a chaotic experiment based on Bayes; without knowing what Entropy implies for a database; without understanding that information asymmetry can be seen from different perspectives (as George A. Akerlof, A. Michael Spence and Joseph E. Stiglitz did); without developing nested models capable of telling us not only WHO (people or companies) will do WHAT (propensity, origination, default, collection, churn, etc.) but also WHEN they will do it; and without knowing that value theory has at least 3 stages: generating, appropriating and distributing value. These are among a number of other concepts that make up data science in business (in other fields the knowledge is different, but the concept is identical).
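To see why ignoring ergodicity leads to wrong conclusions, consider a small worked sketch. The gamble below is a hypothetical textbook-style example, not something from the conversation in Madrid: each round multiplies your wealth by 1.5 or by 0.6 with equal probability. The average across many players grows, while a single player who keeps playing shrinks over time.

```python
import math


def ensemble_growth(outcomes):
    """Expected wealth multiplier per round, averaged across many players."""
    return sum(prob * factor for prob, factor in outcomes)


def time_average_growth(outcomes):
    """Long-run per-round multiplier for one player (probability-weighted geometric mean)."""
    return math.exp(sum(prob * math.log(factor) for prob, factor in outcomes))


# Hypothetical gamble: (probability, wealth multiplier) for each outcome.
gamble = [(0.5, 1.5), (0.5, 0.6)]

ensemble = ensemble_growth(gamble)
time_avg = time_average_growth(gamble)

print(f"Ensemble average multiplier per round: {ensemble:.4f}")
print(f"Time-average multiplier per round:     {time_avg:.4f}")
print("Looks profitable on average:", ensemble > 1.0)
print("Profitable for one player over time:", time_avg > 1.0)
```

A model fitted on the cross-section of customers answers the first question; a decision about one customer over time is the second. Confusing the two is a modelling error that no library will catch.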

And the truth is that this has nothing to do with Python/R libraries, it has to do with creativity, with the evolution of a discipline that is based on being able to find patterns, to generate insights, to make better decisions, to optimize the theory of value, and to be able to generate dynamic and sustainable competitive advantages.



