Why do good physicists make good data scientists?
Ilia Ekhlakov
Senior Data Scientist @ Wrike | B2B SaaS | Revenue Strategy & Ops | MSc in Physics | 9 YoE
An academic background in physics is often listed as a preferred qualification in Data Science vacancies at the Senior level and above. Personally, I know many good physicists who have successfully transitioned into Data Science. Conversely, I do not know anyone who moved into this field and then stayed at a Junior or Middle level for long. This could be sampling bias, of course, but let's try to work out the reasons behind it.
Some of these reasons are quite obvious, such as good mathematical training and structured thinking. Problem-solving skills and experience in experimental planning also play a role.
To uncover the less obvious reasons, let's ask who becomes a good physicist. Usually, it is someone genuinely passionate about understanding how the world works. That passion drives them to explore a wide range of topics, including many far outside their formal field of study. After all, how else could you discover and understand the often unusual connections between different phenomena and the laws of our world? For a vivid illustration of this, read "Surely You're Joking, Mr. Feynman!" by Nobel Prize-winning physicist Richard Feynman, if you have not already.
But are a broad outlook and the ability to find unexpected connections between concepts really that critical for Data Science? In fact, both are very important, and here's why. One of the major differences between real-world ML and academic ML is the almost unlimited room for creating features. In almost any organization, we have access to hundreds of metrics potentially related to the forecasting target, and from them we can assemble features describing patterns as complex as our imagination allows. If we also consider enriching internal data with external data, the only limit is the cost of acquiring it.
Consider telecommunications data as an example. It's no secret that telecom companies, like banks, already know a great deal about their customers.
Now let's add external data. The number of new opportunities grows dramatically, but I want to highlight two:
Of course, all of the heuristics described above produce a certain share of false positives and false negatives. What matters for us right now, however, is that ideas for potential model features are, quite literally, countless. At the same time, collecting, validating, storing, and exploring each feature requires significant resources. That is why it is so crucial to quickly identify the most promising features and to rank them correctly by priority. It's difficult to imagine doing this without extensive domain knowledge, gained both through a Data Scientist's own research and experience and through close communication with colleagues from other departments. It also takes an excellent general education, a broad outlook, and the ability to uncover and verify non-obvious patterns. These are the qualities that show the physicist's mind at its best.
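To make the prioritization step more concrete, here is a minimal sketch of one cheap screening heuristic, assuming a binary churn target and a rough per-feature acquisition cost. It is not the author's method: it scores each candidate by the absolute Pearson correlation with the target (a univariate screen that ignores interactions and nonlinear effects) and divides that score by an assumed cost. All feature names, values, and costs are hypothetical toy data in the spirit of the telecom example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(vx * vy)


def rank_candidates(candidates, target, costs):
    """Rank features by |correlation with target| per unit of (assumed) cost.

    Returns (name, rounded |r|, priority) tuples, highest priority first.
    """
    scored = []
    for name, values in candidates.items():
        strength = abs(pearson(values, target))
        priority = strength / costs[name]
        scored.append((name, round(strength, 3), priority))
    return sorted(scored, key=lambda t: t[2], reverse=True)


# Hypothetical toy data: 8 subscribers, 1 = churned, 0 = stayed.
churned = [0, 0, 1, 1, 0, 1, 0, 1]
candidates = {
    "support_calls": [1, 0, 4, 5, 1, 3, 0, 4],  # internal metric
    "tenure_months": [30, 24, 3, 5, 40, 8, 36, 2],  # internal metric
    "night_data_share": [0.2, 0.5, 0.3, 0.4, 0.1, 0.3, 0.6, 0.2],  # internal metric
    "external_credit_score": [700, 720, 610, 590, 750, 640, 710, 600],  # purchased data
}
# Assumed relative costs: internal data is cheap, external data must be bought.
costs = {
    "support_calls": 1.0,
    "tenure_months": 1.0,
    "night_data_share": 1.0,
    "external_credit_score": 5.0,
}

for name, strength, priority in rank_candidates(candidates, churned, costs):
    print(f"{name:<22} |r|={strength:<6} priority={priority:.3f}")
```

With these toy numbers, the strongly predictive but expensive external feature falls below the two cheap internal ones. A univariate screen like this is only a first pass; promising candidates would still need proper validation, for example on a held-out set.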
Don't get me wrong. I don't mean to imply that data scientists from other scientific backgrounds lack these qualities; that would contradict my own experience. In this article, I simply wanted to discuss how the physicist's perspective bears on the question raised above.
That said, some aspects of the physicist's mindset need to change for one to become a truly exceptional data scientist. This is well illustrated by an anecdote about another Nobel laureate, the renowned Soviet physicist Lev Davidovich Landau.
Once an experimenter caught Landau in the corridor and asked him to explain the graph on a piece of paper. Landau explained. "But you're holding the chart upside down!" exclaimed the experimenter. Landau turned the paper over and explained again.
Because physicists can produce a plausible interpretation even of erroneous data, they need to develop particular caution about the quality of the initial data and a high degree of skepticism toward their own conclusions. With experience, however, this trait can even become a strength.