Statistician and Data Scientist. Differences and why being both is the best combination. By Darko Medin
Darko Medin
Data Scientist and a Biostatistician. Developer of ML/AI models. Researcher in the fields of Biology and Clinical Research. Helping companies with Digital products, Artificial intelligence, Machine Learning.
Statistician and Data Scientist... I am going to explain why being both is an incredibly powerful combination, especially today, but before that I need to address a couple of myths and misunderstandings. There is a big gap in understanding these two professions and the differences between them. Some professionals claim, for example, to be a Data Scientist automatically by virtue of being a Statistician, and vice versa. This is wrong: these are two different professions (some overlap exists, but it is not as large as many think). There are similarities, such as both using data-driven approaches, math and statistics, but the two professions differ in many ways. They are very complementary and skills can transfer between them, yet the differences can be huge and are important to understand, especially for team leaders.
As someone who has been both a Statistician (Biostatistician) and a Healthcare Data Scientist for around a decade, I will show the differences and nuances that Statisticians and Data Scientists have in their respective fields and why it is best to have both backgrounds, even though this is not easy.
The skillsets of a Statistician and a Data Scientist may vary a lot. Let me show this with an RWE (Real World Evidence) example. A Biostatistician is the key person for defining the objectives of an RWE project. Defining the study design, variables and estimands, and being an expert in study design and how it relates to both clinical and statistical methodologies, is a key area for Biostatisticians. Using the right software and the right statistical methods is, of course, also on the list. But Statisticians are also experts in interpreting RWE results and turning them into the most effective decision-making workflow possible with that data. It takes many years to master this. Statisticians are so important in such projects that a whole team's results often depend on the Statistician's theoretical and practical knowledge in applying statistical methodology. On the other hand, Data Scientists may often be involved in the data part, but also in creating ML/AI products in the RWE field, deploying models using software engineering components and communicating with software engineers. These approaches and skillsets are fundamentally different. It takes at least 5 years to master this (model deployment is typically done in Data Science, rarely in Statistics).
Data Scientist is a term used for a lot of different professions, but here I will discuss the full-stack Data Scientist, a professional with feature engineering / Machine Learning / AI skills who creates Data Science products (Data Scientists do data analysis too, but the workflow is different). Software engineering may play a key role there, and this is one of the main differences between Statisticians and Data Scientists. One good example is model deployment. While in a statistical RWE project a Statistician would mainly be interested in population- or subgroup-level estimation of parameters and potentially decision making based on that, a Data Scientist may often be involved in creating a Machine Learning product with a deployed model, built to make individual subject predictions based on hundreds or thousands of covariates using approaches such as XGBoost or Deep Learning.
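To make the contrast between population-level estimation and individual predictions more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the data, the covariate names (`age_decades`, `smoker`) and the coefficients are hypothetical, the confidence interval uses a simple normal approximation, and a small logistic model stands in for XGBoost or Deep Learning, which a real product would more likely use.

```python
import math
import statistics


def mean_difference_ci(treated, control, z=1.96):
    """Population-level view: difference in group means with an approximate 95% CI."""
    diff = statistics.mean(treated) - statistics.mean(control)
    # Standard error of the difference, using the sample variance of each group.
    se = math.sqrt(
        statistics.variance(treated) / len(treated)
        + statistics.variance(control) / len(control)
    )
    return diff, diff - z * se, diff + z * se


def predict_risk(covariates, coefficients, intercept):
    """Individual-level view: predicted probability for a single subject."""
    linear = intercept + sum(
        coefficients[name] * value for name, value in covariates.items()
    )
    # Logistic function maps the linear score to a probability between 0 and 1.
    return 1.0 / (1.0 + math.exp(-linear))


# Hypothetical outcome values for two groups (illustration only).
treated = [5.0, 6.0, 7.0]
control = [3.0, 4.0, 5.0]
estimate, lower, upper = mean_difference_ci(treated, control)
print(f"Mean difference: {estimate:.2f} (95% CI {lower:.2f} to {upper:.2f})")

# Hypothetical coefficients and one subject's covariates (illustration only).
coefficients = {"age_decades": 0.3, "smoker": 0.8}
subject = {"age_decades": 6.0, "smoker": 1.0}
risk = predict_risk(subject, coefficients, intercept=-3.0)
print(f"Predicted risk for this subject: {risk:.2f}")
```

The first function answers a question about groups, with uncertainty attached. The second answers a question about one person, which is what a deployed model typically serves.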
By being both a Statistician and a Data Scientist, a professional can not only cover all these areas but also strengthen each of them considerably with the other skillset. Statisticians and Data Scientists are each highly specialized for their specific fields, and being both is such a powerful combination, especially in the AI era that we live in today.
There are numerous situations where, as a Biostatistician in an RWE project or a clinical trial, I said to myself: being a Data Scientist helps me a lot here. I am fully confident in creating advanced models, in testing AI models in Healthcare using advanced statistical methods, and in working on digital Biostatistics products, databases and so on, just because I am good at Data Science software implementation and engineering.
On the other hand, my project management, regulatory knowledge, statistical methodology and project design skills in the Data Science field were many times augmented by the rigor that comes from being a Biostatistician. One example is AI models. I would use my Data Science and Python programming skills to create the AI models, but in the end they are very often evaluated using Biostatistics: AUC, sensitivity, specificity, RMSE, AIC and other statistical approaches.
There are also many projects in which I take both angles to make sure they complement each other, and I am rarely wrong there.
PS: there are other professions similar to what I described here as a full-stack Data Scientist, namely Machine Learning engineers and MLOps experts. These are all different too, but more on that in the next posts.
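As a rough illustration of that evaluation step, the sketch below computes sensitivity, specificity, AUC and RMSE in plain Python. The labels, scores and the 0.5 threshold are hypothetical, and the AUC is computed through the pairwise (Mann-Whitney) formulation; in practice a library such as scikit-learn would usually be used instead.

```python
import math


def sensitivity_specificity(y_true, scores, threshold=0.5):
    """Classify each score at the threshold, then compute sensitivity and specificity."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """AUC as the share of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def rmse(y_true, y_pred):
    """Root mean squared error for continuous predictions."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Hypothetical binary outcomes and model scores (illustration only).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]

sens, spec = sensitivity_specificity(labels, scores)
print(f"Sensitivity: {sens:.3f}, Specificity: {spec:.3f}")
print(f"AUC: {auc(labels, scores):.3f}")

# Hypothetical continuous outcomes and predictions (illustration only).
print(f"RMSE: {rmse([3.0, 5.0], [2.0, 7.0]):.3f}")
```

Sensitivity and specificity depend on the chosen threshold, while AUC summarizes ranking quality across all thresholds, which is one reason both are usually reported together.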
Thanks for sharing
Data Scientist and a Biostatistician. Developer of ML/AI models. Researcher in the fields of Biology and Clinical Research. Helping companies with Digital products, Artificial intelligence, Machine Learning.
8 months ago: Thank you all for your comments and suggestions. I may actually write a series on this topic, as I think it is very important to understand.
Statistician /Data Scientist/Machine Learning
8 months ago: Right...
Data Analyst | Data Entry | Clinical Statistical Programmer | CDISC Standards (SDTM, ADaM) | Data Wrangling | TFL Generation (Tables, Listings, Figures) | Data Visualization | R Programming | Python for Data Science
8 months ago: Thanks for breaking it down
Biostatistician | Data Scientist | Mathematical Modelling | Epidemiologist (better biostatistics, better clinical research) | C3 NRF rated scientist
8 months ago: Nice breakdown and very helpful…