
Data Science Important Programming principles

Darko Medin

Data Scientist and a Biostatistician. Developer of ML/AI models. Researcher in the fields of Biology and Clinical Research. Helping companies with Digital products, Artificial intelligence, Machine Learning.

Published: June 16, 2022

In Data Science, programming languages play a key role. In fact, most Data Science job ads will list at least one of them, or even a combination of several. Even the no-code Data Science platforms had to be coded first in order to work :). In this article, I describe principles that need to be applied to be efficient in Data Science. In my opinion, they are close to imperative for both present and future Data Scientists and Statistical Programmers.

The first and very important thing is being able to document the code efficiently. In my opinion, every good Data Scientist needs a very good way of communicating the code to other Data Scientists. Those # comment parts of the code are not irrelevant. They should contain the relevant background behind the code and explain what it actually does. Large scripts become hard to navigate if Data Scientists don't leave notes in the code, so this is probably one of the most important principles for clean and understandable code. It's actually not always about the code itself.
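As a minimal sketch of what such documentation can look like in Python, the function below carries a docstring with the background and comments explaining why each check exists. The function name `standardize` and the example values are hypothetical and chosen only for illustration.

```python
from statistics import mean, stdev


def standardize(values):
    """Return z-scores for a list of numbers.

    Background: many models behave better when inputs share a common scale.
    Each value is centered on the sample mean and divided by the sample
    standard deviation (n - 1 denominator).
    """
    # A standard deviation needs at least two observations.
    if len(values) < 2:
        raise ValueError("need at least two values to standardize")

    center = mean(values)
    spread = stdev(values)

    # A constant variable has zero spread, so z-scores are undefined.
    if spread == 0:
        raise ValueError("all values are identical; z-scores are undefined")

    return [(v - center) / spread for v in values]


# Hypothetical example data
z_scores = standardize([1, 2, 3])
```

The comments here explain why a step exists rather than repeating what the line does, which is the part another Data Scientist cannot recover from the code alone.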

In Data Science, being able to scale is everything. Writing blocks of code that can be easily reused, or scaled and applied to many variables at the same time, is the key to advanced Data Science. Languages like R and Python are well suited to functions, loops and object oriented programming (OOP), which makes scaling the code easy to achieve. Features such as encapsulation in OOP, where data and methods are bound into single objects, are very effective here. The SAS programming language has macros that enable similar features. It's interesting that one of the most used Data Science tools, yes, Microsoft Excel, also has coding and macro options, which is very important when working with a large number of datasets/spreadsheets.
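The idea of reusing one block of code over many variables, and of binding data and methods together, can be sketched roughly as follows. The names `summarize` and `Dataset` and the example values are assumptions made for this illustration, not part of any particular library.

```python
from statistics import mean, median


def summarize(values):
    """Return basic summary statistics for one numeric variable."""
    return {
        "n": len(values),
        "mean": mean(values),
        "median": median(values),
        "min": min(values),
        "max": max(values),
    }


class Dataset:
    """Bind the data and the methods that act on it into one object."""

    def __init__(self, columns):
        # columns: dict mapping a variable name to a list of numbers
        self._columns = dict(columns)

    def summarize_all(self):
        # The same function is reused for every variable in one pass.
        return {name: summarize(vals) for name, vals in self._columns.items()}


# Hypothetical example data
data = Dataset({"age": [34, 51, 47, 29], "weight_kg": [70.5, 82.0, 64.3, 91.2]})
report = data.summarize_all()
print(report["age"])
```

Adding a new variable only means adding a new key to the input; the summary logic itself does not change.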

There is another fantastic advantage of OOP called abstraction, and in my opinion this principle is essential, especially when making AI/ML products. Being able to define the parts of the algorithm that can change, while the rest of the algorithm stays a safe, unchanging environment, is very important. This is achieved with object oriented programming and by storing functions and arguments in the right way, in both R and Python.
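One possible way to express this in Python is an abstract base class: the shared, unchanging part of the workflow lives in the base class, and only the prediction rule is left open for each concrete model. The class names `Model`, `MeanModel` and `NearestNeighborModel` are hypothetical and the models are deliberately trivial.

```python
from abc import ABC, abstractmethod


class Model(ABC):
    """Fixed workflow; only the prediction rule varies between subclasses."""

    def fit(self, xs, ys):
        # Stable part: validation and storage are the same for every model.
        if len(xs) != len(ys):
            raise ValueError("xs and ys must have the same length")
        if len(xs) == 0:
            raise ValueError("at least one observation is required")
        self._xs, self._ys = list(xs), list(ys)
        return self

    @abstractmethod
    def predict_one(self, x):
        """The changeable part that each concrete model must define."""

    def predict(self, xs):
        return [self.predict_one(x) for x in xs]


class MeanModel(Model):
    def predict_one(self, x):
        # Ignores x and always predicts the average training outcome.
        return sum(self._ys) / len(self._ys)


class NearestNeighborModel(Model):
    def predict_one(self, x):
        # Predicts the outcome of the closest training point.
        i = min(range(len(self._xs)), key=lambda j: abs(self._xs[j] - x))
        return self._ys[i]


# Hypothetical example data
xs, ys = [1, 2, 3], [10, 20, 30]
for model in (MeanModel(), NearestNeighborModel()):
    print(type(model).__name__, model.fit(xs, ys).predict([2.9]))
```

Swapping one model for another changes a single class, while `fit` and `predict` stay untouched for every caller.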


Going beyond what's typically referred to as object oriented programming, data structures also enable us to store a lot of information about the models in easily callable objects. Using Python or R makes it really easy to scale up just by storing the data, arguments, functions, models/algorithms or whatever else is needed as very simple and easily callable objects. SAS, SQL and many other programming languages enable this too.
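A small sketch of this in Python, assuming a hypothetical `AnalysisSpec` object that stores a function together with its arguments so that a whole list of analyses can be run in one line. The helper `trimmed_mean` and the example values are also made up for illustration.

```python
from dataclasses import dataclass, field
from statistics import mean, median
from typing import Callable


@dataclass
class AnalysisSpec:
    """Store a function and its arguments as one callable object."""
    name: str
    func: Callable
    kwargs: dict = field(default_factory=dict)

    def run(self, values):
        return self.func(values, **self.kwargs)


def trimmed_mean(values, trim=1):
    """Mean after dropping the `trim` smallest and largest values."""
    ordered = sorted(values)
    kept = ordered[trim:len(ordered) - trim]
    if not kept:
        raise ValueError("trim removes all values")
    return mean(kept)


specs = [
    AnalysisSpec("mean", mean),
    AnalysisSpec("median", median),
    AnalysisSpec("trimmed_mean", trimmed_mean, {"trim": 1}),
]

# Hypothetical example data with one outlier
values = [3, 5, 4, 100, 6]
results = {spec.name: spec.run(values) for spec in specs}
print(results)
```

Scaling up then means appending another `AnalysisSpec` to the list rather than writing another block of code.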

Another important topic, not as present in the overall discussion as it should be, is model deployment and re-training. These depend not only on model accuracy and explainability, but also on the effectiveness of the model itself. Models should be fast enough to be implementable in most software solutions, so it's good to benchmark important blocks of code.
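Benchmarking a block of code can be sketched with Python's standard `timeit` module. The two functions compared here and the `benchmark` helper are hypothetical stand-ins for whatever block matters in a real project, and the timings will differ from machine to machine.

```python
import timeit


def squares_loop(n):
    out = []
    for i in range(n):
        out.append(i * i)
    return out


def squares_comprehension(n):
    return [i * i for i in range(n)]


def benchmark(func, *args, repeat=5, number=100):
    """Return the best total time in seconds for `number` calls of func."""
    timer = timeit.Timer(lambda: func(*args))
    return min(timer.repeat(repeat=repeat, number=number))


# Confirm both versions agree before comparing their speed.
assert squares_loop(1000) == squares_comprehension(1000)

for f in (squares_loop, squares_comprehension):
    print(f"{f.__name__}: {benchmark(f, 1000):.6f} s per 100 calls")
```

Taking the minimum over several repeats is a common choice because it is the measurement least affected by other activity on the machine.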

Simplify the code. This is one of the most important principles in creating optimized scripts. The first version of the code should be treated as a candidate for further rounds of simplification. Most often, several rounds of simplification bring the code to the optimal level of complexity. This is essential not just for simplicity's sake: simpler code will most often tend to perform better too (provided, of course, that it has all the functional and data capabilities of the more complex version).
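As a rough illustration of one simplification round, the sketch below keeps a first version and a simplified version side by side and checks that they return the same result before the first one is retired. The function names and labels are hypothetical.

```python
from collections import Counter


def count_labels_v1(labels):
    """First version: manual bookkeeping."""
    counts = {}
    for label in labels:
        if label in counts:
            counts[label] = counts[label] + 1
        else:
            counts[label] = 1
    return counts


def count_labels_v2(labels):
    """Simplified version: same result using the standard library."""
    return dict(Counter(labels))


# Hypothetical example data
labels = ["case", "control", "case", "case", "control"]

# The simplified version must keep the functional capabilities of the original.
assert count_labels_v1(labels) == count_labels_v2(labels)
print(count_labels_v2(labels))
```

The equality check is the important part: a simplification only counts if the simpler code still does everything the longer version did.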

