Data Science Life Cycle

The life cycle of data science contains the following steps:

  1. Understanding the Business problem
  2. Preparing the data
  3. Exploratory Data Analysis (EDA)
  4. Modeling the data
  5. Evaluating the model
  6. Deploying the model

1. Understanding the Business problem

The "why" question has served as the catalyst for many of the world's advances.

Every good business or IT-focused life cycle begins with "why," and the same is true for a good data science life cycle. The business objective must be clearly understood, because it is the end goal of the analysis.

A crucial aspect of the early stages of data analytics is to look at business trends, develop case studies of related data analytics efforts in other businesses, and conduct market research on the business's industry. These tasks are typically undertaken by stakeholders at this early stage. The team evaluates the internal infrastructure, internal resources, the total amount of time required to complete the project, and the project's technological requirements. Once these preliminary analyses and evaluations are complete, the stakeholders begin developing the primary hypothesis on how to resolve the business difficulties given the current market conditions.

In short, these are the essential points to remember when defining the business problem for a data science project:

  • List the issue that needs to be resolved.
  • Define the project's potential value.
  • Determine the project's risks, taking ethical issues into account.
  • Create and distribute a flexible, high-level project plan.

2. Preparing the data

The second phase of the data science life cycle is data preparation. Here the data is prepared so that we can understand the business problem and extract the information needed to solve it.

  • Select data related to the problem.
  • Combine the data sets and integrate the data.
  • Clean the data to find the missing values.
  • Handle the missing values by removing or imputing them.
  • Deal with errors by removing them.
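As a minimal sketch of the removal and imputation options above (the DataFrame and its column names are invented for illustration; pandas is assumed):

```python
import numpy as np
import pandas as pd

# Hypothetical dataset with missing values (columns invented for illustration)
df = pd.DataFrame({
    "age": [25, 30, np.nan, 45, 28],
    "income": [50000, np.nan, 62000, 58000, np.nan],
})

# Option 1: remove rows that contain any missing value
dropped = df.dropna()

# Option 2: impute missing values with each column's median
imputed = df.fillna(df.median())

print(imputed)
```

Median imputation is only one choice; the right strategy depends on why the values are missing.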

Use box plots for detecting outliers and handling them.
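The box-plot rule flags points lying outside 1.5 times the interquartile range. A small sketch with NumPy (the values are made up for illustration):

```python
import numpy as np

# Hypothetical numeric column (values invented for illustration)
values = np.array([10, 12, 11, 13, 12, 95, 11, 10])

# A box plot marks points outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] as outliers
q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

outliers = values[(values < lower) | (values > upper)]
cleaned = values[(values >= lower) & (values <= upper)]
print(outliers)  # the extreme value 95 is flagged
```

Whether to drop, cap, or keep a flagged point depends on the business context.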

3. Exploratory Data Analysis (EDA)

Before actually building the model, this step entails understanding the solution and the variables that may affect it. To better understand the data and its features, we create heat maps, bar graphs, and other charts.

We need to keep a few factors in mind when analyzing the data, including checking that the data is accurate and free of duplicates, missing values, and null values. Additionally, when working on model construction, we need to be sure that we recognize the crucial factors in the data set and eliminate any extraneous noise that could reduce the accuracy of our conclusions.

Around 70% of the data science project life cycle's time is spent on this step. With proper EDA we can extract a great deal of information.
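A minimal EDA sketch using pandas summary statistics and a correlation matrix (the synthetic "hours_studied"/"exam_score" data is invented for illustration; a heat map would typically just visualize this correlation matrix with a plotting library):

```python
import numpy as np
import pandas as pd

# Hypothetical synthetic data (column names invented for illustration)
rng = np.random.default_rng(0)
df = pd.DataFrame({"hours_studied": rng.uniform(0, 10, 100)})
df["exam_score"] = 5 * df["hours_studied"] + rng.normal(0, 3, 100)

# Summary statistics: counts, means, quartiles of each feature
print(df.describe())

# Correlation matrix -- the usual input to a heat map
corr = df.corr()
print(corr)
```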

4. Modeling the data

This is the most important step of the data science life cycle, and it says a lot about a data science project. This phase is about selecting the right model type, depending on whether the problem is classification, regression, or clustering. After selecting the model family, we must carefully choose and implement the algorithms to be used within that family.

Models have numerous hyperparameters, so we should determine the model's ideal hyperparameter values. We don't want to overfit, so hyperparameter tuning is an important part of model building. This tuning is what makes the model predict correctly.

5. Evaluating the model

We built a model in the previous phase. But is our model effective? We must determine our model's current status in order to improve it.

We evaluate the model to understand how well it works. Two techniques are widely used in data science to assess model performance: hold-out and cross-validation.

Holdout evaluation is the process of testing a model with data that is distinct from the data it was trained on. This offers an unbiased assessment of learning effectiveness.
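A minimal hold-out sketch with scikit-learn (the synthetic data and the 80/20 split ratio are illustrative choices):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical synthetic data (for illustration only)
X, y = make_classification(n_samples=300, random_state=0)

# Hold out 20% of the data; the model never sees it during training
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(model.score(X_test, y_test))  # accuracy on the unseen hold-out set
```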

Cross-validation is the process of splitting the data into several sets and using them in turn to analyze the model's performance. In each round of the procedure, the observations are divided into two sets: a training set for fitting the model and an independent set for evaluation. Both approaches use a test set (unseen by the model) to assess model performance in order to prevent over-fitting.
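The rotation described above can be sketched with scikit-learn's `cross_val_score` (the synthetic data and choice of 5 folds are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Hypothetical synthetic data (for illustration only)
X, y = make_classification(n_samples=300, random_state=0)

# 5-fold cross-validation: each fold serves exactly once as the held-out test set
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(scores)        # one score per fold
print(scores.mean()) # averaged estimate of performance
```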

If the evaluation does not yield a satisfying outcome, we must repeat the modeling procedure in its entirety until the necessary level of metrics is attained.

Metrics that are used to evaluate the models are:

  • Classification models:
      ◦ Accuracy
      ◦ ROC-AUC
      ◦ Precision-Recall
      ◦ Log-Loss
  • Regression models:
      ◦ MSAE
      ◦ MSPE
      ◦ R Square
      ◦ Adjusted R Square
  • Unsupervised models:
      ◦ Mutual Information
      ◦ Rand Index

With the help of this step, we choose the right model for our business problem and create the model that best suits our needs.
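Several of the metrics listed above are available in scikit-learn; a small sketch with made-up predictions (the numbers are invented for illustration):

```python
from sklearn.metrics import accuracy_score, r2_score

# Hypothetical classification predictions (invented for illustration)
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 1, 0, 0]
acc = accuracy_score(y_true, y_pred)
print(acc)  # 4 of 5 correct -> 0.8

# Hypothetical regression predictions (invented for illustration)
y_reg_true = [3.0, 5.0, 2.5, 7.0]
y_reg_pred = [2.8, 5.1, 2.9, 6.8]
r2 = r2_score(y_reg_true, y_reg_pred)  # R Square
print(r2)
```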

6. Deploying the model

We have reached the end of our life cycle. In this step, the delivery method that will be used to distribute the model to users or another system is created.

For different projects, this step can mean many different things: getting your model results into a Tableau dashboard might be all that is necessary, or it could be as complicated as scaling the model to millions of users on the cloud.

Any shortcuts used during the minimally viable model phase are updated to systems fit for production. This phase is typically carried out by team members who are more "engineering-focused," such as data engineers, cloud engineers, machine learning engineers, application developers, and quality assurance engineers.

The various phases of the data science life cycle should be carefully considered. If any step is carried out incorrectly, the entire effort is wasted, because the mistake will have an impact on the following phase.

For instance, improper data collection will result in information loss and an imperfect model. The model won't function effectively if the data is not adequately cleaned. The model will fall short in the real world if it is not adequately evaluated. Each phase, from business understanding to model deployment, should receive the appropriate consideration, time, and effort.

Thank You for reading.

