Structure of a Data Science Project
Pushpak Bakal
Analyst-consulting-ET&P: Finance & Performance @ Deloitte | Driving Financial Insights | Anaplan | Data Analytics
Step 1:- Understand the problem statement
Before collecting any data, be clear on the business question the project is meant to answer and how success will be measured.
Step 2:- Collection of data
Collect the data from various sources. It could be an Excel file, MongoDB, or a SQL server. This is generally the task of the data engineering team, which may gather data from multiple devices and systems.
Step 3:- Choose an IDE (Google Colab is recommended) and connect the data
Connect to your database (MongoDB, SQL Server, or anything else) using Python and load the dataset to start the data science part of the project; name this file data_base.py. The project will also need some third-party libraries, and these must be installed before work starts. So list all the required libraries in a file named requirements.txt and install them with pip install -r requirements.txt.
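A minimal sketch of this loading step, using SQLite as a stand-in for a real SQL Server or MongoDB instance; the table and column names ("sales", "region", "revenue") are illustrative assumptions, not from the article:

```python
# Sketch: load data from a SQL database into a pandas DataFrame.
# SQLite stands in for a real database server here.
import sqlite3
import pandas as pd

# Create a small in-memory database to stand in for a real server
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, revenue REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("North", 120.0), ("South", 95.5), ("East", 143.2)])
conn.commit()

# Load the table into a DataFrame - the starting point for analysis
df = pd.read_sql("SELECT * FROM sales", conn)
print(df.shape)
conn.close()
```

With a real server you would only swap the connection object (e.g. one from SQLAlchemy or pymongo); the rest of the workflow stays the same.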
Or else, mount Google Drive in Colab and read the dataset from there using the following code:-
from google.colab import drive
drive.mount('/content/gdrive')
Step 4:- Data preprocessing
In this phase, data scientists analyse the collected data for biases, patterns, ranges, and the distribution of values. This determines the suitability of the data for regression, machine learning, and deep learning algorithms. The phase also involves inspecting the different types of data present, including nominal, numerical, and categorical data.
Data visualization is also done here to highlight critical trends and patterns, often with simple bar and line charts. The most important tasks in this phase are converting data into a consistent format, scrubbing and filtering it, replacing or removing values, and splitting and merging columns. Simply put, data preprocessing is arguably the most time-consuming phase of the entire analytics life cycle, but it determines the reliability of the model: the goodness of the model depends on this stage.
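The cleaning tasks above can be sketched in pandas; the column names ("age", "city", "income") and fill rules here are hypothetical examples, not prescriptions:

```python
# Preprocessing sketch: handle missing values, fix types, filter ranges.
import pandas as pd

df = pd.DataFrame({
    "age": [25, None, 47, 31],
    "city": ["Pune", "Mumbai", None, "Pune"],
    "income": ["50000", "62000", "58000", "71000"],  # stored as text
})

# Replace missing values: median for numeric, mode for categorical
df["age"] = df["age"].fillna(df["age"].median())
df["city"] = df["city"].fillna(df["city"].mode()[0])

# Convert a text column to a proper numeric type
df["income"] = pd.to_numeric(df["income"])

# Filter out rows outside a plausible range
df = df[df["age"].between(18, 100)]

print(df.dtypes)
```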
Step 5:- EDA (Exploratory Data Analysis)
Exploratory Data Analysis is a critical step in the data science life cycle, where the quality of the input determines the quality of the output. Analysts use statistics such as the mean and median to understand the data, and plot it to assess its distribution.
Feature engineering is used to extract features and test which variables are important, and data visualization highlights trends and patterns. Correlation shows that two columns move together, but it does not imply causation: it cannot by itself tell us that changing the value of one column would change the other.
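A short EDA sketch with pandas; the two columns ("hours", "score") are illustrative:

```python
# EDA sketch: summary statistics and a correlation check.
import pandas as pd

df = pd.DataFrame({"hours": [1, 2, 3, 4, 5],
                   "score": [52, 55, 61, 68, 74]})

print(df["score"].mean())    # central tendency
print(df["score"].median())
print(df.describe())         # range and distribution at a glance

# The columns are strongly correlated, but correlation alone does
# not prove that studying more hours *causes* a higher score.
corr = df["hours"].corr(df["score"])
print(round(corr, 3))
```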
Step 6:- Data Modelling
At this stage we train, test, and save the model, along with its performance metrics for later evaluation. Data modelling is the most important phase of the analysis: it involves preparing datasets, choosing model types, and choosing algorithms in order to extract the necessary insights from the prepared data. Modelling means training models to classify, forecast, and group data.
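The train/test/save cycle above can be sketched with scikit-learn; the Iris dataset, logistic regression, and the file name model.pkl are stand-in choices, not the article's:

```python
# Sketch: train a model, test it, and save it with its score.
import pickle
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Evaluate on held-out data and persist the trained model
score = model.score(X_test, y_test)
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)
print(f"test accuracy: {score:.2f}")
```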
Step 7:- Model evaluation
Model evaluation helps us find which algorithm best suits the given dataset for solving a particular problem; in machine learning terms, this is called the best fit. It compares the performance of different machine learning models on the same input dataset, focusing on how accurately each model predicts the outcomes.
There are a few standard metrics used to evaluate a model's performance, and they differ between regression and classification algorithms.
Classification metrics
Common classification metrics include accuracy, precision, recall, F1-score, the confusion matrix, and ROC-AUC.
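These metrics can be computed directly with scikit-learn; the labels below are toy values for illustration:

```python
# Common classification metrics on a toy set of true/predicted labels.
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score, confusion_matrix)

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("f1 score :", f1_score(y_true, y_pred))
print("confusion matrix:\n", confusion_matrix(y_true, y_pred))
```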
Step 8:- Model Deployment
Model deployment is the final stage of data science, where the trained machine learning model is integrated with products and applications. It involves creating a delivery mechanism to get the model out to users or to another system. Machine learning models can be deployed as on-demand (real-time) or batch prediction services, balancing compute cost against latency.
Common deployment strategies include batch prediction, real-time APIs (on-demand services), and embedding the model directly inside an application.
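A minimal sketch of the on-demand flavour: load the saved model and wrap it in a predict function. The file name model.pkl and the feature layout are assumptions carried over from the training sketch; a real deployment would sit behind a web API rather than a plain function:

```python
# Sketch of an on-demand prediction service around a pickled model.
import pickle
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Train and save a tiny model so this sketch is self-contained
X, y = load_iris(return_X_y=True)
with open("model.pkl", "wb") as f:
    pickle.dump(LogisticRegression(max_iter=1000).fit(X, y), f)

def predict(features):
    """On-demand prediction: in practice, load the model once at
    startup and reuse it for every request."""
    with open("model.pkl", "rb") as f:
        model = pickle.load(f)
    return int(model.predict([features])[0])

print(predict([5.1, 3.5, 1.4, 0.2]))
```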
Thanks for reading this (rather long) article till the end!