Predictive Modeling
Darshika Srivastava
Associate Project Manager @ HuQuo | MBA,Amity Business School
What is Predictive Modeling?
Predictive modeling, a tool used in predictive analytics, refers to the process of using mathematical and computational methods to develop predictive models that examine current and historical datasets for underlying patterns and calculate the probability of an outcome. The predictive modeling process starts with data collection, then a statistical model is formulated, predictions are made, and the model is revised as new data becomes available.
Predictive modeling is generally categorized as either parametric or nonparametric models. Within these two camps are several different varieties of predictive analytics models, including Ordinary Least Squares, Generalized Linear Models, Logistic Regression, Random Forests, Decision Trees, Neural Networks, and Multivariate Adaptive Regression Splines.
Dr. Max Kuhn, Director of Non-Clinical Statistics at Pfizer Global R&D, and Dr. Kjell Johnson, co-founder of Arbor Analytics and former Director of Statistics at Pfizer Global R&D, published a popular and extensive text on the practice of predictive data modeling in their 2013 book Applied Predictive Modeling. Kuhn and Johnson provide intuitive explanations on the process of building, visualizing, testing, and comparing predictive modeling in R, a programming language and free software environment for statistical computing, graphics and data science.
What are Predictive Modeling Techniques?
In determining how to choose a predictive model, data scientists perform data sampling in order to analyze a representative subset of data points from which the appropriate predictive model can be developed. Some popular predictive modeling examples include:
How to Make a Predictive Model
Regardless of the types of predictive models in place, the process of predictive model deployment follows the same steps:
How to Evaluate a Predictive Model
A popular technique to employ in predictive model validation and evaluation is cross-validation. Datasets are split at random into training datasets, test datasets, and validation datasets. Training data is used to build the model, then the trained model is run against test data to evaluate performance, and the validation dataset ensures a neutral estimation of predictive model accuracy.?
Each time a subset of historical data is used as test data, remaining subsets are used as training data. As tests continue, a former test dataset will become one of the training datasets, and one of the former training datasets will become a test dataset, until every subset has been used as a test set. This allows the use of every data point in a historical dataset for both testing and training, which facilitates a less random and more effective, thorough method for evaluating data and testing model accuracy. See more on Big Data Analytics here.
领英推荐
What is Predictive Modeling Used For?
Predictive modeling, often associated with meteorology, is leveraged throughout a wide variety of disciplines. Some popular predictive modeling applications that utilize customer prediction models and CRM (Customer Relationship Management) predictive modeling include:?
Forecasting vs Predictive Modeling
Forecasting refers to the process of predicting future events based on analysis of trends and past and present data, whereas predictive modeling is based on probability and data mining. Forecasting pertains to out-of-sample observations, whereas prediction pertains to in-sample observations. Predicted values are calculated for observations in the sample used to estimate the regression. However, forecasting is made for the same dates beyond the data used to estimate the regression, so the data on the actual value of the forecasted variable are not in the sample used to estimate the regression.
Explanatory Modeling vs Predictive Modeling
Explanatory modeling refers to the application of statistical models to data for the purpose of testing causal hypotheses on theoretical constructs. The goal of explanatory modeling is to establish causal relationships by identifying variables that have a statistically and scientifically significant relationship with an outcome.
While predictive modeling addresses what might happen, explanatory modeling addresses what can be done about it, focusing on variables the user can control for the purposes of potential intervention. Explanatory modeling is the dominant statistical model in empirical research in Information Systems (IS) and typically relies on models in the generalized linear models (GLM) family, whereas predictive analytics models and methods rely on more powerful, algorithmic, non-linear techniques.
While prediction and explanation play different roles, both are vital in developing and testing theories.
Predictive Analytics vs Predictive Modeling
The terms “Predictive Modeling,” “Predictive Analytics,” and “Machine Learning” may sometimes be used interchangeably due to their largely overlapping fields and similar objectives, however there are some differentiating factors, such as practical applications. Data analytics predictive modeling is a tool leveraged in predictive analytics and is used throughout a range of industries, including meteorology, archaeology, automobile insurance, and algorithmic trading. When deployed commercially, predictive modeling is often referred to as predictive analytics.