Predictive Modeling

What is Predictive Modeling?

Predictive modeling, a tool used in predictive analytics, refers to the process of using mathematical and computational methods to develop predictive models that examine current and historical datasets for underlying patterns and calculate the probability of an outcome. The predictive modeling process starts with data collection, then a statistical model is formulated, predictions are made, and the model is revised as new data becomes available.

Predictive models are generally categorized as either parametric or nonparametric. Within these two camps are several different varieties of predictive analytics models, including Ordinary Least Squares, Generalized Linear Models, Logistic Regression, Random Forests, Decision Trees, Neural Networks, and Multivariate Adaptive Regression Splines.
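To make the parametric/nonparametric distinction concrete, here is a minimal sketch in Python. The data, the function names, and the choice of k-nearest-neighbours as the nonparametric example are all assumptions made for illustration; neither is singled out by the text above as the method to use.

```python
def fit_line(xs, ys):
    """Parametric: the fitted model is fully described by two numbers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope

def knn_predict(xs, ys, x_new, k=3):
    """Nonparametric: keep the training data and average the k nearest targets."""
    nearest = sorted(zip(xs, ys), key=lambda p: abs(p[0] - x_new))[:k]
    return sum(y for _, y in nearest) / k

# Made-up data for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.2, 1.9, 3.2, 3.8, 5.1, 6.0]

intercept, slope = fit_line(xs, ys)
print("line prediction at x=3.4:", round(intercept + slope * 3.4, 2))
print("3-NN prediction at x=3.4:", round(knn_predict(xs, ys, 3.4), 2))
```

Once fitted, the line needs only its intercept and slope to make a prediction, whereas the nearest-neighbour predictor must keep the whole training set around.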

Dr. Max Kuhn, Director of Non-Clinical Statistics at Pfizer Global R&D, and Dr. Kjell Johnson, co-founder of Arbor Analytics and former Director of Statistics at Pfizer Global R&D, published a popular and extensive text on the practice of predictive data modeling in their 2013 book Applied Predictive Modeling. Kuhn and Johnson provide intuitive explanations of the process of building, visualizing, testing, and comparing predictive models in R, a programming language and free software environment for statistical computing, graphics, and data science.


What are Predictive Modeling Techniques?

In determining how to choose a predictive model, data scientists perform data sampling in order to analyze a representative subset of data points from which the appropriate predictive model can be developed. Some popular predictive modeling examples include:

  • Logistic regression: a statistical analysis method that estimates the parameters of a logistic model from prior observations in a data set and uses them to predict the probability of a binary outcome
  • Decision trees: a flowchart-like tree structure in which each internal node denotes a test on an attribute, each branch represents an outcome of the test, and each leaf node holds a class label
  • Time series analysis: refers to methods for illustrating and analyzing time series data in order to extract meaningful statistics
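As an illustration of the first technique in the list, the sketch below fits a one-predictor logistic regression by plain gradient descent. The data, the variable names, the learning rate, and the number of iterations are all hypothetical choices; in practice a statistics library would normally do the fitting.

```python
import math

def sigmoid(z):
    """Map any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Estimate intercept b0 and slope b1 by batch gradient descent on the log-loss."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the mean log-loss with respect to each parameter.
        g0 = sum(sigmoid(b0 + b1 * x) - y for x, y in zip(xs, ys)) / n
        g1 = sum((sigmoid(b0 + b1 * x) - y) * x for x, y in zip(xs, ys)) / n
        b0 -= lr * g0
        b1 -= lr * g1
    return b0, b1

# Made-up data: hours of product usage (x) and whether the customer renewed (y).
hours = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
renewed = [0, 0, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_logistic(hours, renewed)
p_low = sigmoid(b0 + b1 * 1.0)
p_high = sigmoid(b0 + b1 * 4.0)
print(f"P(renew | 1.0 h) = {p_low:.2f}, P(renew | 4.0 h) = {p_high:.2f}")
```

The fitted parameters are estimated from past observations, and the model's output is a probability rather than a hard class label.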


How to Make a Predictive Model

Regardless of the type of predictive model in use, the process of building and deploying it follows the same steps:

  • Clean up data by treating missing data and eliminating outliers
  • Determine whether parametric or nonparametric predictive modeling is most effective
  • Preprocess the data into a format appropriate for the modeling algorithm
  • Specify a subset of data to be used for training the model
  • Train model parameters from the training dataset
  • Conduct predictive model performance monitoring tests to assess model efficacy
  • Validate predictive modeling accuracy on data not used for calibrating the model
  • Deploy the model for prediction
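The steps above can be sketched end to end in a few lines of Python. This is only an illustration under stated assumptions: the records are made up, the median-based outlier rule and the 75/25 split are arbitrary choices, and ordinary least squares stands in for whichever model the analysis actually calls for. Not every step in the list is represented.

```python
import random
import statistics

# Made-up raw records of (ad_spend, sales); None marks a missing value.
raw = [(1.0, 3.1), (2.0, 4.9), (3.0, None), (4.0, 9.2), (5.0, 10.8),
       (6.0, 13.1), (7.0, 15.0), (8.0, 95.0), (9.0, 19.1), (10.0, 20.9)]

# 1. Clean: drop rows with missing values, then drop outliers in y.
#    The median-based rule below is one simple choice among many.
rows = [(x, y) for x, y in raw if x is not None and y is not None]
med = statistics.median(y for _, y in rows)
mad = statistics.median(abs(y - med) for _, y in rows)
rows = [(x, y) for x, y in rows if abs(y - med) <= 5 * mad]

# 2. Split: hold out part of the data for validation.
random.seed(0)
random.shuffle(rows)
cut = int(0.75 * len(rows))
train, holdout = rows[:cut], rows[cut:]

# 3. Train: a parametric model (ordinary least squares, one predictor).
def fit_ols(data):
    xs, ys = zip(*data)
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in data)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope

intercept, slope = fit_ols(train)

# 4. Validate on data that was not used to fit the parameters.
def rmse(data):
    return (sum((intercept + slope * x - y) ** 2 for x, y in data) / len(data)) ** 0.5

print(f"train RMSE = {rmse(train):.2f}, holdout RMSE = {rmse(holdout):.2f}")

# 5. Deploy: expose the fitted model as a prediction function.
def predict(x):
    return intercept + slope * x

print("prediction for x = 12:", round(predict(12.0), 1))
```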


How to Evaluate a Predictive Model

A popular technique for validating and evaluating a predictive model is cross-validation. The dataset is split at random into training, test, and validation datasets. Training data is used to build the model, the trained model is then run against test data to evaluate performance, and the validation dataset provides a neutral estimate of predictive model accuracy.

Each time a subset of historical data is used as test data, the remaining subsets are used as training data. As tests continue, a former test dataset becomes one of the training datasets, and one of the former training datasets becomes a test dataset, until every subset has been used as a test set. This allows every data point in a historical dataset to be used for both testing and training, which makes the evaluation less dependent on any single random split and gives a more thorough test of model accuracy.
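The rotation described above is commonly called k-fold cross-validation. Below is a minimal sketch of it in Python; the helper names, the made-up target values, and the trivial "predict the training mean" model are assumptions chosen to keep the example short, not part of any particular library.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_mean(train_ys):
    """Stand-in 'model': always predict the mean of the training targets."""
    return sum(train_ys) / len(train_ys)

def cross_validate(ys, k=5):
    folds = k_fold_indices(len(ys), k)
    scores = []
    for i, test_idx in enumerate(folds):
        # Every fold other than fold i is used for training.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        pred = fit_mean([ys[j] for j in train_idx])
        mse = sum((ys[j] - pred) ** 2 for j in test_idx) / len(test_idx)
        scores.append(mse)
    return folds, scores

# Made-up target values for illustration only.
ys = [3.0, 5.0, 4.0, 6.0, 8.0, 7.0, 9.0, 11.0, 10.0, 12.0]
folds, scores = cross_validate(ys, k=5)
print("per-fold MSE:", [round(s, 2) for s in scores])
print("mean CV MSE:", round(sum(scores) / len(scores), 2))
```

Each index appears in exactly one test fold, so every data point is used once for testing and k - 1 times for training.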


What is Predictive Modeling Used For?

Predictive modeling, often associated with meteorology, is leveraged throughout a wide variety of disciplines. Popular applications include customer prediction models and CRM (Customer Relationship Management) predictive modeling.



Forecasting vs Predictive Modeling

Forecasting refers to the process of predicting future events based on analysis of trends and past and present data, whereas predictive modeling is based on probability and data mining. Forecasting pertains to out-of-sample observations, whereas prediction pertains to in-sample observations. Predicted values are calculated for observations in the sample used to estimate the regression. Forecasts, however, are made for dates beyond the data used to estimate the regression, so the actual values of the forecasted variable are not in the sample used to estimate it.
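The in-sample versus out-of-sample distinction can be shown with a small linear trend fitted by least squares. The series, the variable names, and the choice of a straight-line trend are assumptions made purely for illustration.

```python
def fit_trend(ts, ys):
    """Least-squares fit of y = a + b * t."""
    n = len(ts)
    mt, my = sum(ts) / n, sum(ys) / n
    sty = sum((t - mt) * (y - my) for t, y in zip(ts, ys))
    stt = sum((t - mt) ** 2 for t in ts)
    b = sty / stt
    return my - b * mt, b

# Made-up demand figures for six observed periods.
periods = [1, 2, 3, 4, 5, 6]
demand = [10.0, 12.0, 13.0, 15.0, 16.0, 18.0]
a, b = fit_trend(periods, demand)

# Predicted values: computed for periods that were used to estimate the model,
# so each one can be compared with an observed value.
in_sample = [a + b * t for t in periods]
residuals = [y - p for y, p in zip(demand, in_sample)]

# Forecasts: computed for periods beyond the estimation sample,
# where no observed value exists yet.
forecast = {t: a + b * t for t in (7, 8)}

print("in-sample residuals:", [round(r, 2) for r in residuals])
print("forecasts:", {t: round(v, 2) for t, v in forecast.items()})
```

Residuals can be computed only for the in-sample predictions; the quality of the forecasts can be judged only once the later periods have actually been observed.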


Explanatory Modeling vs Predictive Modeling

Explanatory modeling refers to the application of statistical models to data for the purpose of testing causal hypotheses on theoretical constructs. The goal of explanatory modeling is to establish causal relationships by identifying variables that have a statistically and scientifically significant relationship with an outcome.

While predictive modeling addresses what might happen, explanatory modeling addresses what can be done about it, focusing on variables the user can control for the purposes of potential intervention. Explanatory modeling is the dominant statistical approach in empirical research in Information Systems (IS) and typically relies on models in the generalized linear models (GLM) family, whereas predictive analytics models and methods often rely on more flexible, algorithmic, non-linear techniques.

While prediction and explanation play different roles, both are vital in developing and testing theories.


Predictive Analytics vs Predictive Modeling

The terms “Predictive Modeling,” “Predictive Analytics,” and “Machine Learning” are sometimes used interchangeably because the fields overlap heavily and share similar objectives. There are some differentiating factors, however, such as practical applications. Predictive modeling is a tool leveraged in predictive analytics and is used throughout a range of industries, including meteorology, archaeology, automobile insurance, and algorithmic trading. When deployed commercially, predictive modeling is often referred to as predictive analytics.
