How to predict hospital-acquired pressure injuries
Syed Imad Husain
Hospital-acquired pressure injuries (HPIs) are on the rise in the US. This report details statistical modeling techniques to predict whether a patient will acquire an HPI, along with its severity. We discuss the complete analytical framework to implement Logistic Regression (predicting HPI) and Multinomial Regression (predicting severity). We also discuss the intricacies of the data that would be required and the collection process. A comparative study of the suggested model against the Braden Scale in terms of predictive power could serve as an indicator of the suggested model's reliability and usability.
Problem Statement
“How would you go about predicting whether a patient would acquire a pressure injury (aka pressure ulcer, bed sore) during their hospital stay? How would you predict the severity of the injury? What data would you utilize and what techniques would you apply?”
This problem statement can be broken down into four specific parts –
- How would you predict a bed sore during a hospital stay?
- How would you predict the severity of the bed sore?
- What data would be required?
- What techniques would you apply?
Background
Hospital-acquired bed sores or pressure injuries (HPIs) are on the rise. Around 1.2 million cases of HPIs occurred in 2015 [1]. However, most of the time HPIs can be avoided through proper medical care and attention. The most widely accepted evidence-based tool to tackle this problem is the Braden Scale [2]. The scale allocates equal weights to sensory perception, moisture, activity, mobility, nutrition, and friction & shear. Based on these factors, it calculates a score which is classified into levels of HPI risk. However, it has been observed that the scale has poor predictive power. This article revolves around developing a methodology based on statistics and applied data mining techniques to achieve a purpose similar to the Braden Scale's.
Important Factors
Based on my research, I have shortlisted the potential risk factors for HPIs that will be used in our analysis -
Data set Considerations
The data set comprises the variables mentioned in the previous section. Since an HPI cannot develop within a day and may take 3 to 6 days to appear, the following considerations were made -
- Grain of data is Patient & Week
- Records can be differentiated from any other record based on unique combinations of Patient and Week
- All data is gathered from similar setups to nullify the effect of medical service & infrastructure
- Data gathering starts as soon as a patient is admitted
- Data must be gathered by experienced nurses since classifying factors into classes is subjective
- Data gathering stops when the patient leaves the hospital or contracts an HPI
- Records are filtered out if
- The stay is less than a week
- The patient already suffers from PI
- The grain of the data is at the Patient & Week level, hence we may not be able to feed it directly to the model. I thought of 3 ways to deal with this problem -
- Average out records by Patients such that each patient has 1 row
- Remove Patient ID from the data and consider all observations
- Consider the last observation for each patient. I chose this method because the response is binary and cannot be averaged, and because considering all observations may introduce auto-correlation, since the data is gathered over time and is not strictly cross-sectional
- Additionally, third-party vendors like LexisNexis & Experian Health can be used for demographic data enrichment
- All categorical variables will require dummy encoding, as sketched below
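As a rough sketch of the last-observation rule and the dummy encoding in R (base R only; `hpi_weekly`, `PatientID`, `Week`, and `HPI` are hypothetical names, since the actual schema is not fixed here):

```r
# hpi_weekly: one row per patient-week with PatientID, Week, the candidate
# predictors, and the binary HPI flag (all column names are placeholders)
hpi_weekly <- hpi_weekly[order(hpi_weekly$PatientID, hpi_weekly$Week), ]

# Keep only the last recorded week for each patient
hpi_final <- hpi_weekly[!duplicated(hpi_weekly$PatientID, fromLast = TRUE), ]

# Dummy-encode categorical predictors: model.matrix() expands each factor into
# indicator columns (the intercept column is dropped)
X <- model.matrix(HPI ~ . - PatientID - Week, data = hpi_final)[, -1]
y <- hpi_final$HPI
```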
Analysis Techniques
Predicting HPI – Logistic Regression
The simplest way to predict HPI is Logistic Regression, with the hypothesis
g(E[Y|X]) = Xβ
- g() = a monotonic, differentiable link function such as the logit, probit, complementary log-log (cloglog), etc.
- E[Y|X] = Conditional Expectation of the response Y given the predictors X
- Y = HPI
- X = Predictor Matrix, set of all predictors i.e. variables 1 through 13
- β = Coefficient vector
Before we jump into model building, the level of significance for hypothesis testing must be defined; we assume α = 0.05. The steps in model building are –
Exploratory Data Analysis
I would create a scatter-plot matrix of all variables to study pairwise correlations and develop intuitions about the data. In this step, we also tackle the problem of multicollinearity: if there is high correlation between two predictors, we only keep the one with the higher univariate statistical significance.
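A minimal sketch of this step in R, reusing the hypothetical `hpi_final` data frame from the data-preparation sketch:

```r
# Scatter-plot matrix of the numeric columns of the prepared data set
num_cols <- sapply(hpi_final, is.numeric)
pairs(hpi_final[, num_cols])

# Pairwise correlations; for pairs with, say, |r| > 0.8 keep only the predictor
# with the stronger univariate significance (the 0.8 threshold is a judgment call)
round(cor(hpi_final[, num_cols], use = "pairwise.complete.obs"), 2)
```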
Data set Split
The data set would be split into 80% training vs. 20% testing. All model-building activities will be performed only on the training set, and the testing set will be used for validation. Since we have demographic information, we may perform stratified sampling.
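One base-R way to do this, stratifying on the HPI response so that both sets keep the same event rate (the same pattern applies to demographic strata); `hpi_final` is the hypothetical data frame from the earlier sketch:

```r
set.seed(42)
# Draw 80% of the row indices within each HPI stratum
train_idx <- unlist(lapply(split(seq_len(nrow(hpi_final)), hpi_final$HPI),
                           function(i) sample(i, size = floor(0.8 * length(i)))))
train <- hpi_final[train_idx, ]
test  <- hpi_final[-train_idx, ]
```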
Variable Selection
There are multiple ways of selecting variables. Generally, in this step we build multiple models to compare in the subsequent steps. Some commonly used variable-selection techniques are –
- Univariate significance model – a combined model based on the individually significant variables. Each variable is considered one at a time, and its statistical significance is determined using the p-value rule (p < α)
- Step-wise selection – this uses the step algorithm with direction 'forward', 'backward', or 'both'. One variable at a time is added to (or dropped from) the model, and the resulting fit is compared to the previous model. Any model-comparison criterion can be used, such as AIC, BIC, etc.; the final model is the one with the best value of that criterion
- Best subset – the previous method is not exhaustive and hence does not guarantee the best possible model. The best-subset algorithm instead performs an exhaustive search over all subsets of the predictors (2^n candidate models, where n is the number of predictors). However, this is computationally very expensive
- Shrinkage methods – another option is Lasso or Ridge regression, which add a regularization penalty to the model's cost function. These methods are preferred when the number of parameters exceeds the number of observations; the Lasso in particular yields a sparse model
In our case, I would build a model with each technique that is computationally feasible and efficient, as sketched below.
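A sketch of two of these techniques in R (step-wise selection with step(), and the Lasso via the glmnet package), continuing with the hypothetical `train` set from the split above:

```r
# Drop the identifier columns before modelling
train_mod <- train[, setdiff(names(train), c("PatientID", "Week"))]

# Step-wise selection: start from the intercept-only model and search in both
# directions, scored by AIC (pass k = log(nrow(train_mod)) to step() for BIC)
null_mod <- glm(HPI ~ 1, family = binomial(link = "logit"), data = train_mod)
full_mod <- glm(HPI ~ ., family = binomial(link = "logit"), data = train_mod)
step_mod <- step(null_mod, scope = formula(full_mod), direction = "both")

# Lasso: L1-penalised logistic regression with a cross-validated penalty,
# which shrinks weak coefficients exactly to zero and yields a sparse model
library(glmnet)
x_train   <- model.matrix(HPI ~ ., data = train_mod)[, -1]
lasso_mod <- cv.glmnet(x_train, train_mod$HPI, family = "binomial", alpha = 1)
```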
Model Comparison
The final models from the previous step will be compared to select the best one. There are many model-comparison criteria, such as AIC, BIC, adjusted R², out-of-sample MSE, etc. In our case, I will calculate the out-of-sample misclassification rate (total misclassifications / total observations on the testing data set) and select the model with the lowest value. The ROC curve and AUC can also be used as comparison criteria.
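Continuing the sketch, using the `step_mod` fit and the held-out `test` set from earlier (HPI is assumed to be coded 0/1, and pROC is just one of several packages that compute AUC):

```r
test_mod <- test[, setdiff(names(test), c("PatientID", "Week"))]

# Out-of-sample predicted probabilities and misclassification rate at a
# provisional 0.5 cutoff (the cutoff is refined in the next section)
p_test <- predict(step_mod, newdata = test_mod, type = "response")
mean(as.numeric(p_test > 0.5) != test_mod$HPI)

# ROC curve and AUC as a cutoff-free comparison criterion
library(pROC)
auc(roc(test_mod$HPI, p_test))
```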
Asymmetric Cost & P-Cutoff (Grid Search Method)
The output of logistic regression is the set of fitted coefficients, which give the predicted probability Pi = [1 + e^(−Xβ)]^(−1)
To make predictions, we must calculate the P-Cutoff such that when
- Pi <= P Cutoff then Y = 0
- Pi > P Cutoff then Y = 1
Since the cost of misclassifying a positive as a negative is far greater than that of classifying a negative as a positive, we have to use an asymmetric cost function which penalizes the former more heavily than the latter. Using this cost function, we can determine the P-cutoff as follows (a code sketch appears after the list):
- Define a sequence of Probability values. For example, 100 values between 0 and 1 with steps of 0.01
- Calculate asymmetric cost with each P value
- Visualize the association between the sequence of P values (x axis) and the cost (y axis), also called an elbow plot
- Choose the P value with minimum cost
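A sketch of this grid search in R, using the training-set probabilities from the hypothetical `step_mod` fit above; the 10:1 cost ratio for false negatives versus false positives is an assumption that would come from clinical input:

```r
# Asymmetric cost: a missed HPI (false negative) costs w_fn, a false alarm w_fp;
# the 10:1 default ratio is an assumption, not a clinically validated value
asym_cost <- function(y, p, cutoff, w_fn = 10, w_fp = 1) {
  pred <- as.numeric(p > cutoff)
  mean(w_fn * (y == 1 & pred == 0) + w_fp * (y == 0 & pred == 1))
}

p_train <- predict(step_mod, type = "response")   # fitted probabilities
cutoffs <- seq(0.01, 0.99, by = 0.01)
costs   <- sapply(cutoffs, function(ct) asym_cost(train_mod$HPI, p_train, ct))

plot(cutoffs, costs, type = "l")        # the elbow plot described above
p_cutoff <- cutoffs[which.min(costs)]   # cutoff with the minimum cost
```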
Model Prediction
Once we determine the cutoff probability, we can make predictions using the logistic regression equation described in the previous section.
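For completeness, a short sketch of the prediction step, continuing with the objects defined above:

```r
# Predicted probabilities for held-out patients, converted to 0/1 HPI calls
p_new    <- predict(step_mod, newdata = test_mod, type = "response")
hpi_pred <- as.numeric(p_new > p_cutoff)
```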
Predicting Severity – Multinomial Regression
All the steps performed above were for the binary response HPI. To predict Severity, which is a multinomial response, we can use the following methods –
Multinomial Logistic Regression with Ordinal Response
Assume that a latent variable Z ~ Logistic(μ = −βX, scale = 1) and cut-points c1 < c2 < c3 are defined such that
- Z < c1, then Y = 0
- c1 ≤ Z < c2, then Y = 1
- c2 ≤ Z < c3, then Y = 2
- Z ≥ c3, then Y = 3
Then, P{Y ≤ 0} = P{Z < c1} = P{Logistic(μ, 1) < c1} = P{Logistic(0, 1) < c1 − μ} = F(c1 + βX), where
- P{} represents probability
- F is the inverse-logit function, i.e. the CDF of the standard logistic distribution
- ci is the model intercept (cut-point) for the ith class of the response variable
Similarly, probabilities for all classes can be calculated. This type of analysis can be performed in R using the VGAM::vglm() function. The example above demonstrates the logit link; other link functions can be employed in a similar fashion.
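A minimal sketch with VGAM::vglm(), assuming hypothetical data frames `train_sev` / `test_sev` in which `Severity` is an ordered factor with levels 0 < 1 < 2 < 3:

```r
library(VGAM)

# Cumulative-link (proportional-odds) model; cumulative() defaults to the logit
# link, and parallel = TRUE imposes a common slope across the class boundaries
ord_mod <- vglm(Severity ~ ., family = cumulative(parallel = TRUE),
                data = train_sev)
summary(ord_mod)

# Predicted probability of each severity class for new patients
predict(ord_mod, newdata = test_sev, type = "response")
```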
Multinomial Logistic Regression with Nominal Response
Although we are aware of the ordinality in the levels of our response (Severity), another way to model it is to assume the response follows a multinomial distribution: P{Y = i} = Pi, where i is between 0 and 3. By definition, Σ(Pi) = 1. We choose one class as the baseline and interpret the log-odds of all other classes as
Log(Pi/P0) = αi + βiX where
- i varies between 1 & 3
- P0 represents baseline probability
- αi represents intercept for ith class
- βi represents the coefficient vector for the ith class
This type of analysis can be performed in R using the nnet::multinom() function, which fits the baseline-category logit model described above.
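A matching sketch with nnet::multinom(), reusing the same hypothetical `train_sev` / `test_sev` data frames; the first level of `Severity` is taken as the baseline class:

```r
library(nnet)

# Baseline-category multinomial logit model
nom_mod <- multinom(Severity ~ ., data = train_sev)
summary(nom_mod)

# P{Y = i} for each severity class on the held-out set
predict(nom_mod, newdata = test_sev, type = "probs")
```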
These are the approaches for modeling the multinomial response Severity. All other steps remain the same as in the case of binary response HPI.
Alternative Approaches
Apart from the simple approaches elaborated above, there are a few more methods which can be used interchangeably for modeling HPI & Severity. For instance,
- K-Means Clustering
- Hierarchical Clustering
- Classification Tree
- Random Forest
- Probabilistic Structural Equation Model – Bayesian Belief Network
However, these methods may result in a loss of model interpretability.
Note – although these are well-established alternative approaches, I have deliberately not elaborated on them given the scope of this article.
Appendix
- [1] https://www.americannursetoday.com/wp-content/uploads/2018/05/DabirSupplement_May2018.pdf
- [2] https://en.wikipedia.org/wiki/Braden_Scale_for_Predicting_Pressure_Ulcer_Risk