Gradient Boosting To Predict Hospital Length Of Stay

Dr. Mostafa Samy, BDS

Dentist | Healthcare Data Science / AI | Generative AI for Healthcare "Clinical LLMs" | Digital Healthcare Transformation & Integration | MSc Operations Research | Applied Statistics (FGSSR) | DTQM (AICPD).

Published: September 6, 2021

We are still in the same context of prediction: we are trying to move beyond narrow spreadsheet capabilities, which limit us to "descriptive analytics" only, and to enjoy the power of computationally comparing different machine learning algorithms against linear regression, since Python programming has made this relatively simple and easy.

You do not have to be a pioneer in math and statistics to start wrangling your custom data sets (at least for now). I am talking here to my friends from a healthcare background.

Before we upload our data to Google Colaboratory, let's first define what gradient boosting is, and how gradient boosting differs from gradient descent.

Gradient boosting is a technique for building an ensemble of weak models such that the predictions of the ensemble minimize a loss function.

So what is an ensemble?

Ensemble modeling is a process where multiple diverse base models are used to predict an outcome.

Gradient descent, by contrast, is an algorithm for finding a set of parameters that optimizes a given loss function.
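To make the contrast concrete, here is a minimal sketch of gradient descent fitting a straight line by minimizing the mean squared error. The toy data, the learning rate, and the number of iterations are assumptions chosen for illustration; they are not part of the article's dataset.

```python
import numpy as np

# Toy data, made up for illustration: y is roughly 2*x + 1 plus a little noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=200)
y = 2.0 * x + 1.0 + rng.normal(0, 0.05, size=200)

# Parameters to learn, starting from an arbitrary guess.
w, b = 0.0, 0.0
learning_rate = 0.5  # hypothetical step size

for _ in range(2000):
    error = (w * x + b) - y
    # Gradients of the mean squared error with respect to w and b.
    grad_w = 2.0 * np.mean(error * x)
    grad_b = 2.0 * np.mean(error)
    # Step against the gradient to reduce the loss.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

final_mse = np.mean((w * x + b - y) ** 2)
print(f"w={w:.2f}, b={b:.2f}, mse={final_mse:.4f}")
```

Gradient descent adjusts the parameters of one model, whereas gradient boosting keeps adding new weak models to an ensemble.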

We can summarize gradient boosting in 5 steps as follows:

Step 1 : Make the first guess.
Step 2 : Compute the pseudo-residuals.
Step 3 : Predict the pseudo-residuals.
Step 4 : Make a prediction and compute the residuals.
Step 5 : Make a second prediction.
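The steps above can be sketched from scratch in a few lines. This is a simplified illustration for squared error loss, where the pseudo-residuals are simply the target minus the current prediction. The toy data, the shallow decision trees used as weak models, the learning rate, and the number of rounds are all assumptions for illustration, and the loop is not the exact scikit-learn implementation.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy data, made up for illustration: a noisy sine curve.
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, size=300)

learning_rate = 0.1  # hypothetical shrinkage factor
n_rounds = 100       # hypothetical number of weak models

# Step 1: make the first guess (the mean of the target).
first_guess = y.mean()
prediction = np.full_like(y, first_guess)
trees = []
mse_history = [np.mean((y - prediction) ** 2)]

for _ in range(n_rounds):
    # Step 2: compute the pseudo-residuals (for squared error: y - prediction).
    residuals = y - prediction
    # Step 3: fit a weak model that predicts the pseudo-residuals.
    tree = DecisionTreeRegressor(max_depth=2, random_state=0)
    tree.fit(X, residuals)
    trees.append(tree)
    # Steps 4 and 5: update the prediction; the next round recomputes residuals.
    prediction = prediction + learning_rate * tree.predict(X)
    mse_history.append(np.mean((y - prediction) ** 2))


def boosted_predict(X_new):
    """Predict with the ensemble: first guess plus all scaled tree outputs."""
    out = np.full(X_new.shape[0], first_guess)
    for fitted_tree in trees:
        out += learning_rate * fitted_tree.predict(X_new)
    return out


print(f"training MSE: {mse_history[0]:.3f} -> {mse_history[-1]:.3f}")
```

Each round only nudges the prediction by a fraction of what the new tree suggests, which is why many small trees are combined instead of one large one.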

I will add a YouTube video in the first comment for more explanation of the concept. Watch it, then dive with me into the Python code.

Upload The Data To Google Colaboratory

Load Dependencies Using This Snippet

import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import GradientBoostingRegressor

Load our dummy data, or use your custom data from your electronic medical records, using this snippet. If the data is real, use offline Jupyter notebooks for confidentiality.

df = pd.read_csv('/content/Healthcare_Investments_and_Hospital_Stay (1).csv')
df.head(7)

Draw A Heat Map Using This Snippet (Seaborn Python Library)
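The heat map snippet itself is not reproduced in the text, so the following is only a sketch of one common way to draw a correlation heat map with Seaborn. It uses a small made-up stand-in table named demo_df; the Hospital_Beds and MRI_Units columns and all values are hypothetical, and only the Location and Hospital_Stay column names come from the code in this article.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up stand-in data; replace demo_df with your own DataFrame.
rng = np.random.default_rng(1)
beds = rng.normal(5, 1, size=100)
demo_df = pd.DataFrame({
    "Location": rng.choice(["AUS", "CAN", "DEU"], size=100),  # text column
    "Hospital_Beds": beds,                                    # hypothetical
    "MRI_Units": rng.normal(10, 3, size=100),                 # hypothetical
    "Hospital_Stay": 2 + 0.8 * beds + rng.normal(0, 0.5, size=100),
})

# Correlations are only defined for numeric columns, so drop the text ones.
corr = demo_df.select_dtypes("number").corr()

# Annotated heat map with a color scale fixed to the range [-1, 1].
ax = sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1)
ax.set_title("Correlation heat map (demo data)")
# In Colab or Jupyter the figure renders at the end of the cell;
# in a plain script, call matplotlib.pyplot.show() afterwards.
```

Cells close to 1 or -1 point to strongly related columns, which is a quick way to spot useful predictors of the length of stay.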

One-Hot Encoding + Train/Test Split

def onehot_encode(df, column):
    df = df.copy()
    dummies = pd.get_dummies(df[column])
    df = pd.concat([df, dummies], axis=1)
    df = df.drop(column, axis=1)
    return df

def preprocess_inputs(df):
    df = df.copy()

    # One-hot encode Location column
    df = onehot_encode(df, column='Location')

    # Split df into X and y
    y = df['Hospital_Stay'].copy()
    X = df.drop('Hospital_Stay', axis=1).copy()

    # Train-test split
    X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, random_state=123)

    # Scale X with a standard scaler, fitted on the training set only
    scaler = StandardScaler()
    scaler.fit(X_train)

    # Keep the original row index so X and y stay aligned
    X_train = pd.DataFrame(scaler.transform(X_train), index=X_train.index, columns=X.columns)
    X_test = pd.DataFrame(scaler.transform(X_test), index=X_test.index, columns=X.columns)

    return X_train, X_test, y_train, y_test

X_train, X_test, y_train, y_test = preprocess_inputs(df)

Build The Linear Regression Model And Print The R Squared

R squared is a number, typically between 0 and 1, that measures the proportion of the variation in the dependent variable that is explained by the independent variable(s).

For the linear regression model, it is 0.85.
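The model-fitting snippet is not reproduced in the text, so here is a minimal sketch of how a linear regression can be fitted and scored with scikit-learn. It runs on synthetic data, so it will not reproduce the 0.85 reported above; the data and the variable names lin_model, r2_from_sklearn, and r2_by_hand are assumptions for illustration. The last lines recompute R squared by hand to show what the score means.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data, made up for illustration: a linear signal plus noise.
rng = np.random.default_rng(123)
X = rng.normal(size=(500, 3))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0, 1.0, size=500)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, random_state=123
)

lin_model = LinearRegression()
lin_model.fit(X_train, y_train)

# For regressors, score() returns R squared on the data it is given.
r2_from_sklearn = lin_model.score(X_test, y_test)

# The same number by hand: 1 - (residual sum of squares / total sum of squares).
y_pred = lin_model.predict(X_test)
ss_res = np.sum((y_test - y_pred) ** 2)
ss_tot = np.sum((y_test - y_test.mean()) ** 2)
r2_by_hand = 1.0 - ss_res / ss_tot

print(f"R squared (sklearn): {r2_from_sklearn:.3f}")
print(f"R squared (by hand): {r2_by_hand:.3f}")
```

With the article's data, the same two calls would be applied to the X_train, X_test, y_train, and y_test returned by preprocess_inputs.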

Let's build the gradient boosting model with these few lines using scikit-learn (sklearn). The library documentation link is in the second comment.

Here the R squared is 0.93, which means that gradient boosting predicts better than linear regression on this data.
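The gradient boosting snippet is likewise not reproduced in the text. As a sketch only, the comparison could look like the following, run here on synthetic data with a deliberately non-linear target. The default hyperparameters are an assumption, since the article's settings are not shown, and the scores printed here are unrelated to the 0.85 and 0.93 reported above.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data, made up for illustration: the target depends on the
# features in a non-linear way, which a straight line cannot capture well.
rng = np.random.default_rng(7)
X = rng.uniform(-3, 3, size=(600, 2))
y = 3.0 * np.sin(X[:, 0]) + X[:, 1] ** 2 + rng.normal(0, 0.3, size=600)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, random_state=123
)

# Gradient boosting with scikit-learn defaults (assumed settings).
gb_model = GradientBoostingRegressor(random_state=123)
gb_model.fit(X_train, y_train)
gb_r2 = gb_model.score(X_test, y_test)

# Linear regression on the same split, for comparison.
lin_r2 = LinearRegression().fit(X_train, y_train).score(X_test, y_test)

print(f"Linear regression R squared: {lin_r2:.3f}")
print(f"Gradient boosting R squared: {gb_r2:.3f}")
```

Comparing both models on the same held-out test set keeps the comparison fair, since neither model has seen those rows during training.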

In the next article, we will discuss the use of natural language processing (NLP) on medical records and see whether we can use it to predict unplanned readmission within 30 days of discharge.

The Google Colaboratory notebook is in the third comment.

An awesome Kaggle kernel and some other resources are also attached in the comments.