Extreme Gradient Boosting (XGBoost) to Predict Hospital Length of Stay


We are still in the same context of prediction: we are trying to find a way to predict hospital length of stay so that we can properly optimize our resources.

Here, we are not trying to explain mathematically and statistically how these algorithms work. Instead, we want to show healthcare decision makers that implementing these algorithms has become easy, requiring only a few lines of code, thanks to Python libraries (you could use R too), assuming you have some statistical background, e.g. regression.

XGBoost is one of the most popular machine learning algorithms, regardless of the type of prediction task: regression or classification. Here we are mainly concerned with its use as a regressor rather than a classifier.

XGBoost (Extreme Gradient Boosting) belongs to a family of boosting algorithms and uses the gradient boosting machine (GBM) framework at its core. It is an optimized, distributed gradient boosting library (source: DataCamp).
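To get a feel for the boosting idea at XGBoost's core (each new shallow tree corrects the residual errors of the ensemble built so far), here is a minimal sketch using scikit-learn's GradientBoostingRegressor on synthetic data. The dataset and parameter values are illustrative assumptions for demonstration only, not the article's data:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# Synthetic regression data standing in for a length-of-stay table
X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, random_state=123)

# 100 shallow trees, each fit to the residuals left by the previous ones;
# learning_rate shrinks each tree's contribution to stabilize the ensemble
gbm = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1,
                                max_depth=3, random_state=42)
gbm.fit(X_train, y_train)
print("R^2 on test data: {:.3f}".format(gbm.score(X_test, y_test)))
```

XGBoost follows the same principle but adds regularization and an optimized, parallelized implementation on top of it.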

Let's see how easily this is done with the scikit-learn Python library, and compare the accuracy to our conventionally used model, linear regression (I assume you studied linear regression in your Six Sigma courses :)).

Upload the dummy dataset (or your real data) either to Drive or to your Google Colaboratory notebook.

Load the dependencies as usual:


!pip install catboost
!pip install ipywidgets
!jupyter nbextension enable --py widgetsnbextension



import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt


from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.svm import LinearSVR, SVR
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from xgboost import XGBRegressor
from lightgbm import LGBMRegressor
from catboost import CatBoostRegressor
        

Read the data with pandas:


df = pd.read_excel('/content/sample_data_Healthcare_Investments_and_Hospital_Stay.xlsx')


df.head(5)
[Image: the first five rows of the dataset]


sns.heatmap(df.corr(), annot=True)
[Image: correlation heatmap of the dataset's features]

Run the following snippets successively:


def onehot_encode(df, column):
    df = df.copy()
    dummies = pd.get_dummies(df[column])
    df = pd.concat([df, dummies], axis=1)
    df = df.drop(column, axis=1)
    return df
        


def preprocess_inputs(df):
    df = df.copy()

    # One-hot encode Location column
    df = onehot_encode(df, column='Location')

    # Split df into X and y
    y = df['Hospital_Stay'].copy()
    X = df.drop('Hospital_Stay', axis=1).copy()

    # Train-test split
    X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, random_state=123)

    # Scale X with a standard scaler
    scaler = StandardScaler()
    scaler.fit(X_train)

    X_train = pd.DataFrame(scaler.transform(X_train), columns=X.columns)
    X_test = pd.DataFrame(scaler.transform(X_test), columns=X.columns)

    return X_train, X_test, y_train, y_test

X_train, X_test, y_train, y_test = preprocess_inputs(df)
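To see what the one-hot encoding step does, here is a quick check on a tiny hypothetical frame. The column names mirror the article's Location and Hospital_Stay; the values and the Hospital_Investment column are made up for illustration:

```python
import pandas as pd

# Tiny stand-in for the real dataset (values are invented)
toy = pd.DataFrame({
    'Location': ['AUS', 'BEL', 'AUS'],
    'Hospital_Investment': [1.2, 3.4, 2.1],
    'Hospital_Stay': [5.0, 7.5, 6.1],
})

# Same one-hot step as onehot_encode: one indicator column per location
dummies = pd.get_dummies(toy['Location'])
encoded = pd.concat([toy, dummies], axis=1).drop('Location', axis=1)
print(encoded.columns.tolist())
```

The categorical Location column is replaced by one indicator column per country code, which is what lets the regressors treat each location as a separate numeric feature.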
        

The model for linear regression. As you can see, we defined our data for training and testing, and the model itself takes only one word: LinearRegression().


models = {
    "Linear Regression": LinearRegression(),
}


for name, model in models.items():
    model.fit(X_train, y_train)
    print(name + " trained.")

This snippet checks and prints the accuracy:


for name, model in models.items():
    print(name + " R^2 Score: {:.5f}".format(model.score(X_test, y_test)))

Let's see the R^2 score:

[Image: R^2 score of the linear regression model]
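R^2 is not the only way to judge a regressor. RMSE and MAE are reported in the same units as the target (days of stay, in our case), which is often easier to communicate to decision makers. A sketch using scikit-learn's metrics on synthetic data, not the article's dataset:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split

# Illustrative synthetic data
X, y = make_regression(n_samples=300, n_features=4, noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, random_state=123)

model = LinearRegression().fit(X_train, y_train)
pred = model.predict(X_test)

rmse = mean_squared_error(y_test, pred) ** 0.5  # same units as y
mae = mean_absolute_error(y_test, pred)
print("RMSE: {:.2f}  MAE: {:.2f}".format(rmse, mae))
```

RMSE penalizes large misses more heavily than MAE, which matters if an occasional badly underestimated stay is costlier than many small errors.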

Let's move to the XGBoost code snippet. It is the same as that of the linear regression; I only changed one word, from LinearRegression() to XGBRegressor().

It can't get easier than that :)

Let's see the code:


models = {
    "XGBoost": XGBRegressor(),
}


for name, model in models.items():
    model.fit(X_train, y_train)
    print(name + " trained.")

Let's see the R^2 score:

[Image: R^2 score of the XGBoost model]

Much higher. Great :)

To summarize, we have to move beyond conventional spreadsheets, which limit our ability to achieve more accurate predictions, if we want to dive deeper into quantitative managerial methods. This will certainly have a positive impact on our decisions, especially in healthcare facilities, where the use of such technologies is still very limited.

Dr.Mostafa Samy, BDS

Dentist | Healthcare Data Science / AI | Generative AI for Healthcare (clinical LLMs) | Digital Healthcare Transformation & Integration | MSc Operations Research | Applied Statistics (FGSSR) | DTQM (AICPD)
