Gradient Boosting To Predict Hospital Length Of Stay

We are still in the same prediction context: we want to move beyond the narrow capabilities of spreadsheets, which limit us to "descriptive analytics", and enjoy the power of comparing different machine learning algorithms against linear regression, since Python has made this relatively simple and easy.

You do not have to be a pioneer in math and statistics to start wrangling your own data sets (at least for now); I am talking here to my friends from a healthcare background.

Before we upload our data to Google Colaboratory, let's first define what gradient boosting is, and how it differs from gradient descent.

Gradient boosting is a technique for building an ensemble of weak models such that the predictions of the ensemble minimize a loss function.

So what is an ensemble?

Ensemble modeling is a process where multiple diverse base models are used to predict an outcome.

Gradient descent, by contrast, is an algorithm for finding a set of parameters that optimizes a given loss function.
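To make the contrast concrete, here is a minimal sketch of gradient descent on a toy loss. The loss L(w) = (w - 3)^2, the learning rate, and the function name `gradient_descent` are assumptions chosen for illustration only; they are not part of the hospital stay example.

```python
def gradient_descent(grad, start, learning_rate=0.1, n_steps=100):
    """Repeatedly step against the gradient to reduce the loss."""
    w = start
    for _ in range(n_steps):
        w = w - learning_rate * grad(w)
    return w

# Toy loss (assumed for illustration): L(w) = (w - 3)^2, so dL/dw = 2 * (w - 3)
w_best = gradient_descent(lambda w: 2 * (w - 3), start=0.0)
print(w_best)  # ends up very close to 3, the minimizer of the toy loss
```

Gradient descent updates the parameters of one model, while gradient boosting adds a new weak model at every round.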

We can summarize gradient boosting in 5 steps as follows:

  • Step 1 : Make the first guess.
  • Step 2 : Compute the pseudo-residuals.
  • Step 3 : Predict the pseudo-residuals.
  • Step 4 : Make a prediction and compute the residuals.
  • Step 5 : Make a second prediction.
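The steps above can be sketched from scratch for a regression problem. This is only a rough illustration under stated assumptions: squared-error loss (so the pseudo-residuals are simply the target minus the current prediction), shallow decision trees as the weak models, and a synthetic toy data set. The names `fit_boosted_trees`, `predict_boosted_trees`, and `LEARNING_RATE` are hypothetical, and this is not how scikit-learn implements it internally.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

LEARNING_RATE = 0.1  # assumed shrinkage applied to every weak model

def fit_boosted_trees(X, y, n_rounds=50, max_depth=2):
    # Step 1: the first guess is the mean of the target
    first_guess = float(np.mean(y))
    prediction = np.full(len(y), first_guess)
    trees = []
    for _ in range(n_rounds):
        # Step 2: pseudo-residuals (for squared error: target minus prediction)
        residuals = y - prediction
        # Step 3: fit a weak model that predicts the pseudo-residuals
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
        tree.fit(X, residuals)
        # Steps 4 and 5: update the prediction; the next round recomputes residuals
        prediction = prediction + LEARNING_RATE * tree.predict(X)
        trees.append(tree)
    return first_guess, trees

def predict_boosted_trees(first_guess, trees, X):
    # The ensemble prediction is the first guess plus all shrunken corrections
    prediction = np.full(X.shape[0], first_guess)
    for tree in trees:
        prediction = prediction + LEARNING_RATE * tree.predict(X)
    return prediction

# Synthetic toy data (an assumption, not the hospital data set)
rng = np.random.default_rng(0)
X_toy = rng.uniform(0, 10, size=(200, 1))
y_toy = np.sin(X_toy[:, 0]) + rng.normal(0, 0.1, size=200)

first_guess, trees = fit_boosted_trees(X_toy, y_toy)
mse_first = np.mean((y_toy - first_guess) ** 2)
mse_boosted = np.mean((y_toy - predict_boosted_trees(first_guess, trees, X_toy)) ** 2)
print(mse_first, mse_boosted)  # the boosted error should be the smaller one
```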

I will add a YouTube video in the first comment that explains the concept further. Watch it, then dive with me into the Python code.

Upload The Data To Google Colaboratory

Load Dependencies Using This Snippet


import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import GradientBoostingRegressor
        

Load our dummy data, or use your own data from your electronic medical records, with this snippet. If the data is real, use offline Jupyter notebooks for confidentiality.


df = pd.read_csv('/content/Healthcare_Investments_and_Hospital_Stay (1).csv')
df.head(7)

Draw a heat map with the Seaborn Python library.
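The original heat map code was shared as an image that is not available here, so the following is only a sketch of one common way to do it. It assumes the heat map shows pairwise correlations between the numeric columns. The helper name `plot_correlation_heatmap`, the stand-in frame `demo_df`, and the column `Some_Numeric_Feature` are hypothetical; with the real data you would pass `df` instead.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

def plot_correlation_heatmap(frame):
    """Draw a heat map of pairwise correlations between numeric columns."""
    corr = frame.select_dtypes(include="number").corr()
    fig, ax = plt.subplots(figsize=(8, 6))
    sns.heatmap(corr, annot=True, cmap="coolwarm", ax=ax)
    return corr, ax

# Stand-in data (hypothetical values) so the sketch runs without the CSV file
rng = np.random.default_rng(0)
demo_df = pd.DataFrame({
    "Location": ["A", "B"] * 10,                      # text column, skipped by corr
    "Hospital_Stay": rng.normal(7, 1, 20),
    "Some_Numeric_Feature": rng.normal(10, 2, 20),    # hypothetical column
})
corr, ax = plot_correlation_heatmap(demo_df)
```

In a Colab or Jupyter cell the figure is displayed automatically; in a plain script you would call `plt.show()` afterwards.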

One-Hot Encoding + Train/Test Split


def onehot_encode(df, column):
    # Replace one categorical column with one indicator column per category
    df = df.copy()
    dummies = pd.get_dummies(df[column])
    df = pd.concat([df, dummies], axis=1)
    df = df.drop(column, axis=1)
    return df


def preprocess_inputs(df):
    df = df.copy()

    # One-hot encode Location column
    df = onehot_encode(df, column='Location')

    # Split df into X and y
    y = df['Hospital_Stay'].copy()
    X = df.drop('Hospital_Stay', axis=1).copy()

    # Train-test split
    X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, random_state=123)

    # Scale X with a standard scaler (fitted on the training set only)
    scaler = StandardScaler()
    scaler.fit(X_train)

    X_train = pd.DataFrame(scaler.transform(X_train), columns=X.columns)
    X_test = pd.DataFrame(scaler.transform(X_test), columns=X.columns)

    return X_train, X_test, y_train, y_test

X_train, X_test, y_train, y_test = preprocess_inputs(df)
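To see what the one-hot encoding step does, here is a tiny sketch on a made-up frame. The location codes and stay values are hypothetical and only there for illustration; the logic mirrors `onehot_encode` above.

```python
import pandas as pd

# Hypothetical toy rows, not taken from the real data set
toy = pd.DataFrame({
    "Location": ["AUS", "CAN", "AUS"],
    "Hospital_Stay": [6.1, 7.4, 5.9],
})

# One indicator column per distinct Location value replaces the text column
dummies = pd.get_dummies(toy["Location"])
encoded = pd.concat([toy.drop("Location", axis=1), dummies], axis=1)
print(list(encoded.columns))  # ['Hospital_Stay', 'AUS', 'CAN']
```

Each row now carries a true/1 value in exactly one of the new columns, which lets the models work with the location as numbers.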
        

Build A Linear Regression Model And Print The R Squared

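R squared measures the proportion of the variation in the dependent variable that can be explained by the independent variable(s). It is usually a number between 0 and 1, although on held-out test data it can turn negative when a model predicts worse than simply using the mean.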
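The original model code was shared as an image that is not available here, so this is a sketch of what fitting the baseline and printing R squared can look like. To keep it runnable on its own it uses synthetic stand-in data (an assumption); with the real data you would fit on `X_train`, `y_train` and score on `X_test`, `y_test` from the preprocessing step. The variable names below are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumed), with a mostly linear relationship
rng = np.random.default_rng(123)
X_demo = rng.normal(size=(200, 3))
y_demo = X_demo @ np.array([1.5, -2.0, 0.5]) + rng.normal(0, 0.5, size=200)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, train_size=0.7, random_state=123)

lin_model = LinearRegression()
lin_model.fit(X_tr, y_tr)

# For regressors, .score() returns R squared on the data it is given
r2_linear = lin_model.score(X_te, y_te)

# The same number by hand: 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum((y_te - lin_model.predict(X_te)) ** 2)
ss_tot = np.sum((y_te - y_te.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot
print(r2_linear, r2_manual)
```

The value printed for this synthetic data will not match the result reported for the hospital data.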

For the linear regression model, the R squared is 0.85.

Let's build the gradient boosting model in a few lines with scikit-learn. The library documentation link is in the second comment.
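The original code was shared as an image that is not available here, so this is a sketch of those few lines using `GradientBoostingRegressor` with its default settings; the hyperparameters used in the original are not known. It runs on synthetic, deliberately non-linear stand-in data (an assumption); with the real data you would fit on `X_train`, `y_train` and score on `X_test`, `y_test`.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumed): the target depends non-linearly on the inputs
rng = np.random.default_rng(123)
X_demo = rng.uniform(-3, 3, size=(400, 2))
y_demo = np.sin(X_demo[:, 0]) * X_demo[:, 1] + rng.normal(0, 0.1, size=400)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, train_size=0.7, random_state=123)

# Default settings; random_state only makes the run repeatable
gb_model = GradientBoostingRegressor(random_state=123)
gb_model.fit(X_tr, y_tr)

# .score() returns R squared on the held-out test set
r2_gb = gb_model.score(X_te, y_te)
print(r2_gb)
```

The value printed for this synthetic data will not match the result reported for the hospital data.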

Here the R squared is 0.93, which means that gradient boosting predicts better than linear regression on this data.

In the next article we will discuss the use of natural language processing (NLP) in medical records and see if we can use it to predict unplanned readmission within 30 days of discharge.

The Google Colaboratory notebook is in the third comment.

An awesome Kaggle kernel and some other resources are also attached in the comments.

Dr.Mostafa Samy, BDS

Dentist | Healthcare Data Science / A.i | Generative A.i for healthcare "clinical LLM's" | Digital Healthcare Transformation & Integration | MSc Operations Research | Applied Statistics(FGSSR) | DTQM( AICPD).