XGBoost model for predicting mortality in chronic kidney disease and the importance of the top 10 features
We recently published this article from a project with the Assistant Secretary for Technology Policy, Booz Allen Hamilton, and the University of California, San Francisco, on our models for predicting mortality for chronic kidney disease patients within 90 days of dialysis. The goal of this project was to develop a high-quality training dataset and demonstrate some of the different types of models that can be built from it. The data was cleaned and organized in R, and the XGBoost model was also created in R. The code is at the GitHub link below.
This dataset was obtained from the USRDS and contained 188 features (predictors). I expected this rich feature set to give us a serious advantage, so I was surprised that when we ran the XGBoost model with only the top 10 features (most important according to XGBoost), the c-statistic (AUC, area under the curve) was not much lower than that of the full model (c = 0.78 vs. c = 0.826). Another thing we tested was having XGBoost natively handle the missing data (for continuous features) vs. creating multiple imputations (MICE). The AUCs for these two models were very similar (native c = 0.826 vs. imputed c = 0.827). The clinicians were not as surprised by this, as they understand the clinical use case at a much deeper level than I ever will. As a data scientist, I thought the predictive power of these 10 features (for this dataset) was impressive and interesting. For details, see the article or the code, or send me any questions that you have.
Data Scientist, Applied Machine Learning | ex-Wayfair, DataRobot, TXU | Co-founder
Thanks Summer! I noticed there was no discussion of data censoring, which is often a factor in survival analysis. Was data censoring a consideration?