30-day readmission is an indicator used worldwide to measure quality of care, and some countries consider it a better indicator of the performance of the #healthcare #system than of individual health institutions (hospitals). AIM professor Franck JAOTOMBO addressed the following research question: how can 30-day readmission be predicted from clinical text data, so as to gather more specific and detailed information on the patients at risk?

Franck Jaotombo and his team used Penalized Logistic Regression, Decision Tree Classifier, Random Forest, Gradient Boosting, LightGBM and CatBoost to analyze the clinical discharge notes of the publicly available MIMIC-III database (Medical Information Mart for Intensive Care III). Performance is assessed with the area under the Receiver Operating Characteristic curve (ROC AUC). Based on 42825 admissions with 2444 30-day readmissions (5.71%), CatBoost is the highest-performing model (AUC = 0.723). The feature importances of all the well-performing models identify tokens and topics related to failure (renal, heart, respiratory), tracheostomy, and treatment types and frequency as the main predictors of 30-day readmission. Readmitted patients are mostly insured under Medicaid or Medicare, single and elderly.
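
To make the ROC AUC metric concrete, here is a minimal sketch in plain Python. It is not the team's code: the function name `roc_auc` and the toy labels and scores are invented for illustration, and it uses the rank-based definition of AUC (the probability that a randomly chosen readmitted case receives a higher score than a randomly chosen non-readmitted one, with ties counted as one half). The last lines simply recompute the readmission rate from the two counts quoted above.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration (NOT taken from MIMIC-III):
# 1 = readmitted within 30 days, 0 = not readmitted.
labels = [0, 0, 1, 0, 1, 0, 0, 1]
scores = [0.10, 0.40, 0.35, 0.20, 0.80, 0.05, 0.60, 0.70]

auc = roc_auc(labels, scores)
print(f"ROC AUC on toy data = {auc:.3f}")

# Readmission rate recomputed from the counts reported in the text.
rate = 2444 / 42825
print(f"30-day readmission rate = {rate:.2%}")
```

With a positive class this rare, a model that always predicts "no readmission" would be right about 94% of the time, which makes raw accuracy uninformative here. A ranking-based metric such as ROC AUC does not have that problem.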

Feeding text data to #Machine #Learning (ML) models adds more specific, complementary clinical detail to the predictors of readmission, with an acceptable level of performance. The interpretability of the ML models is enhanced by feature importance and topic modelling. The choice of vectorization is as important as the choice of ML classifier in this process.
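
As a rough illustration of why vectorization matters, the sketch below turns the same toy notes into two different numeric representations: raw token counts and a TF-IDF weighting. This summary does not state which vectorizers the study compared, so both choices are assumptions, as are the invented notes and the helper names `count_vector` and `tfidf_vector`. The IDF formula is one common smoothed variant, not necessarily the one used in the paper.

```python
import math
from collections import Counter

# Invented toy "discharge notes" (NOT real MIMIC-III text).
notes = [
    "renal failure dialysis",
    "heart failure diuretics",
    "routine follow up",
]

tokenized = [note.split() for note in notes]
vocab = sorted({tok for doc in tokenized for tok in doc})


def count_vector(doc):
    """Bag-of-words: one raw count per vocabulary token."""
    counts = Counter(doc)
    return [counts[tok] for tok in vocab]


# Document frequency: in how many notes each token appears.
n_docs = len(tokenized)
doc_freq = Counter(tok for doc in tokenized for tok in set(doc))

# Smoothed inverse document frequency (an assumed, commonly used variant).
idf = {tok: math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1.0 for tok in vocab}


def tfidf_vector(doc):
    """TF-IDF: counts reweighted by IDF, then scaled to unit L2 norm."""
    counts = Counter(doc)
    raw = [counts[tok] * idf[tok] for tok in vocab]
    norm = math.sqrt(sum(x * x for x in raw)) or 1.0
    return [x / norm for x in raw]


counts = [count_vector(doc) for doc in tokenized]
tfidf = [tfidf_vector(doc) for doc in tokenized]

i_failure = vocab.index("failure")
i_renal = vocab.index("renal")

# In the first note, both tokens occur once, so raw counts are identical...
print("counts:", counts[0][i_failure], counts[0][i_renal])
# ...but TF-IDF gives less weight to "failure", which appears in two notes.
print("tf-idf:", round(tfidf[0][i_failure], 3), round(tfidf[0][i_renal], 3))
```

The classifier only ever sees these vectors, so two vectorizations of identical notes can lead the same model to rank tokens, and therefore patients, differently.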


Franck JAOTOMBO. Predicting Hospital Readmission with Machine Learning using the MIMIC III discharge notes: the impact of Vectorization.