
Machine Learning Model Accurately Identifies High-Risk Surgical Patients

— New tool more accurate than existing risk calculator

MedpageToday
A close-up photo of a hospital patient's interlocked hands and an IV port taped to their wrist.

A machine learning model trained, tested, and evaluated with data from 1,477,561 patients was accurate at identifying those who were at high risk for mortality 30 days after surgery, outperforming the most popular current presurgical risk calculator tool, a prognostic study found.

The area under the receiver operating characteristic curve (AUROC) for 30-day mortality was 0.972, 0.946, and 0.956 for the training, test, and prospective sets, respectively. The AUROC was 0.923 and 0.899 for 30-day mortality or 30-day major adverse cardiac or cerebrovascular events (MACCEs) on the training and test sets, respectively, reported Aman Mahajan, MD, PhD, of the University of Pittsburgh School of Medicine, and co-authors in JAMA Network Open.

"They [surgeons] can actually get what the actual patient's overall risk is, so they can have much better decision-making on whether the surgery might be successful, what the actual outcome for this patient might be as well," Mahajan told 鶹ý, "and thereby also allow better shared decision-making between the surgeon and patient, and the other consultants."

When the new machine learning model was compared with the National Surgical Quality Improvement Program (NSQIP) surgical risk calculator, a tool developed by the American College of Surgeons (ACS) that is used across 393 institutions and relies on manual data entry, AUROC scores were 0.945 (95% CI 0.914-0.977) vs 0.897 (95% CI 0.854-0.941), a difference of 0.048.
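(For readers unfamiliar with the metric: AUROC is computed from a model's predicted probabilities and the observed outcomes, and confidence intervals like those reported here are commonly obtained by bootstrapping. The sketch below is a generic illustration using hypothetical arrays `y_true`, `p_new`, and `p_nsqip`, not the authors' code.)

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def auroc_with_ci(y_true, y_score, n_boot=2000, alpha=0.05, seed=0):
    """Point estimate of AUROC plus a bootstrap percentile confidence interval."""
    y_true, y_score = np.asarray(y_true), np.asarray(y_score)
    rng = np.random.default_rng(seed)
    point = roc_auc_score(y_true, y_score)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y_true), len(y_true))   # resample patients with replacement
        if len(np.unique(y_true[idx])) < 2:               # AUROC needs both classes present
            continue
        stats.append(roc_auc_score(y_true[idx], y_score[idx]))
    lo, hi = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return point, (lo, hi)

# Hypothetical usage on the same patients scored by both tools:
# auroc_with_ci(y_true, p_new)    -> e.g. 0.945 (0.914-0.977)
# auroc_with_ci(y_true, p_nsqip)  -> e.g. 0.897 (0.854-0.941)
```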

To make the risk predictions more interpretable, the features most important to the log odds of the outcome of interest (in this case, mortality alone and MACCE or mortality) were identified using Shapley Additive Explanations (SHAP) feature attribution values. A higher SHAP value indicated a larger contribution from that feature to the predicted risk of one of these events.
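(As a rough illustration of how SHAP attributions are typically generated, the sketch below assumes a tree-based classifier trained on synthetic data and the open-source `shap` library; the article does not specify the authors' model architecture or tooling.)

```python
import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic stand-in for a preoperative feature matrix (age, albumin, diagnosis counts, ...)
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# Hypothetical gradient-boosted model; not necessarily what the study used
model = GradientBoostingClassifier(random_state=0).fit(X, y)

# TreeExplainer attributes each prediction's log-odds to individual features
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# Summary plot ranks features by mean absolute SHAP value; a larger value
# means the feature moves the predicted risk more
shap.summary_plot(shap_values, X)
```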

Mahajan and co-authors reported that age on the contact date was associated with the greatest change in 30-day MACCE or mortality, with older patients at higher risk. Lower recent albumin levels were a significant factor for mortality alone, but not for the combined MACCE or mortality outcome.

Postoperative death is the third leading contributor to global deaths, after heart disease and stroke. But, Mahajan and co-authors said, this deadly and costly problem has few predictive tools that allow hospitals to identify high-risk patients and adjust care accordingly. "We've been using this model now close to 3 years, and it's continued to be accurate," Mahajan said. The tools that are available, like the NSQIP surgical risk calculator, can lose accuracy across different operations, patients, institutions, and regions.

This machine learning model offers advantages over other risk prediction tools commonly used in clinical settings, Mahajan said. He noted that while some models are accurate during training and testing, they can deteriorate as populations or practices change. The clinical application built around the tool updated predictions automatically every 24 hours, without requiring manual extraction of EHR data.
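(A simplified sketch of what such a recurring scoring job could look like is shown below; the extraction, scoring, and publishing functions are placeholders, not the UPMC application's actual interface.)

```python
import time
from datetime import datetime

def refresh_risk_scores(fetch_features, score, publish):
    """One pass: pull current EHR-derived features, score them, publish to the clinical app."""
    features = fetch_features()              # placeholder: automated EHR feature extraction
    risks = score(features)                  # placeholder: trained model's predicted probabilities
    publish(risks, as_of=datetime.now())     # placeholder: write scores where clinicians can see them

def run_daily(fetch_features, score, publish, interval_hours=24):
    """Re-score all scheduled surgical patients on a fixed cadence (every 24 hours here)."""
    while True:
        refresh_risk_scores(fetch_features, score, publish)
        time.sleep(interval_hours * 3600)
```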

Mahajan noted that the team's use of a large and diverse patient population and the inclusion of many social determinants of health contributed to the model's accuracy and robustness. "Many of the models in the past actually don't use that feature," he said. "You can envision that two people can have diabetes, but one of them actually has a different socioeconomic status, education status, lifestyle choices, and their outcomes are likely to be different."

Richard Li, MD, a radiation oncologist from City of Hope National Medical Center in Duarte, California, who was not involved with the study, has also used machine learning models to predict mortality risk for medical outcomes. "They [the researchers] actually deployed the machine learning model into actual clinical practice, which is a pretty huge accomplishment, especially the part where you're actually using it [in] day-to-day clinical practice," Li told MedPage Today. "I think it is very, very non-trivial to do."

Li said he and his team introduced the application of SHAP values to medical problems, and noted that Mahajan and colleagues "actually use that pretty intelligently in this paper, so it helps them explain the predictions of the model better ... because if you see a patient with high mortality risk, you want to know why."

The study used data from electronic health records (EHRs) of patients from 20 hospitals in the University of Pittsburgh Medical Center (UPMC) health system. Overall, 54.5% of participants were female, and the mean age was 56.8 years. The main outcomes were postoperative mortality at 30 days after surgery and MACCE or mortality at 30 days after surgery.

To train the model, the team used data from 1,016,966 randomly selected unique surgical procedures performed between December 2012 and May 2019 on patients who had a prior physician visit at a UPMC office.

To prospectively test and validate the model's accuracy, the researchers used a randomly selected set of 254,242 unique patients scheduled for surgery between June 2019 and May 2020, with clinicians blinded to the tool's output. They then clinically deployed the model among another 206,353 patients scheduled in the same period; clinicians could see mortality risk categories (high, medium, low) in an application prior to surgery or at referral to perioperative care at UPMC.
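(The high, medium, and low categories amount to thresholds applied to the model's predicted probability. The cut points in the sketch below are invented for illustration; the article does not report the study's actual thresholds.)

```python
def risk_category(prob, low_cut=0.01, high_cut=0.05):
    """Map a predicted 30-day mortality probability to a display category.

    The 1% and 5% cut points are illustrative assumptions, not the study's thresholds.
    """
    if prob >= high_cut:
        return "high"
    if prob >= low_cut:
        return "medium"
    return "low"

print([risk_category(p) for p in (0.002, 0.02, 0.12)])  # ['low', 'medium', 'high']
```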

To compare the new model's accuracy with the older NSQIP predictive tool, the researchers used a random selection of 902 patients scheduled for surgery between April and June 2021.

Surgeries included any that used an anesthesiology service. MACCEs were defined as one or more ICD-10 codes for acute type 1 or type 2 myocardial infarction, cardiogenic shock or acute heart failure, unstable angina, or stroke.

Some of the 368 variables the model used to predict risk included demographics, medical history diagnosis codes, medications, laboratory and test values, social determinants of health, and socioeconomic status. The most common diagnoses from office visits in the 60 days before surgery, the most common primary and secondary procedures in the year before surgery, the most common pharmaceutical classes and medications prescribed in the 180 days before surgery, and the most common specialty physician visits in the 60 days before surgery were used as independent variables.
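(Features such as "most common diagnoses in the 60 days before surgery" are essentially windowed counts over longitudinal EHR events. The pandas sketch below uses assumed column names, `patient_id`, `event_date`, `code`, and `surgery_date`, and is not the study's actual pipeline.)

```python
import pandas as pd

def top_codes_before_surgery(events, surgery_dates, lookback_days, k=10):
    """Count each patient's event codes within a lookback window before surgery,
    keeping the k most common codes overall as count features."""
    df = events.merge(surgery_dates, on="patient_id")  # assumed columns: patient_id, event_date, code, surgery_date
    window = (df["event_date"] < df["surgery_date"]) & \
             (df["event_date"] >= df["surgery_date"] - pd.Timedelta(days=lookback_days))
    df = df[window]
    top = df["code"].value_counts().head(k).index      # k most common codes in the window
    counts = (df[df["code"].isin(top)]
              .groupby(["patient_id", "code"]).size()
              .unstack(fill_value=0))                  # one column per retained code
    return counts

# Example windows mirroring the article: 60 days for diagnoses and specialist visits,
# 180 days for medications, 365 days for prior procedures.
```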

Study limitations, the investigators said, included the dependence on data already in the EHR, the fact that the data came only from the UPMC EHR system (although some medical records are shared between UPMC and other centers), and the lack of validation using test sets from other institutions.

Sophie Putka is an enterprise and investigative writer for MedPage Today. Her work has appeared in the Wall Street Journal, Discover, Business Insider, Inverse, Cannabis Wire, and more. She joined MedPage Today in August 2021.

Disclosures

Mahajan reported no disclosures; a co-author reported receiving grants from the NIH outside of the study and serving as founder and chief medical officer of OpalGenix and being a consultant for NeurOptics.

Primary Source

JAMA Network Open

Mahajan A, et al "Development and validation of a machine learning model to identify patients before surgery at high risk for postoperative adverse events" JAMA Netw Open 2023; DOI: 10.1001/jamanetworkopen.2023.22285.