Following news of the interim analysis of the Pfizer and BioNTech vaccine, the world is starting to focus on the limited initial supplies of COVID vaccines. A new medRxiv preprint looks at over 500,000 Medicare beneficiaries to identify risk factors for COVID-19 mortality, which could help guide the necessary vaccine rationing.
MedPage Today Editor-in-Chief and preprint co-author Marty Makary, MD, MPH, of Johns Hopkins University in Baltimore, speaks with Harlan Krumholz, MD, of Yale University, about the new data as well as the importance of sharing information like this in real time via preprints during a pandemic.
Following is a transcript of their remarks:
Harlan Krumholz, MD: Hi! I'm Harlan Krumholz, a professor from Yale University. I'm here with Marty Makary from Hopkins. We're here to talk about a preprint that he's just put out. The title of this preprint is "Machine Learning Study of 534,023 Medicare Beneficiaries with COVID-19: Implications for Personalized Risk Prediction." I told Marty like, hey, I was really happy that he posted it on medRxiv. Before we even talk about this preprint, which I think has some pretty interesting implications, I've got some questions about what was the experience like for you posting it on medRxiv.
Marty Makary, MD: Amazing. Harlan, it's just incredible how much we need something like medRxiv at a time of a pandemic. We cannot wait six months for the standard peer review process at a time when people have information that desperately needs to be shared. As long as it comes with the appropriate disclaimers, which medRxiv does very well, I think we can learn from each other. We don't just learn from formal randomized controlled trials. We learn from data that's shared in real time with the appropriate limitations understood, so medRxiv has been a great disruptor of our very clunky and slow system that was never designed for a pandemic or health emergency. It was designed for peacetime, slow movement.
A protocol for ventilator management gets posted by some doctors at Mount Sinai Hospital and overnight gets adopted around the country. That's clinical wisdom and it makes sense. If that's what their experience is, why can't we learn from it? It's not all sort of level I RCTs versus snake oil. I think we can learn from experiences, and we sort of had a system of medical journals paralyzed at a time when doctors were sharing their experiences on Twitter and social media, and posting things. I know you're involved in medRxiv, Harlan. Great work, and I'm a huge fan.
Harlan: Marty, thanks so much. Just for people who are listening, a preprint server is a place where people can post their studies prior to peer review and publication. It's an opportunity to communicate the science and allow the community to engage in public dialogue about it, and to learn about what's going on. That means recognizing that it hasn't been through peer review and hasn't had a publication of record yet, but enabling it to go round. medRxiv, M-E-D-R-X-I-V, is a preprint server for the clinical sciences that I was part of founding with Joe Ross and colleagues at The BMJ and Cold Spring Harbor Labs that would enable people within our community to be able to do it.
We want to post things, but we also want to be responsible. We don't review the science -- there is a heterogeneity of different kinds of science on the platform -- but we do try to be responsible, and the disclaimers on the preprint, as you mentioned, say, "Hasn't been peer-reviewed." We have guidance for journalists on how they should talk about something that's a preprint, and mostly it's gone pretty well.
Look, I want to ask you a couple questions about this paper. First of all, over 500,000 observations, it's amazing. I know you got ahold of the CMS data, which is terrific, and in a fairly timely way. Can you tell us a little bit about what was the story behind this study? How did you come to do it and how did you come to get the opportunity to work with the data?
Marty: Harlan, we realized that the largest dataset of COVID patients is in the Medicare dataset and that there is no good reason why researchers can't access it. Seema Verma had made an announcement that she believes philosophically that it should be available for research, so we quickly followed up on the offer and did the largest study of COVID outcomes to date, and that is about half a million Medicare beneficiaries with a confirmed diagnosis code after April 1st right up until April 31st. Of those half a million cases, we also studied 38,000 inpatient deaths.
We found some interesting things. We learned that, in terms of race, Hispanic patients were at the highest risk, with a 74% increased risk of mortality after adjusting for comorbidities. We have the most comorbid country, probably, in the world, in the most contemporary time in the world. We have the most obese, the most comorbid, the most hospitalized, the most medicated, the most disabled population in the history of the world. People wonder why...
Harlan: As you've written in your books, you've written eloquently about this issue and how we need to address it.
Marty: It's one of the factors. There have been missteps, for sure, with the response to COVID, but we also have a very comorbid population. After adjusting for all those factors as best you can with claims data -- and you're the master of claims data research, you know these limitations -- it was still very much an independent risk factor. The Hispanic race, followed by Asian... you don't hear a lot about Asian race... with a 71% increased risk of mortality, followed by Black patients.
We also found some really interesting comorbidities that had not really been well defined or elucidated in prior studies. The #1 risk factor we identified in the Medicare population after adjusting for age and race is sickle cell disease, followed by chronic kidney disease -- something that had been described -- followed by leukemia and lymphomas, heart failure, diabetes, and lung cancer. It was interesting to identify a couple of novel risk factors in that hierarchy of things that had been described in different studies prior on a smaller scale.
Harlan: I noticed in this paper that you were using some machine learning techniques. What was it that made you decide to go in that direction with these data?
Marty: It seems like we've matured in analytic science from simply doing a univariate followed by a stepwise multivariate regression analysis, using good pretest hypothesis probability sort of testing in that design, and so we've got machine learning. What we did is we did the study both ways, with machine learning and the traditional multivariate regression analysis.
What we found is that when we use the factors identified with machine learning, it not only included those identified in the stepwise early process of the regression model, but it also found some additional risk factors. It was interesting.
We found that blindness, for example, was an independent risk factor, maybe because blindness tends to be associated with people in residential living, but independent of age. Alzheimer's was a predictor, cerebral palsy... so it was a novel way to look at a traditional analysis. We did it both ways and found that it was very helpful. We used the random forest model.
Harlan: You were able to look at both inpatients and outpatients.
Marty: Yeah.
Harlan: You talk about being able to use some of the results of your study to help guide prioritization around vaccines. Would you tell the...? I'm curious to hear a little bit more from you about that, how you're thinking about this. Obviously, these are claims. They are not quite the same as the kind of information you would get from talking to people, but is probably related. How can we take some of the work that you've done and think about it with regard to prioritizing vaccines?
Marty: I think the list of comorbidities that we've described in the medRxiv publication and in a forthcoming report now is going to help us think about which populations should be prioritized when it comes to limited resources, not just vaccines, but also therapeutics.
Look, we are already in a difficult position as physicians in trying to figure out how do we use a limited supply of polyclonal antibodies, remdesivir, and other things. These risk factors can be helpful in trying to figure out, of those who come in the door who are sick with COVID, who may be an ideal candidate for early therapeutic intervention because of their increased risk of mortality? Similarly, with the vaccine allocation, we know we're going to be supply constrained, so hopefully this information can inform some of that.
Also, what about the question of those who are totally healthy and have COVID? How many completely healthy Medicare beneficiaries have died of COVID? That's a question we've never gotten answered.
In our study, we identified 2,500 patients in the non-Medicare Advantage population who were without comorbidities, using the Chronic Conditions Warehouse definition of comorbidities, and died of COVID. That's 2,500 Medicare beneficiaries out of the entire Medicare population. I think that can help us inform some of our messaging because we knew that mortality was skewed towards those with advanced age and comorbidities, but we didn't realize it was skewed this much.
Harlan: I was wondering whether or not -- I didn't see that and I might have missed it -- you looked at geographic differences or differences over time in how the risk has been changing. There has been some talk about the virus becoming less lethal over time, maybe because of improvements in the quality of our care or maybe because of some mutations. It's hard to know. Did you find any hints of that?
Marty: We did not. That's actually the next study we're doing right now. This study was in part funded by the West Health Institute and the next study is to look at change in mortality by age group over time. Our hypothesis is that we've seen about an 80% to 90% reduction in mortality just from getting better from what we've learned about less aggressive ventilator management, anticoagulation, convalescent plasma, or polyclonal and monoclonal antibody therapies, as they're about to be introduced, and other ways that we're identifying best practices with steroids and other things.
I think we're getting a lot better, but the moral hazard here in describing that finding that we're getting a lot better is that people might take it less seriously and be even more careless. We want people to continue to take it seriously, as you know.
Harlan: That's great. How about nursing homes? I wasn't sure whether you were able to take that into account also. Of course, that's likely in some ways maybe a marker for frailty. It's also about being in a congregate setting. Any insights about that?
Marty: It's pretty clear from the comorbidity list of risk factors that a bunch of them -- chronic pressure ulcers, cerebral palsy, Alzheimer's, blindness -- are more common in those in a residential facility, and that may be the surrogate because we didn't look at nursing home status before admission. But it's clear that is a risk factor.
Harlan: Overall, I thought... I'm just so glad you did this. The coordination with existing data that sits within CMS, it's a natural, and yet oftentimes there's data available around us that we're just not leveraging. I was so happy to see that you were able to jump in and also do this work... it was independent, right? You basically were able to get the data, but as an academic, as someone who was just asking questions that might be able to inform practice and policy, and to address this as a scientist. I thought that was terrific and much appreciated. What are the next steps with this research?
Marty: I think we're going to be looking at those under age 65 in a separate analysis using 60% of the commercial data in the country available through FAIR Health, which has the largest commercial repository of claims data in the country. You're familiar with FAIR Health. I know we've talked about it in the past.
We're doing a similar study looking at risk factors and it's pretty clear that death among those with no comorbidities is a very small subgroup. Hopefully that information can be useful so we can make informed decisions about everyday life, schools and other things, where we don't want to say, "Look, there's no risk." We just want to say, "Let's define the risk. Let's measure it and rate it relative to other conditions like seasonal flu, viral meningitis, and bacterial pneumonia. Let's make informed decisions balancing the death toll from COVID with the death toll from the closures of schools, lockdowns, and other things. Let's just make a scientific assessment that's independent and free of politics." That's not an assessment we'll be making. We'll simply provide that data and hopefully inform that conversation.
Harlan: I think that's terrific. I think the other thing I was impressed with is that you were able to get pretty recent data. Oftentimes, as we know, the claims can take a year to come out. That's not very helpful in the midst of a pandemic. We talked about medRxiv as speeding communication. We also need to be able to build the platforms to be able to speed the data so that we can get the insights and then we can act on them in timely ways.
Look, I just wanted to thank you so much. I was really, like I said, thrilled to see this piece, both for the kind of insights that it provides. I was thrilled to see you use medRxiv as a way to be able to get it out for public dialogue.
People should know that when something like this shows up on medRxiv, there are places to put comments on the medRxiv site. There is often a lot of discussion on Twitter. Make sure you use Marty's handle so that he can see, if you're going to comment or you've got suggestions.
The whole idea is it sits and exists for the community, so that it's not... I like to say there are those of us who helped launch this thing, but we should all feel part of it. We ask for affiliates, anyone who wants to screen in our community, you're welcome. Contact us and become part of the group working together on this. But, Marty, thank you so much for providing some visibility to it. Thanks for the opportunity to be on with you today.
Marty: Good to see you, Harlan. Take care.
Harlan: Thanks.