Health care systems struggle with each step of AI implementation, study finds.
When it comes to artificial intelligence, the hype, hope, and foreboding are suddenly everywhere. But the turbulent tech has long caused waves in health care: from IBM Watson’s failed foray into the field (and the long-held hope that AI tools may one day beat doctors at detecting cancer on medical images) to the realized problems of algorithmic racial bias.
But, behind the public fray of fanfare and failures, there’s a chaotic reality of rollouts that has largely gone untold. For years, health care systems and hospitals have grappled with inefficient and, in some cases, doomed attempts to adopt AI tools, according to a new study led by researchers at Duke University. The study, posted online as a preprint, pulls back the curtain on these messy implementations while also mining for lessons learned. Amid the eye-opening revelations from 89 professionals involved in the rollouts at 11 health care organizations—including Duke Health, Mayo Clinic, and Kaiser Permanente—the authors assemble a practical framework that health systems can follow as they try to roll out new AI tools.
And new AI tools keep coming. Just last week, a study in JAMA Internal Medicine found that ChatGPT (version 3.5) decisively bested doctors at providing high-quality, empathetic answers to medical questions people posted on the subreddit r/AskDocs. The superior responses—as subjectively judged by a panel of three physicians with relevant medical expertise—suggest an AI chatbot such as ChatGPT could one day help doctors tackle the growing burden of responding to medical messages sent through online patient portals.
This is no small feat. The rise of patient messages is linked to high rates of physician burnout. According to the study authors, an effective AI chat tool could not only reduce this exhausting burden—offering relief to doctors and freeing them to direct their efforts elsewhere—but it could also reduce unnecessary office visits, boost patient adherence to medical guidance, and improve patient health outcomes overall. Moreover, better messaging responsiveness could improve patient equity by providing more online support for patients who are less likely to schedule appointments, such as those with mobility issues, work limitations, or fears of medical bills.
AI in reality
That all sounds great—like much of the promise of AI tools for health care. But there are some big limitations and caveats to the study that make this application's real-world potential harder to realize than it seems. For starters, the types of questions that people ask on a Reddit forum are not necessarily representative of the ones they would ask a doctor they know and (hopefully) trust. And the quality and types of answers volunteer physicians offer to random people on the Internet may not match those they give their own patients, with whom they have an established relationship.
But, even if the core results of the study held up in real doctor-patient interactions through real patient portal message systems, there are many other steps to take before a chatbot could reach its lofty goals, according to the revelations from the Duke-led preprint study.
To save time, the AI tool must be well-integrated into a health system’s clinical applications and each doctor’s established workflow. Clinicians would likely need reliable, potentially around-the-clock technical support in case of glitches. And doctors would need to strike the right balance of trust in the tool: not so much that they blindly pass along AI-generated responses to patients without review, but enough that they aren't spending so much time editing those responses that it nullifies the tool’s usefulness.
And after managing all of that, a health system would have to establish an evidence base that the tool is working as hoped in their particular health system. That means they’d have to develop systems and metrics to follow outcomes, like physicians’ time management and patient equity, adherence, and health outcomes.
These are heavy asks in an already complicated and cumbersome health system. As the researchers of the preprint note in their introduction:
Drawing on the Swiss Cheese Model of Pandemic Defense, every layer of the healthcare AI ecosystem currently contains large holes that make the broad diffusion of poorly performing products inevitable.
The study identified an eight-point framework based on the decision points in an implementation, whether the decision-maker is an executive, an IT leader, or a front-line clinician. The process involves: 1) identifying and prioritizing a problem; 2) identifying how AI could potentially help; 3) developing ways to assess an AI’s outcomes and successes; 4) figuring out how to integrate it into existing workflows; 5) validating the safety, efficacy, and equity of the AI in the health care system before clinical use; 6) rolling out the AI tool with communication, training, and trust building; 7) monitoring; and 8) updating or decommissioning the tool as time goes on.
“An ongoing challenge”
Hospital systems have struggled at each of these steps, according to the responses from the 89 professionals and clinicians interviewed, which were anonymized in the study.
That includes even the first few steps of identifying problems that AI could help with. “Right now, a lot of the AI solutions are basically trying to do the same thing as a doctor. So, it’s like, take an X-ray, read it as a radiologist would. But we already have radiologists, so what is this thing doing?” said one anonymous source who played a key role in an AI adoption.
Assessing the effectiveness of an AI tool and whether it’s even appropriate for a given problem was also a common struggle. “I don’t think we even really have a great understanding of how to measure an algorithm’s performance, let alone its performance across different race and ethnic groups,” another source said.
But getting the algorithm right is just part of the challenge. Getting it to work for clinicians is another. Even relatively simple tools, such as AI-based methods to autocomplete triage notes in emergency departments, have flopped in practice, according to interviewees. “When I first heard about that, I thought ‘this is a no brainer,’ like clinicians are gonna love having autocomplete as you understand it,” an interviewee said. But, “it’s not been as popular as you would expect. And it’s not because the algorithm is wrong. The algorithm is pretty spot on. But it doesn’t fit in their workflow.”
The technical build into a clinician’s workflow also has to be coupled with trust and understanding—and just the right amount of being right—anonymous sources said. As one interviewee explained, this makes uptake tricky:
If we have a situation in which the machine is basically all the time right, doctors are just going to trust it and stop focusing on it. If we have a system where the system is wrong, lots of the time doctors aren’t going to use it. If we have a system, on the other hand, where the system is wrong enough that doctors should be checking it a decent amount, and they find that they’re fixing it up a decent amount, it’s right there in that sweet spot. It’s hard for me to imagine that it’s staying there in the sweet spot, or frankly, that’s a good use of physician time.
In many instances, AI tools have fallen by the wayside amid staff turnover and a reluctance of clinicians to learn new tools when they’re barely keeping up with the work they already have. “This is an ongoing challenge,” one IT source said.
And when clinicians do adopt new tools, measuring and monitoring outcomes is a struggle. “I think most health systems are pretty shitty at figuring out how well it actually turns out in individual patient cases,” a key anonymous professional focused on regulation said. “And that’s part of the reason we don’t have anything close to a learning health system, because we’re not good at monitoring outcomes, except in a few weird, unusual cases.”
Altogether, the interview responses suggest that to truly harness the potential of AI in health care, health systems may need to create “new teams to interact with or monitor the system, new communication strategies to maintain professional boundaries, and new expertise.”