
Health-ISAC Hacking Healthcare 11-7-2024

This week, Health-ISAC®’s Hacking Healthcare® examines research suggesting that a well-known artificial intelligence (AI) transcription model used as the underlying tool in healthcare products may be worryingly prone to “hallucinating” words or even entire sentences. We assess what the research found and provide some general considerations for healthcare organizations eager to take advantage of AI capabilities.

As a reminder, this is the public version of the Hacking Healthcare blog. For additional in-depth analysis and opinion, become a member of Health-ISAC and receive the TLP Amber version of this blog (available in the Member Portal).

 

Text Version:

Welcome back to Hacking Healthcare®.

Hallucinating AI Tool Prompts Healthcare Concern

AI developers and policymakers routinely tout the transformative capabilities of AI in sectors like healthcare. For example, within healthcare delivery organizations, AI has the potential to enhance medical imagery analysis, ease patient scheduling, or more effectively route IT help desk requests. However, recent news articles[i] [ii] have reiterated reasons why organizations should be careful when adopting emerging technologies that may not be as safe, secure, or reliable as they claim or are assumed to be.

Nabla’s Healthcare AI Assistant

The ability to reduce administrative burdens so that healthcare providers can spend more time focusing on the patient and caregiving is an understandably attractive quality for an AI tool. One such product advertised as providing just that is an “AI assistant” produced by Nabla. Nabla claims its AI assistant is capable of “pre-charting, medical codification, clinical decision prompting,” and more.[iii] This suite of capabilities appears to have been very well received: Nabla’s website suggests that the product is already deployed in over 85 health organizations and is being used by more than 45,000 clinicians.[iv] One of the highlights of Nabla’s AI assistant is its ability to transcribe clinician-patient interactions into appropriate clinical notes with a high degree of accuracy.[v] Accuracy is obviously critical in this context, given that inaccurate transcriptions may risk significant patient harm. For example, failing to accurately capture a patient’s allergen history or their current medicine regimen and dosage could lead to the wrong healthcare decisions down the road. It is this aspect of the tool that has come under scrutiny in recent weeks, after a study called into question the accuracy of the underlying AI model on which Nabla’s tool is built.

OpenAI and Tool Development

If you were ever curious how so many AI and AI-enabled products have been able to come to market so quickly despite the relative complexity and newness of AI, part of the answer is the use of existing tools as a basis upon which other companies can build something more specialized or complex. This is the case with Nabla’s AI assistant, which employs OpenAI’s Whisper[vi] as the underlying tool.[vii] For those unfamiliar, OpenAI’s Whisper is an automatic speech recognition (ASR) system that OpenAI describes as “[approaching] human level robustness and accuracy on English speech recognition.”[viii] So what’s the issue?
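
Before turning to that question, it is worth seeing how thin the layer between a finished product and the underlying model can be. As a minimal sketch (not a depiction of Nabla’s actual pipeline), the open-source openai-whisper package can transcribe an audio file in a handful of lines; the audio file name below is hypothetical.

```python
# Minimal sketch: transcribing an audio recording with the open-source
# openai-whisper package (pip install openai-whisper).
# "visit_recording.wav" is a hypothetical file name used for illustration.
import whisper

# Load one of the published Whisper checkpoints (e.g., "base", "small", "medium").
model = whisper.load_model("base")

# Transcribe the audio; the result includes the full text and per-segment details.
result = model.transcribe("visit_recording.wav")

print(result["text"])

# Each segment also carries timing and confidence metadata that a
# downstream product could inspect or post-process.
for segment in result["segments"]:
    print(f'{segment["start"]:.1f}s - {segment["end"]:.1f}s: {segment["text"]}')
```

A commercial tool wraps this kind of transcription step in additional refinement, integration, and safeguards, which is where products differentiate themselves from the base model they sit on.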

Whisper’s Accuracy Woes

According to recent research, OpenAI’s Whisper is more error-prone than might be appreciated.[ix] While you may be thinking, “it’s only natural that something like a name might be misspelled or that a heavy accent might slightly skew a transcription,” the errors reported were a bit more concerning. It has been reported that Whisper is “prone to making up chunks of text or even entire sentences” and that a University of Michigan researcher found “hallucinations in eight out of every 10 audio transcriptions” that they had reviewed.[x] Other users of Whisper supported this finding, with one claiming to have found “hallucinations in nearly every one of the 26,000 transcripts he created with Whisper.”[xi]

You can see how this may be concerning to users of Nabla’s AI assistant. However, it does not automatically follow that Nabla’s product is inherently flawed or subject to the same concerning hallucination issues.

Adapting Whisper & Nabla’s Response

Models like Whisper are designed to be built upon. Without getting too technical, they can be trained on new sources of data and adjustments can be made to numerous variables, such as how they weigh certain aspects or interpret instructions. In essence, they can be fine-tuned to better specialize in a particular task or subject matter.
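
As a rough illustration of the second kind of adjustment, the open-source Whisper release already exposes decoding settings that influence how readily it produces text it did not actually hear, even without any retraining. The sketch below shows a few commonly discussed settings; the specific values, the audio file name, and the medical vocabulary prompt are illustrative assumptions, and this is not a description of how Nabla adapted its model.

```python
# Sketch: adjusting the open-source Whisper decoder without retraining it.
# The parameters below exist in openai-whisper's transcribe() call; the
# chosen values, the audio file name, and the prompt text are illustrative.
import whisper

model = whisper.load_model("small")

result = model.transcribe(
    "visit_recording.wav",              # hypothetical audio file
    temperature=0.0,                    # greedy decoding; less "creative" output
    condition_on_previous_text=False,   # don't let earlier output steer later segments
    no_speech_threshold=0.6,            # more willing to treat a quiet segment as silence
    logprob_threshold=-1.0,             # fall back when token confidence is low
    initial_prompt="Clinical visit note. Medications, dosages, allergies.",
)

print(result["text"])
```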

According to Nabla, it was already aware of Whisper’s limitations, which is why the company says it spent several years and millions of dollars to “[gather] and manually [annotate] a unique dataset of 7,000 hours of medical encounters audio” to better refine the model.[xii] Furthermore, Nabla claims there are additional improvements and safeguards to “suppress” hallucinations and limit the potential for inaccuracies to make it onto a patient’s record.[xiii]
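
Nabla has not published the technical details of those safeguards, so the following is only a hypothetical illustration of the general pattern: a post-processing check that flags low-confidence or suspicious transcript segments for human review instead of writing them straight into a note. It relies on the per-segment metadata returned by the open-source Whisper package, and the thresholds are arbitrary values chosen for illustration.

```python
# Hypothetical safeguard sketch: flag transcript segments that look unreliable
# so a human reviews them before anything reaches a patient record.
# Thresholds are illustrative, not validated clinical values.
import whisper

LOW_CONFIDENCE = -1.0     # average log-probability below this looks shaky
LIKELY_SILENCE = 0.8      # high no-speech probability paired with text is suspicious
REPETITION_RATIO = 2.4    # high compression ratio often signals repeated or invented text

model = whisper.load_model("small")
result = model.transcribe("visit_recording.wav")  # hypothetical audio file

for segment in result["segments"]:
    suspicious = (
        segment["avg_logprob"] < LOW_CONFIDENCE
        or segment["no_speech_prob"] > LIKELY_SILENCE
        or segment["compression_ratio"] > REPETITION_RATIO
    )
    status = "REVIEW" if suspicious else "ok"
    print(f'[{status}] {segment["start"]:.1f}s: {segment["text"]}')
```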

In the Action & Analysis section below, we will provide some high-level takeaways for Health-ISAC members on how to think about employing AI tools, as well as some considerations specific to Nabla’s case.

 

Action & Analysis
**Included with Health-ISAC Membership**

 

[i] https://apnews.com/article/ai-artificial-intelligence-health-business-90020cdf5fa16c79ca2e5b6c4c9bbb14#
[ii] https://www.wired.com/story/hospitals-ai-transcription-tools-hallucination/
[iii] https://www.nabla.com/
[iv] https://www.nabla.com/
[v] Nabla’s marketing refers to “95% note accuracy” in relation to “15 seconds note generation”
[vi] https://openai.com/index/whisper/
[vii] https://www.nabla.com/blog/how-nabla-uses-whisper/
[viii] https://openai.com/index/whisper/
[ix] https://apnews.com/article/ai-artificial-intelligence-health-business-90020cdf5fa16c79ca2e5b6c4c9bbb14#
[x] https://apnews.com/article/ai-artificial-intelligence-health-business-90020cdf5fa16c79ca2e5b6c4c9bbb14#
[xi] https://apnews.com/article/ai-artificial-intelligence-health-business-90020cdf5fa16c79ca2e5b6c4c9bbb14#
[xii] https://www.nabla.com/blog/how-nabla-uses-whisper/
[xiii] https://www.nabla.com/blog/how-nabla-uses-whisper/
[xiv] https://www.nabla.com/blog/how-nabla-uses-whisper/
[xv] https://www.nabla.com/blog/assessing-reliability-nabla-speech-to-text/
[xvi] https://www.nabla.com/blog/how-nabla-uses-whisper/
