OpenAI’s Whisper Transcription Tool Prone to Fabrications, Raising Concerns Across Industries

SAN FRANCISCO — OpenAI’s highly touted AI-powered transcription tool, Whisper, has been found to generate fabricated text, including racial commentary and violent rhetoric, according to interviews with numerous software engineers, developers, and academic researchers. These fabrications, known as hallucinations, are a significant concern because Whisper is used widely across industries worldwide for translation, transcription, and subtitling. Of particular concern is the tool’s adoption in medical centers, despite OpenAI’s explicit warnings against using it in high-risk domains.

The full extent of the problem is difficult to ascertain, but researchers and engineers report encountering Whisper’s hallucinations frequently. A University of Michigan researcher found hallucinations in eight out of every ten audio transcriptions examined. A machine learning engineer discovered hallucinations in roughly half of the more than 100 hours of Whisper transcriptions analyzed, and a third developer found them in nearly all of the 26,000 transcripts created with Whisper. Even well-recorded, short audio samples were not immune: a recent study by computer scientists identified 187 hallucinations in more than 13,000 clear audio snippets.

The prevalence of these fabrications has raised concerns among experts, advocates, and former OpenAI employees, who are urging the federal government to consider regulating AI. At the very least, they insist that OpenAI address the flaw promptly. OpenAI has acknowledged the issue and stated that it continually works to reduce hallucinations, incorporating feedback into model updates. However, engineers and researchers have noted that they have never encountered another AI-powered transcription tool that hallucinates to the same extent as Whisper.

Whisper’s integration into OpenAI’s flagship chatbot, ChatGPT, as well as its inclusion in Oracle’s and Microsoft’s cloud computing platforms, has made it widely accessible to thousands of companies worldwide. It is also used for closed captioning, posing a particular risk to the Deaf and hard-of-hearing community, who may have no reliable way to identify fabrications embedded in the text.

Researchers have yet to determine the exact cause of Whisper’s hallucinations, but they tend to occur amid pauses, background sounds, or music. OpenAI has explicitly advised against using Whisper in decision-making contexts because of its accuracy flaws, yet hospitals and medical centers continue to use speech-to-text models, including Whisper, to transcribe doctor-patient consultations. More than 30,000 clinicians and 40 health systems have adopted a Whisper-based tool developed by Nabla that has been fine-tuned for medical language. Nabla acknowledges the hallucination issue and says it is working to mitigate it, but its tool erases the original audio, a practice that could make errors difficult to detect and verify.

The impact of AI-generated transcripts on patient-doctor interactions is difficult to assess because of the confidential nature of these meetings. Some individuals, however, have expressed concerns about intimate medical conversations being shared with tech companies. California state lawmaker Rebecca Bauer-Kahan, for one, refused to grant permission for her child’s consultation audio to be shared with outside vendors, including Microsoft Azure.

The prevalence of Whisper’s hallucinations underscores the need for increased scrutiny and regulation of AI technologies. OpenAI’s acknowledgment of the issue is a step in the right direction, but further action is necessary to ensure the tool’s reliability and prevent potential harm in critical domains.

___
This story was produced in partnership with the Pulitzer Center’s AI Accountability Network, which partially supported the academic Whisper study.

___
The Associated Press receives financial assistance from the Omidyar Network to support coverage of artificial intelligence and its impact on society. AP is solely responsible for all content. Find AP’s standards for working with philanthropies, a list of supporters, and funded coverage areas at AP.org.
