According to RNZ and Official Information Act documents released to RNZ, New Zealand police restarted a generative-AI transcription trial after officers used unapproved models. RNZ reports the police high-tech crime group has used the approved model Whisper for nine months since the restart. The documents say Whisper was found to be 45 percent inaccurate on Māori and Pacific languages and therefore could not be used for those languages. RNZ reports officers breached restrictions during the initial six-month trial, prompting police to halt the project in March last year and then restart it in September with tighter controls. RNZ also cites a Cornell University study that found Whisper hallucinated about 1 percent of the time. OpenAI told RNZ: "Addressing hallucinations is an ongoing area of research."
What happened
According to RNZ and Official Information Act documents released to RNZ, New Zealand police trialled generative AI for audio transcription and translation. RNZ reports officers tested Whisper for six months until March last year, when police halted the project after documented misuse and rule breaches. Police restarted the trial in September and have used the approved Whisper model for roughly nine months under additional controls. The documents say Whisper was found to be 45 percent inaccurate on Māori and Pacific languages, so it could not be used for those languages.
Technical details
RNZ cites an internal document that described misuse during the trial and the limitations of relying on behavioural controls alone. RNZ also references a Cornell University study (May 2024) that found Whisper hallucinated about 1 percent of the time and occasionally fabricated racial commentary, violent rhetoric, or imagined medical treatments. OpenAI told RNZ: "Addressing hallucinations is an ongoing area of research," and a company spokesperson said, "speech recognition systems are not perfect and should be evaluated carefully for their intended use case." RNZ reports OpenAI has since released newer variants, such as GPT-Realtime-Whisper, that the company says improve accuracy and reduce hallucinations.
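A rate of about 1 percent can sound negligible, but it compounds across many transcriptions. As an illustrative simplification (not something the study or RNZ states), assume each transcription hallucinates independently with probability p. The chance that a batch of n transcriptions contains at least one hallucination is then:

$$P(\text{at least one hallucination}) = 1 - (1 - p)^{n}, \qquad p = 0.01,\; n = 100 \;\Rightarrow\; 1 - 0.99^{100} \approx 0.63$$

With a hypothetical batch of 100 transcriptions, that is roughly a 63 percent chance that at least one contains fabricated text, which is why a low per-transcription rate does not guarantee a clean case file.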
Editorial analysis
Industry context: Agencies trialling generative speech models commonly confront language and bias gaps, particularly for underrepresented languages and dialects. Observers following public-sector deployments note a recurring pattern where behavioural controls (policy and training) are insufficient without technical guardrails, independent evaluation, and clear evidentiary rules.
Context and significance
For practitioners, the RNZ report underscores a core operational tension: the productivity gains claimed by vendors versus the measurable accuracy and hallucination risks in forensic settings. Public-sector use raises the stakes because transcription errors or hallucinations can affect investigations and community trust, especially where accuracy is poor for Indigenous and minority languages.
What to watch
- Whether independent audits or language-specific benchmarks are published for Whisper variants and successors;
- Any public police guidance or external oversight on using AI outputs in investigations and evidence pipelines;
- Adoption of model-level mitigations (language-specific fine-tuning, confidence thresholds) versus solely behavioural controls.
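The "confidence thresholds" mitigation can be pictured as a review gate that routes doubtful segments to a human transcriber instead of accepting them automatically. This is a minimal sketch, not a description of any police system: it assumes segment dicts carrying the `avg_logprob`, `compression_ratio`, and `no_speech_prob` fields that the open-source `openai-whisper` package emits per segment, and the threshold values and the `BLOCKED_LANGUAGES` set are hypothetical placeholders that would need tuning against labelled data.

```python
from dataclasses import dataclass

# Hypothetical policy: languages for which automatic transcripts are not trusted.
# ISO 639-1 codes: "mi" Maori, "sm" Samoan, "to" Tongan (illustrative only).
BLOCKED_LANGUAGES = {"mi", "sm", "to"}

# Hypothetical thresholds; tune against your own labelled evaluation data.
MIN_AVG_LOGPROB = -1.0        # below this, the decoder was unsure of its tokens
MAX_COMPRESSION_RATIO = 2.4   # above this, text is suspiciously repetitive
MAX_NO_SPEECH_PROB = 0.6      # above this, text was emitted where speech is unlikely


@dataclass
class Decision:
    segment_id: int
    needs_human_review: bool
    reasons: list


def review_segments(language, segments):
    """Flag transcript segments that should be checked by a human.

    `segments` is assumed to be a list of dicts with the keys "id",
    "avg_logprob", "compression_ratio" and "no_speech_prob".
    """
    decisions = []
    for seg in segments:
        reasons = []
        if language in BLOCKED_LANGUAGES:
            reasons.append("language not approved for automatic transcription")
        if seg["avg_logprob"] < MIN_AVG_LOGPROB:
            reasons.append("low average log-probability")
        if seg["compression_ratio"] > MAX_COMPRESSION_RATIO:
            reasons.append("high compression ratio (possible repetition)")
        if seg["no_speech_prob"] > MAX_NO_SPEECH_PROB:
            reasons.append("text emitted where speech is unlikely")
        # Any single reason is enough to send the segment to a human.
        decisions.append(Decision(seg["id"], bool(reasons), reasons))
    return decisions
```

A gate like this complements, rather than replaces, behavioural rules: it cannot catch a fluent hallucination the model is confident about, so independent evaluation still matters.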
For practitioners
Track reported error rates by language, and look for independent evaluations, before integrating speech models into high-stakes workflows; treat vendor claims of reduced hallucinations as a starting point for verification rather than a conclusion.
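Tracking error rates by language can start with a per-language word error rate (WER): the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. The helper below is a hypothetical sketch, not a standard API, and the RNZ documents do not say which metric produced the 45 percent figure, so WER here is an assumption. It pools errors across all samples for a language rather than averaging per-utterance scores, so longer recordings carry proportionally more weight.

```python
from collections import defaultdict


def word_edit_distance(reference, hypothesis):
    """Minimum substitutions + deletions + insertions between two word lists."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i] + [0] * len(hypothesis)
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            )
        prev = curr
    return prev[-1]


def wer_by_language(samples):
    """Corpus-level WER per language.

    `samples` is an iterable of (language, reference_text, hypothesis_text)
    tuples. Texts are lower-cased and split on whitespace; no other
    normalisation (punctuation, numerals, macrons) is applied.
    """
    errors = defaultdict(int)
    words = defaultdict(int)
    for lang, ref, hyp in samples:
        ref_words = ref.lower().split()
        hyp_words = hyp.lower().split()
        errors[lang] += word_edit_distance(ref_words, hyp_words)
        words[lang] += len(ref_words)
    # Skip languages whose references were all empty to avoid dividing by zero.
    return {lang: errors[lang] / words[lang] for lang in words if words[lang]}
```

Reporting the result as a table keyed by language makes gaps like the one RNZ describes visible before a model reaches casework, instead of being hidden inside a single aggregate score.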
Scoring Rationale
This story matters to practitioners because it documents real-world misuse and measurable accuracy failures of a widely used speech model in law-enforcement settings. It is notable for governance and operational risk, but does not introduce a new technical breakthrough.