[ARTICLE · art-30249] src=letsdatascience.com ↗ pub= topic=generative-ai verified=true sentiment=↓ negative

Police restart approved AI after officers used unapproved models

New Zealand police restarted a generative-AI transcription trial using OpenAI's Whisper model after officers breached restrictions during an initial six-month trial, prompting a halt in March 2024. The trial resumed in September 2024 with tighter controls, but documents reveal Whisper is 45% inaccurate on Māori and Pacific languages and hallucinates about 1% of the time, raising concerns about its use in law enforcement.

3 min read · Published Jun 16, 2026

According to RNZ and Official Information Act documents released to RNZ, New Zealand police restarted a generative-AI transcription trial after officers used unapproved models. RNZ reports the police high-tech crime group has used the approved model Whisper for nine months since the restart. The documents say Whisper was found to be 45 percent inaccurate on Māori and Pacific languages and therefore could not be used for those languages. RNZ reports officers breached restrictions during the initial six-month trial, prompting police to halt the project in March last year and then restart it in September with tighter controls. RNZ also cites a Cornell University study that found Whisper hallucinated about 1 percent of the time. OpenAI told RNZ: "Addressing hallucinations is an ongoing area of research."

What happened

According to RNZ and Official Information Act documents released to RNZ, New Zealand police trialled generative AI for audio transcription and translation. RNZ reports officers tested Whisper for six months up to March last year, then halted the project after documented misuse and rule breaches. RNZ reports police restarted the trial in September and have used the approved Whisper model for roughly nine months under additional controls. The documents say Whisper is 45 percent inaccurate on Māori and Pacific languages and therefore could not be used for those languages, per the RNZ reporting.

Technical details

RNZ cites an internal document that described misuse during the trial and the limitations of relying on behavioural controls alone. RNZ also references a Cornell University study (May 2024) that found Whisper hallucinated about 1 percent of the time and occasionally fabricated racial commentary, violent rhetoric, or imagined medical treatments. OpenAI told RNZ: "Addressing hallucinations is an ongoing area of research," and a company spokesperson said, "speech recognition systems are not perfect and should be evaluated carefully for their intended use case." RNZ reports OpenAI has released newer variants such as GPT-Realtime-Whisper that it says improve accuracy and reduce hallucinations.

Editorial analysis

Industry context: Agencies trialling generative speech models commonly confront language and bias gaps, particularly for underrepresented languages and dialects. Observers following public-sector deployments note a recurring pattern where behavioural controls (policy and training) are insufficient without technical guardrails, independent evaluation, and clear evidentiary rules.

Context and significance

For practitioners, the RNZ report underscores an operational tension: the productivity gains claimed by vendors set against measurable accuracy and hallucination risks in forensic settings. Public-sector use raises the stakes because transcription errors or hallucinations can affect investigations and community trust, especially where accuracy is poor for Indigenous and minority languages.

What to watch

  • Whether independent audits or language-specific benchmarks are published for Whisper variants and successors;
  • Any public police guidance or external oversight on using AI outputs in investigations and evidence pipelines;
  • Adoption of model-level mitigations (language-specific fine-tuning, confidence thresholds) versus solely behavioural controls.
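
As a rough illustration of what a confidence-threshold guardrail could look like, the sketch below flags transcript segments for human review. It is not the police configuration, which the RNZ reporting does not describe. The segment fields (`avg_logprob`, `no_speech_prob`, `compression_ratio`) are modelled on the per-segment values the open-source Whisper package reports, to the best of our understanding. The function name `flag_segments` and the threshold values are assumptions chosen for illustration and would need tuning against real audio.

```python
def flag_segments(segments,
                  min_avg_logprob=-1.0,
                  max_no_speech_prob=0.6,
                  max_compression_ratio=2.4):
    """Return segments that should go to a human reviewer.

    `segments` is a list of dicts. The keys are assumed to mirror the
    per-segment fields of a Whisper-style transcription result; the
    thresholds are illustrative, not recommended values.
    """
    flagged = []
    for seg in segments:
        reasons = []
        # Low average token log-probability: the model was unsure of the words.
        if seg.get("avg_logprob", 0.0) < min_avg_logprob:
            reasons.append("low_confidence")
        # High no-speech probability: text may have been produced over silence.
        if seg.get("no_speech_prob", 0.0) > max_no_speech_prob:
            reasons.append("possible_silence")
        # Highly compressible text is often repetitive, a common failure pattern.
        if seg.get("compression_ratio", 0.0) > max_compression_ratio:
            reasons.append("repetitive_text")
        if reasons:
            flagged.append({"text": seg.get("text", ""), "reasons": reasons})
    return flagged


# Hypothetical segments, invented for this example.
example_segments = [
    {"text": "The call started at nine.", "avg_logprob": -0.2,
     "no_speech_prob": 0.01, "compression_ratio": 1.3},
    {"text": "thank you thank you thank you", "avg_logprob": -1.4,
     "no_speech_prob": 0.8, "compression_ratio": 3.0},
]

for item in flag_segments(example_segments):
    print(item["text"], item["reasons"])
```

A check like this only routes doubtful segments to a person. It does not detect every hallucination, since a model can be confidently wrong, which is why the list above pairs technical mitigations with independent evaluation.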

For practitioners

Track reported error rates by language and independent evals before integrating speech models into high-stakes workflows; treat vendor claims of reduced hallucinations as a starting point for verification.
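
One way to track error rates by language is to compute word error rate (WER) per language on a small, human-verified test set. The sketch below is a minimal version using only the standard library. The RNZ reporting does not say how the "45 percent inaccurate" figure was measured, so WER here is an assumed stand-in metric, and the helper names and sample sentences are hypothetical.

```python
from collections import defaultdict


def word_edit_distance(ref_words, hyp_words):
    """Levenshtein distance between two word lists."""
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer_by_language(samples):
    """Aggregate WER per language.

    `samples` is an iterable of (language, reference, hypothesis) tuples,
    where the reference is a human-verified transcript.
    """
    errors = defaultdict(int)
    ref_word_counts = defaultdict(int)
    for language, reference, hypothesis in samples:
        ref_words = reference.lower().split()
        hyp_words = hypothesis.lower().split()
        errors[language] += word_edit_distance(ref_words, hyp_words)
        ref_word_counts[language] += len(ref_words)
    # Skip languages with no reference words to avoid dividing by zero.
    return {lang: errors[lang] / count
            for lang, count in ref_word_counts.items() if count > 0}


# Hypothetical test set, invented for this example.
samples = [
    ("en", "the cat sat", "the cat sat"),
    ("mi", "kia ora koutou katoa", "kia ora katoa"),
]
print(wer_by_language(samples))
```

Reporting the figure per language, rather than as one pooled number, is the point: a model that looks acceptable on average can still be unusable for a specific language.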

Scoring Rationale

This story matters to practitioners because it documents real-world misuse and measurable accuracy failures of a widely used speech model in law-enforcement settings. It is notable for governance and operational risk, but does not introduce a new technical breakthrough.
