11:16
2026-07-03
the-decoder.com
large-language-models
GPT and Claude failed Bridgewater's finance tests because the right answers were never public
Bridgewater Associates and Thinking Machines Lab fine-tuned a Qwen3-235B model for financial tasks, achieving 84.7% accuracy and outperforming GPT, Claude, and Gemini at roughly one-fourteenth the cos…