GPT and Claude failed Bridgewater's finance tests because the right answers were never public

Bridgewater Associates and Thinking Machines Lab fine-tuned a Qwen3-235B model for financial tasks, achieving 84.7% accuracy and outperforming GPT, Claude, and Gemini at roughly one-fourteenth the cost. The results have not been independently verified.

Bridgewater and Thinking Machines Lab, the startup from former OpenAI CTO Mira Murati, have fine-tuned a Qwen3-235B model for financial tasks. According to their own testing, the model hits 84.7 percent accuracy, beating Gemini, Claude, and GPT at roughly one-fourteenth of the cost. The numbers haven't been verified by anyone outside the two companies, though.

The article GPT and Claude failed Bridgewater's finance tests because the right answers were never public (https://the-decoder.com/gpt-and-claude-failed-bridgewaters-finance-tests-because-the-right-answers-were-never-public/) appeared first on The Decoder (https://the-decoder.com).