Two days ago a Tokyo lab shipped a model that scored 73.7 on SWE-Bench Pro. Opus 4.8 gets 69.2 on the same test. GPT-5.5 gets 58.6. Gemini…
source & further reading
pub.towardsai.net — original article
The 3B Model Going Toe to Toe with Opus 4.5 In Maths and Coding
Substrate-Bound Coupling in Human-LLM Interaction
LAI #131: A Tool Call Can Succeed and Still Be the Wrong Tool