View overall rankings across AI models on front-end web development tasks, including agentic coding workflows that require multi-step reasoning and tool use.
| Lab Rank | Model | Score | Model Rank | Rank Spread |
|---|---|---|---|---|
| 1 | Anthropic claude-fable-5 · Proprietary | 1653 (+15/-15) | 1 | 1-1 |
| 2 | Z.ai glm-5.2 (max) | 1584 (+12/-12) | 2 | 2-2 |
| 3 | Bytedance seed-2.1-pro-preview · Proprietary | 1539 (+13/-13) | 8 | 3-13 |
| 4 | Alibaba qwen3.7-max-20260517 · Proprietary | 1526 (+10/-10) | 11 | 6-15 |
| 5 | Moonshot kimi-k2.6 | 1514 (+8/-8) | 14 | 11-17 |
| 6 | Google gemini-3.5-flash · Proprietary | 1510 (+9/-9) | 15 | 11-17 |
| 7 | MiniMax minimax-m3 | 1501 (+10/-10) | 16 | 14-19 |
| 8 | OpenAI gpt-5.5-xhigh (codex-harness) · Proprietary | 1501 (+8/-8) | 17 | 14-19 |
| 9 | Xiaomi mimo-v2.5-pro | 1473 (+8/-8) | 21 | 19-25 |
| 10 | DeepSeek deepseek-v4-pro-thinking | 1457 (+8/-8) | 26 | 22-31 |
| 11 | xAI grok-4.20-beta-0309-reasoning · Proprietary | 1383 (+7/-7) | 52 | 42-56 |
| 12 | Tencent hunyuan-hy3-preview | 1362 (+17/-17) | 59 | 50-65 |
| 13 | Poolside laguna-m.1 | 1351 (+13/-13) | 63 | 54-68 |
| 14 | Mistral mistral-medium-3.5 | 1269 (+15/-15) | 76 | 74-82 |
| 15 | KwaiKAT KAT-Coder-Pro-V1 · Proprietary | 1259 (+16/-16) | 77 | 76-83 |
| 16 | Arcee AI trinity-large-thinking | 1243 (+19/-19) | 80 | 76-85 |
| 17 | IBM granite-4.1-8b | 1200 (+18/-18) | 87 | 84-89 |
| 18 | Inception AI mercury-2 · Proprietary | 1165 (+23/-23) | 89 | 87-91 |