[ARTICLE · art-20685] src=runtimewire.com pub= topic=artificial-intelligence verified=true sentiment=· neutral

ViBench aims to rank AI models by app-building, not just coding tests

Amjad Masad introduced ViBench, a new benchmark designed to evaluate AI models on their ability to build complete applications end to end, in a post on X on Wednesday. Masad argued the benchmark challenges standard coding test rankings, stating that according to ViBench, Opus 4.8 outperforms GPT 5.5 in app-building tasks.

1 min read · published Jun 3, 2026

Amjad Masad (@amasad) introduced ViBench, a benchmark he says is designed to measure how well AI models build apps end to end, in a post on X on Wednesday (https://x.com/amasad/status/2062226152790675805). Masad framed the benchmark as a challenge to the way coding models are usually ranked. "Benchmarks place GPT 5.5 as the best model on SWE, but is it the best at making apps end to end?" he wrote. His answer, based on ViBench, is no: Masad said Opus 4.8 "continues to be the king of vibe coding...
