ViBench aims to rank AI models by app-building, not just coding tests

wpnews.pro


Source: runtimewire.com · Published June 3, 2026, 17:46 UTC


Amjad Masad introduced ViBench, a new benchmark designed to evaluate AI models on their ability to build complete applications end to end, in a post on X on Wednesday. Masad argued that the benchmark challenges standard coding-test rankings: according to ViBench, Opus 4.8 outperforms GPT 5.5 at app-building tasks.


Amjad Masad (@amasad) introduced ViBench, a benchmark he says is designed to measure how well AI models build apps end to end, in a post on X on Wednesday.

https://x.com/amasad/status/2062226152790675805

Masad framed the benchmark as a challenge to the way coding models are usually ranked. "Benchmarks place GPT 5.5 as the best model on SWE, but is it the best at making apps end to end?" he wrote. His answer, based on ViBench, is no: Masad said Opus 4.8 "continues to be the king of vibe coding...


Read original on runtimewire.com → runtimewire.com/article/vibench-aims-to-rank-ai-…
