{"slug": "vibench-aims-to-rank-ai-models-by-app-building-not-just-coding-tests", "title": "ViBench aims to rank AI models by app-building, not just coding tests", "summary": "In a post on X on Wednesday, Amjad Masad introduced ViBench, a new benchmark designed to evaluate how well AI models build complete applications end to end. Masad framed the benchmark as a challenge to standard coding-test rankings and said that, according to ViBench, Opus 4.8 outperforms GPT 5.5 at app-building tasks.", "body_md": "In a post on X on Wednesday, Amjad Masad (@amasad) introduced ViBench, a benchmark he says is designed to measure how well AI models build apps end to end.\n\nhttps://x.com/amasad/status/2062226152790675805\n\nMasad framed the benchmark as a challenge to the way coding models are usually ranked. \"Benchmarks place GPT 5.5 as the best model on SWE, but is it the best at making apps end to end?\" he wrote. His answer, based on ViBench, is no: Masad said Opus 4.8 \"continues to be the king of vibe coding...", "url": "https://wpnews.pro/news/vibench-aims-to-rank-ai-models-by-app-building-not-just-coding-tests", "canonical_source": "https://runtimewire.com/article/vibench-aims-to-rank-ai-models-by-app-building-not-just-coding-tests", "published_at": "2026-06-03 17:46:33+00:00", "updated_at": "2026-06-03 18:14:42.932756+00:00", "lang": "en", "topics": ["artificial-intelligence", "large-language-models", "ai-research", "ai-tools", "ai-products"], "entities": ["Amjad Masad", "ViBench", "GPT 5.5", "Opus 4.8", "SWE"], "alternates": {"html": "https://wpnews.pro/news/vibench-aims-to-rank-ai-models-by-app-building-not-just-coding-tests", "markdown": "https://wpnews.pro/news/vibench-aims-to-rank-ai-models-by-app-building-not-just-coding-tests.md", "text": "https://wpnews.pro/news/vibench-aims-to-rank-ai-models-by-app-building-not-just-coding-tests.txt", "jsonld": "https://wpnews.pro/news/vibench-aims-to-rank-ai-models-by-app-building-not-just-coding-tests.jsonld"}}