Source: coreyguitar.com

Dear NYTimes: Vibe coding is not a metric

The New York Times published an article claiming Anthropic's Claude Opus 4.8 outperforms all other publicly available technologies on "vibe coding," a subjective term for AI generating code from conversational prompts. Critics argue "vibe coding" is not a measurable metric, making the comparison meaningless and the reporting lazy. The article's benchmark compared Opus 4.8 only to previous Anthropic models, failing to provide objective proof of superiority.

Published May 29, 2026 · 2 min read

Update: The digital edition of the New York Times article [1] in question has been updated to include an actual benchmark for the Opus 4.8 model with regard to vibe coding. That's an improvement. However, the cited benchmark compares Opus 4.8 only to the previous Anthropic model. The claim that it outperforms "all other publicly available technologies" at this subjective task remains bogus.

"Anthropic's new Claude Opus 4.8 outperforms all other publicly available technologies on vibe coding, which is when A.I. technologies create software in response to prompts written in conversational English."

  • Anthropic Tops OpenAI to Become the World’s Most Valuable A.I. Start-Up, The New York Times [2]

I facepalmed after reading that sentence in a recent article from the Business section of the New York Times.

"Vibe coding" is not a metric, as in you cannot measure it. A.I. models can generate computer code, and code quality is subjective. More isn't necessarily better. Less isn't necessarily better.

A quick example comes to mind: a programmer might intentionally choose to make a section of code "less efficient" in order to make it easier to read. Is this better or worse?
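
To make that trade-off concrete, here is a minimal sketch in Python. It is my own illustration, not something from the Times article or any benchmark, and both function names are hypothetical. The two functions answer the same question (is `n` a power of two?), one tersely and one plainly.

```python
def is_power_of_two_terse(n: int) -> bool:
    # Bit trick: a positive power of two has exactly one set bit,
    # so clearing the lowest set bit leaves zero.
    return n > 0 and (n & (n - 1)) == 0


def is_power_of_two_readable(n: int) -> bool:
    # Keep halving while n is even; a power of two ends at exactly 1.
    # More steps than the bit trick, but the intent is obvious at a glance.
    if n < 1:
        return False
    while n % 2 == 0:
        n //= 2
    return n == 1
```

Both return the same answer for every integer. Which one is "better" depends on who has to maintain the code and how often it runs, and no single score captures that.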

Saying that Opus 4.8 outperforms other models at vibe coding is lazy reporting. It'd be like saying that Argentina is better at "sporting" than France.[3] Which sport? At what age range? During what years?

A.I. companies are currently raising more money than God. (A real metric.) They are also consuming significant finite resources in the form of energy and water.

As investors and the public weigh these costs, journalists should not be handing these companies freebie metrics that sound impressive but mean nothing. If a product is better, or indeed good at all, they should have to prove it.

References:

- [1] NYTimes story: M. Isaac and C. Metz (2026, May 28). [Anthropic Tops OpenAI to Become the World’s Most Valuable A.I. Start-Up](https://www.nytimes.com/2026/05/28/technology/anthropic-tops-openai-valuation.html)
- [2] Print edition of the above article.
- [3] Argentina has won 3 FIFA World Cups: in 1978, 1986 and 2022. France has won 2 FIFA World Cups: in 1998 and 2018.
