Dear NYTimes: Vibe coding is not a metric

The New York Times published an article claiming Anthropic's Claude Opus 4.8 outperforms all other publicly available technologies on "vibe coding," a subjective term for AI generating code from conversational prompts. Critics argue "vibe coding" is not a measurable metric, making the comparison meaningless and the reporting lazy. The article's benchmark compared Opus 4.8 only to previous Anthropic models, failing to provide objective proof of superiority.

May. 29, 2026 Dear NYTimes: Vibe coding is not a metric Web Development Technology Update: The digital edition of the New York Times article 1 in question has been updated to include an actual benchmark for the Opus 4.8 model with regard to vibe coding. That's an improvement. However, the cited benchmark compares Opus 4.8 only to the previous Anthropic model. The claim that it outperforms "all other publicly available technologies" at this subjective task remains bogus. Anthropic's new Claude Opus 4.8 outperforms all other publicly available technologies on vibe coding, which is when A.I. technologies create software in response to prompts written in conversational English. - Anthropic Tops OpenAI to Become the World’s Most Valuable A.I. Start-Up. The New York Times 2 I facepalmed after reading that sentence in a recent article from the Business section of the New York Times. "Vibe coding" is not a metric, as in you cannot measure it. A.I. models can generate computer code, and code quality is subjective. More isn't necessarily better. Less isn't necessarily better. A quick example comes to mind: a programmer might intentionally choose to make a section of code "less efficient" in order to make it easier to read. Is this better or worse? Saying that Opus 4.8 outperforms other models at vibe coding is lazy reporting. It'd be like saying that Argentina is better at "sporting" than France . 3 f3 Which sport? At what age range? During what years? A.I. companies are currently raising more money than God. A real metric. They are also consuming significant finite resources in the form of energy and water. As investors and the public weigh these costs, journalists should not be handing these companies freebie metrics that sound impressive but mean nothing. If a product is better, or indeed good at all, they should have to prove it. References: - 1 NYTimes story: M. Isaac and C. Metz 2026, May 28 . Anthropic Tops OpenAI to Become the World’s Most Valuable A.I. Start-Up https://www.nytimes.com/2026/05/28/technology/anthropic-tops-openai-valuation.html - 2 Print edition https://www.coreyguitar.com/media/blog-images/2026-05-29-print edition.jpg of the above article. - 3 Argentina has won 3 FIFA World Cups: in 1978, 1986 and 2022. France has won 2 FIFA World Cups: in 1998 and 2018.