[ARTICLE · art-24172] src=runtimewire.com pub= topic=large-language-models verified=true sentiment=· neutral

grok-4.3 edges gpt-5.4-mini on execution

Grok 4.3 outperformed GPT 5.4 Mini in a head-to-head execution benchmark, scoring 38.3 to 36.2 by demonstrating greater reliability on formatting, tone control, and frictionless output. In a key test converting messy orders to JSON, only Grok 4.3 returned valid JSON directly, while GPT 5.4 Mini wrapped its response in Markdown fences, violating the requirement. The margin, though narrow, signals a meaningful advantage in practical task execution.

1 min read · published Jun 11, 2026

This wasn’t a blowout, but the margin is real. Grok 4.3 takes the aggregate, 38.3 to 36.2, because it was more reliable on the kinds of details that decide practical head-to-heads: exact formatting, tone control, and not adding avoidable friction. The cleanest example is messy orders to JSON, where both models parsed, normalized, and sorted correctly, but only Grok 4.3 actually obeyed the requirement to return valid JSON directly. GPT 5.4 Mini wrapped its answer in Markdown fences, which is ...
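To see why fences count as a failure rather than a cosmetic difference, here is a minimal sketch, assuming the consumer passes the raw response straight to a strict JSON parser. The helper name `is_direct_json` and the sample order data are hypothetical and are not taken from the benchmark itself.

```python
import json

# Markdown code-fence marker, built from single backticks so this
# example does not contain a literal fence of its own.
FENCE = "`" * 3


def is_direct_json(response: str) -> bool:
    """Return True only if the entire response parses as JSON, with no wrapper."""
    try:
        json.loads(response)
    except json.JSONDecodeError:
        return False
    return True


# Hypothetical model outputs; the order data is invented for illustration.
direct = '[{"sku": "A-1", "qty": 2}]'
fenced = f"{FENCE}json\n{direct}\n{FENCE}"

print(is_direct_json(direct))  # payload returned directly
print(is_direct_json(fenced))  # same payload wrapped in Markdown fences
```

The payload is identical in both cases, yet only the direct form passes. A caller could strip the fences before parsing, but that is the kind of avoidable friction the comparison penalizes.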
