cd /news/artificial-intelligence/ornith-1-0-beats-claude-at-coding-ru… · home topics artificial-intelligence article
[ARTICLE · art-44202] src=byteiota.com ↗ pub= topic=artificial-intelligence verified=true sentiment=↑ positive

Ornith 1.0 Beats Claude at Coding — Runs on One GPU

DeepReinforce AI released Ornith 1.0, an open-source coding model family that scores 82.4 on SWE-Bench Verified, outperforming Claude Opus 4.7's 80.8, and runs the 35B variant locally on a single RTX 4090. The model uses self-scaffolding reinforcement learning to generate its own task-specific orchestration plans, a first for open-source coding agents. All models are MIT-licensed with no regional restrictions, available on Hugging Face.

read4 min views1 publishedJun 30, 2026
Ornith 1.0 Beats Claude at Coding — Runs on One GPU
Image: Byteiota (auto-discovered)

On June 25, 2026, DeepReinforce AI released Ornith 1.0 — a family of MIT-licensed open-source coding models that scores 82.4 on SWE-Bench Verified, edging out Claude Opus 4.7’s 80.8, and runs the practical 35B variant locally on a single RTX 4090. The headline is attention-grabbing enough. The actual story is more interesting: Ornith 1.0 coding AI doesn’t use a human-engineered agentic harness. It learns to write its own.

What Ornith 1.0’s Self-Scaffolding RL Actually Means #

Every coding agent on the market — Cursor, GitHub Copilot, OpenHands, all of them — pairs a model with a fixed harness: a hand-built orchestration layer that handles tool calls, memory management, error recovery, and task planning. Engineers design this harness. It stays static. The model reasons within it.

Ornith’s training approach, which DeepReinforce calls self-scaffolding reinforcement learning, changes the fundamental assumption. During RL post-training, the model doesn’t just learn to solve coding tasks — it learns to propose its own scaffold for each task before generating a solution. Each training step runs in two stages: the model reads the task and drafts a per-task orchestration plan; then it generates a solution using that plan, with rewards flowing back to both stages. The scaffold co-evolves with the model’s policy throughout training. According to MarkTechPost’s coverage, this is the first open-source model to treat the scaffold as a learnable object rather than a fixed human design.

Three defense layers prevent the obvious reward-hacking failure modes: an immutable outer trust boundary isolates the execution environment, a deterministic monitor flags banned actions like reading hidden test paths or editing verification scripts, and a frozen LLM judge acts as a verification veto. It’s a reasonable engineering response to the alignment risks of letting a model control its own training dynamics.

Ornith 1.0 Benchmark Results — and the Asterisk #

The benchmark results are real. Ornith 1.0-397B scores 77.5 on Terminal-Bench 2.1 versus Claude Opus 4.7’s 70.3, and 82.4 on SWE-Bench Verified versus Claude’s 80.8. More practically useful: the 35B MoE model scores 64.2 on Terminal-Bench 2.1, beating Qwen 3.5-397B (53.5) — a model eleven times larger. Those are not manufactured numbers.

However, the asterisk on any SWE-Bench score matters. Independent research published in March 2026 found that roughly 19.78% of patches labeled as resolved by top leaderboard agents are semantically incorrect when tested against stronger test suites. Separately, solution leakage — where the expected fix is described in the issue report itself — exists in more than 32% of benchmark instances. These are problems with the benchmark ecosystem, not claims specific to Ornith. But they mean “beats Claude on SWE-Bench” deserves measured enthusiasm rather than uncritical acceptance.

Simon Willison ran the 35B GGUF variant locally through LM Studio and found it performed “quite well” on real code analysis tasks at 103 tokens per second. That’s a more meaningful signal than the benchmark table — and it holds up.

Related:[Kimi K2.7-Code: Open-Source 1T Coding Agent, 30% Fewer Thinking Tokens]

MIT License, Local Deployment, No Strings #

All four Ornith 1.0 models — 9B Dense, 31B Dense, 35B MoE, and 397B MoE — are available on Hugging Face under MIT license with no regional restrictions. That last clause matters: several recent open-source releases have included geographic limitations on commercial use. Ornith has none.

For practical local deployment, the 35B MoE is the target variant. In Q5_K_M quantization it fits in about 25GB — a single RTX 4090 or a 24GB Mac. The 9B Dense drops to roughly 6GB at Q4 quantization, making it accessible on most consumer GPUs. Both serve through vLLM, SGLang, llama.cpp, and Ollama with OpenAI-compatible APIs and 256K context windows. The GitHub repository (524 stars, 50 forks after five days) integrates with OpenHands, Hermes, and OpenClaw for full agentic deployments. The community reaction on Hacker News (144 points, trending on June 30) is characteristically split. Some developers report solid results on real codebases; others note the model performs poorly in chat mode without tools — which is expected, since Ornith was never designed for conversation. It’s an agentic coding model. Using it as a chat assistant is the wrong deployment pattern.

Key Takeaways #

  • Ornith 1.0’s self-scaffolding RL is a genuine technical innovation — the first open model to learn agentic orchestration logic rather than use a fixed human-designed harness
  • SWE-Bench benchmark scores are real but the benchmark itself is inflated across the leaderboard — independent verification on real repositories is the signal to wait for
  • The 35B MoE variant runs on a single RTX 4090, making frontier-adjacent coding performance locally deployable without API costs or data leaving your environment
  • MIT license with no regional restrictions puts Ornith 1.0 ahead of several competing open-source releases on accessibility
  • Deploy in agentic frameworks (OpenHands, Hermes) with tools enabled — chat-mode performance without tools is not the intended use case
── more in #artificial-intelligence 4 stories · sorted by recency
── more on @deepreinforce ai 3 stories trending now
sponsored brought to you by zahid.host 4,200+ EU-deployed projects
reading about agents? ship yours in a single git push.

Run your AI side-project on zahid.host

EU-based hosting, git-push deploys, automatic HTTPS, no cold starts. Free tier with a custom domain — perfect for shipping the agent you just read about.

$git push zahid main
Live at https://your-agent.zahid.host
Get free account → Pricing
from €0/mo · no card required
LIVE [news/ornith-1-0-beats-cla…] indexed:0 read:4min 2026-06-30 ·