DeepReinforce releases Ornith-1.0 open-source coding models

Source: testingcatalog.com · Published: 2026-06-25T14:27Z · Topic: large-language-models

DeepReinforce open-sourced Ornith-1.0, a family of self-improving coding models ranging from 9B to 397B parameters, which learn to generate their own task-specific scaffolds during reinforcement learning. The models achieve state-of-the-art results on coding benchmarks, with the 397B variant matching Claude Opus 4.7 on SWE-Bench Verified, and include defenses against reward hacking. The release aims to democratize advanced agentic coding capabilities.

DeepReinforce has open-sourced Ornith-1.0, a self-improving family of models built for agentic coding. The release spans the full range, from a compact 9B Dense version meant for edge deployment up to a 397B MoE model aimed at frontier-scale work, with 31B Dense and 35B MoE options in between. Each variant is trained on top of pretrained Gemma 4 and Qwen 3.5 foundations.

What sets Ornith-1.0 apart from most reinforcement learning setups is how it handles the scaffold. Rather than depending on human-designed harnesses to steer solution generation, the model learns to produce both the solution rollouts and the task-specific scaffolds that guide them. Each RL step runs in two stages. Conditioned on a task and the scaffold last used for it, the model proposes a refined scaffold, then generates a solution conditioned on that scaffold. Reward from the rollout flows back to both stages, so the model is trained to author the orchestration as well as the answer. Repeated across training, scaffolds get mutated and selected toward those that produce higher-reward trajectories, and per-task strategies surface on their own without hand-engineered harness design.
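The two-stage step described above can be illustrated with a toy loop. This is a minimal sketch under stated assumptions, not DeepReinforce's training code: the hint list, the `verifier_reward` function, the task name `fix-issue-123`, and the keep-if-not-worse selection rule are all hypothetical stand-ins, and a real system would update model weights from the recorded transitions rather than simply keeping the better scaffold.

```python
import random
from dataclasses import dataclass

# Hypothetical scaffold vocabulary and verifier requirements (not from the release).
HINTS = ["read the failing test", "reproduce the bug", "write a patch", "run the tests"]
REQUIRED = {"write a patch", "run the tests"}


@dataclass
class Transition:
    stage: str     # "scaffold" or "solution"
    output: tuple  # what the stage produced
    reward: float  # rollout reward, shared by both stages


def propose_scaffold(rng, last_scaffold):
    """Stage 1 (toy): mutate the previous scaffold by adding or dropping one hint."""
    scaffold = list(last_scaffold)
    unused = [h for h in HINTS if h not in scaffold]
    if unused and (not scaffold or rng.random() < 0.7):
        scaffold.append(rng.choice(unused))
    else:
        scaffold.remove(rng.choice(scaffold))
    return tuple(scaffold)


def generate_solution(scaffold):
    """Stage 2 (toy): the 'solution' is just the steps the scaffold prescribes."""
    return tuple(scaffold)


def verifier_reward(solution):
    """Hypothetical verifier: fraction of required steps the solution performed."""
    return len(REQUIRED & set(solution)) / len(REQUIRED)


def rl_step(rng, task, store):
    """One step: refine the scaffold, roll out a solution, credit both stages."""
    last_scaffold, last_reward = store.get(task, ((), 0.0))
    scaffold = propose_scaffold(rng, last_scaffold)
    solution = generate_solution(scaffold)
    reward = verifier_reward(solution)
    # The same rollout reward is attached to both stages.
    transitions = [
        Transition("scaffold", scaffold, reward),
        Transition("solution", solution, reward),
    ]
    # Selection: keep the new scaffold for this task only if it did not score worse.
    if reward >= last_reward:
        store[task] = (scaffold, reward)
    return transitions


def run(steps=20, seed=0):
    rng = random.Random(seed)
    store, history = {}, []
    for _ in range(steps):
        transitions = rl_step(rng, "fix-issue-123", store)
        history.append((transitions, store["fix-issue-123"][1]))
    return store, history
```

Calling `run()` returns the per-task scaffold store and a history of transitions. Because a mutated scaffold replaces the stored one only when its reward is at least as high, the stored reward never decreases, which is the mutate-and-select dynamic in miniature.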

Letting a model write its own scaffold opens a path to reward hacking, where a scaffold satisfies the verifier without doing the task. DeepReinforce describes a three-layer defense:

A fixed outer trust boundary that keeps the environment and test isolation beyond the model's reach.
A deterministic monitor that flags attempts to read withheld paths or alter verification scripts.
A frozen LLM judge that vetoes the verifier when gaming happens inside the allowed tool surface.
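As a rough illustration of how the second and third layers might gate a reward, here is a minimal sketch. Everything in it is an assumption made for illustration: the paths, the `ToolCall` record, and the rule that a flag or veto zeroes the reward are hypothetical, and the judge is a plain callable standing in for a frozen LLM. The first layer, the outer trust boundary, sits outside code like this by design, so it appears only as fixed constants.

```python
import posixpath
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Callable, List

# Hypothetical locations, assumed to be fixed outside the model's control.
WITHHELD_DIRS = (PurePosixPath("/eval/hidden_tests"),)
VERIFIER_FILES = (PurePosixPath("/eval/run_tests.sh"),)


@dataclass(frozen=True)
class ToolCall:
    action: str  # "read" or "write"
    path: str


def _is_under(path: PurePosixPath, root: PurePosixPath) -> bool:
    return path == root or root in path.parents


def monitor_flags(calls: List[ToolCall]) -> List[str]:
    """Deterministic monitor: flag reads of withheld paths and writes to verifier scripts."""
    flags = []
    for call in calls:
        # Normalize ".." segments lexically; a real monitor would also resolve symlinks.
        path = PurePosixPath(posixpath.normpath(call.path))
        if call.action == "read" and any(_is_under(path, d) for d in WITHHELD_DIRS):
            flags.append(f"read of withheld path: {path}")
        if call.action == "write" and path in VERIFIER_FILES:
            flags.append(f"write to verification script: {path}")
    return flags


def final_reward(
    verifier_reward: float,
    calls: List[ToolCall],
    judge_says_gaming: Callable[[List[ToolCall]], bool],
) -> float:
    """Zero the verifier's reward if the monitor flags the rollout or the judge vetoes it."""
    if monitor_flags(calls):
        return 0.0
    if judge_says_gaming(calls):
        return 0.0
    return verifier_reward
```

For example, `final_reward(1.0, [ToolCall("write", "/eval/run_tests.sh")], lambda calls: False)` returns `0.0` because the monitor flags the write, while a rollout that only touches its own workspace keeps the verifier's score unless the judge objects.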

On performance, DeepReinforce positions Ornith-1.0 as state of the art among open-source models of comparable size. The company reports the 397B flagship reaching 77.5 on Terminal-Bench 2.1 and 82.4 on SWE-Bench Verified, figures it says match Claude Opus 4.7 and top open peers such as MiniMax M3 and DeepSeek-V4-Pro. The 35B model is reported to clear similarly sized Qwen and Gemma builds, while the 9B version is said to hit 43.1 on Terminal-Bench 2.1 and 69.4 on SWE-Bench Verified and match far larger models like Gemma 4-31B, which puts capable coding within reach of resource-limited hardware.

The models are available on Hugging Face.

DeepReinforce is the AI lab behind the release, a team that publishes reinforcement learning research in the open, including prior work such as CUDA-L1, and that shipped the IterX optimization loop for code agents. Ornith-1.0 carries that direction further by folding scaffold construction into the training process itself. The weights and a technical report are released on Hugging Face for teams that want to run or study the models directly.
