DeepReinforce Releases Ornith-1.0: An Open-Source Coding Model Family That Learns Its Own RL Scaffolds

DeepReinforce released Ornith-1.0, an open-source family of coding models that learn their own reinforcement learning scaffolds, achieving state-of-the-art results among open models of comparable size. The lineup includes four sizes from 9B to 397B parameters, built on Gemma 4 and Qwen 3.5, and is available under the MIT license. The 397B variant outperforms Claude Opus 4.7 on key benchmarks but trails Opus 4.8 and GLM-5.2-744B.

DeepReinforce has released Ornith-1.0, an open-source model family built for agentic coding. The lineup spans four sizes, from a 9B dense model to a 397B mixture-of-experts flagship. Every checkpoint ships under the MIT license on Hugging Face. The models are post-trained on top of pretrained Gemma 4 and Qwen 3.5.

Most coding agents pair a model with a fixed, human-designed harness. Ornith-1.0 instead learns to write its own. The DeepReinforce research team reports state-of-the-art results among open models of comparable size.

TL;DR

- Ornith-1.0 ships in 9B, 31B, 35B-MoE, and 397B-MoE sizes under MIT, built on Gemma 4 and Qwen 3.5.
- The model learns its own scaffold during RL, jointly optimizing the harness and the solution.
- Ornith-1.0-397B tops Claude Opus 4.7 on both headline benchmarks, but not Opus 4.8 or the larger GLM-5.2-744B.
- Three layers (a fixed trust boundary, a deterministic monitor, and a frozen LLM judge) guard against reward hacking.

What is Ornith-1.0?

Ornith-1.0 is a set of reasoning models tuned for coding agents. The variants are 9B Dense, 31B Dense, 35B MoE, and 397B MoE. The 35B model is mixture-of-experts and activates roughly 3B parameters per token. FP8 and GGUF builds are also published for faster local serving.

Each model is a reasoning model. Replies open with a