Artificial intelligenceJune 26, 2026· via MarkTechPost

Ornith-1.0 Lets Coding Models Build Their Own Scaffolding

Ornith-1.0 Lets Coding Models Build Their Own Scaffolding

Image : MarkTechPost

DeepReinforce has unveiled Ornith-1.0, an open-source family of reasoning models designed to let coding agents craft their own reinforcement-learning scaffolds rather than rely on fixed human designs. The lineup spans four sizes—from a 9-billion-parameter dense model to a 397-billion-parameter mixture-of-experts flagship—all released under the permissive MIT license on Hugging Face. Built on top of pretrained Gemma 4 and Qwen 3.5 checkpoints, the models are post-trained to jointly optimize both the agent harness and the solution, a shift the team says delivers state-of-the-art results among comparably sized open models.

How the model learns its own scaffolding

Most coding agents pair a language model with a rigid, hand-crafted harness that guides problem-solving steps. Ornith-1.0 flips that paradigm: instead of following a predefined script, the model generates and refines its own scaffold during reinforcement learning. This includes memory management, error-handling logic, and orchestration of tool calls, all tuned to maximize performance on coding tasks. The approach enables the agent to adapt its internal workflows dynamically rather than being constrained by static human designs.

Deployment and tooling support

The release includes FP8 and GGUF builds for faster local inference, alongside serving recipes for vLLM, SGLang, and Transformers. Each model exposes an OpenAI-compatible endpoint, so standard agent frameworks can integrate without code changes. The 9-billion-parameter variant, for example, fits on a single 80 GB GPU in bf16 format, making it accessible for smaller-scale deployments. Trace outputs are returned in a structured reasoning_content field, and tool calls are emitted in well-formed JSON, simplifying agent loop integration.

Safety and benchmark performance

To curb reward hacking, DeepReinforce layers three defenses: a fixed trust boundary that keeps the environment and tool surface immutable, a deterministic monitor that audits the agent’s actions, and a frozen LLM judge that evaluates outputs. According to the research team’s reported scores, the 397-billion-parameter Ornith-1.0 leads among open models of comparable size but trails the latest proprietary releases.


Source: MarkTechPost. AI-assisted editorial synthesis — TechnoExpress.

Read the original source on MarkTechPost →

← Back to home