OpenAI rolls out GPT-5.6 family with tiered models and new reasoning modes
OpenAI has begun a limited preview of its next-generation model family, GPT-5.6, structured as three distinct tiers—Sol, Terra, and Luna—each designed for different workloads and budgets. Sol serves as the flagship, Terra targets everyday production tasks at lower cost, and Luna offers a fast, low-cost option for routine requests. Early access is restricted to a small group of trusted partners via the API and Codex, with broader rollouts planned for ChatGPT, Codex, and the API in the coming weeks.
A family built for choice
GPT-5.6 is the first release in which OpenAI uses both a generation number and durable capability names. Sol represents the highest intelligence tier, Terra matches GPT-5.5 performance while costing roughly half as much, and Luna delivers strong capability at OpenAI’s lowest price. Each tier can evolve independently, giving developers clearer trade-offs between intelligence, speed, and cost.
New reasoning levers for complex work
Two new reasoning controls accompany the launch. Max mode allocates more compute to deepen a single chain of reasoning, while ultra mode orchestrates multiple subagents to split and accelerate complex tasks. Both modes increase latency and cost in exchange for higher accuracy on long-horizon problems, such as multi-step coding workflows or large-scale genomics analysis.
Early benchmarks and hardware moves
In early evaluations, Sol’s ultra mode achieved 91.91% on Terminal-Bench 2.1, surpassing both GPT-5.5 and the latest Claude model on command-line planning tasks. On gene analysis, Sol outperformed GPT-5.5 while using fewer tokens, and on security testing it remained competitive using about one-third of the output tokens. OpenAI also plans to run Sol on Cerebras hardware, aiming for up to 750 tokens per second starting in July.
Pricing shifts and production fit
Input and output pricing remains unchanged for Sol at $5 and $30 per million tokens, while Terra drops to $2.50/$15 and Luna to $1/$6. Prompt caching now supports explicit breakpoints and a minimum 30-minute cache life, with cache writes priced at 1.25x the input rate and cache reads still discounted by 90%. The family is positioned for long-horizon coding, high-volume production, and latency-sensitive applications, with early partners already testing use cases from automated ticket summarization to multi-step CLI agents.
Source: MarkTechPost. AI-assisted editorial synthesis — TechnoExpress.

