Cohere’s ‘North Mini Code’ brings efficient AI coding to your own hardware
Cohere AI this week unveiled North Mini Code, its first developer-facing coding model designed to run efficiently on a single H100 GPU. The open-weight mixture-of-experts (MoE) model packs 30 billion total parameters but activates just 3 billion, about a tenth of them, during each forward pass, striking a balance between capability and resource demands.
A model built for agentic workflows
North Mini Code targets three core tasks: code generation, agentic software engineering, and terminal operations. Unlike broader multimodal models, it processes text only, with a 256K-token context window and a maximum output of 64K tokens. Cohere optimized the architecture for autonomous workflows that can call tools, reason step by step, and orchestrate sub-agents, all key capabilities for modern AI-driven development pipelines.
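For developers wiring the model into an agent loop, the two token limits translate into a simple budget check. The sketch below is illustrative only: it assumes that "256K" and "64K" mean 262,144 and 65,536 tokens and that the prompt and the generated output share the same context window, neither of which the announcement spells out. The function name max_new_tokens is hypothetical.

```python
# Assumptions (not confirmed by the announcement): "K" means 1,024 tokens,
# and prompt tokens plus output tokens must fit in one context window.
CONTEXT_WINDOW = 256 * 1024  # 262,144 tokens
MAX_OUTPUT = 64 * 1024       # 65,536 tokens


def max_new_tokens(prompt_tokens: int, requested: int = MAX_OUTPUT) -> int:
    """Return the largest output budget that still fits in the context window."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    remaining = CONTEXT_WINDOW - prompt_tokens
    if remaining <= 0:
        raise ValueError("prompt already fills the context window")
    # The budget is capped by the request, the output limit, and the free space.
    return min(requested, MAX_OUTPUT, remaining)
```

Under these assumptions, a short prompt leaves the full 64K output budget available, while a prompt close to the window size shrinks it to whatever space is left.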
Lean design, high throughput
The model uses a decoder-only Transformer with sparse MoE layers, interleaving sliding-window and global attention in a 3:1 ratio. Its feed-forward block contains 128 experts, with eight activated per token using a sigmoid-gated router. This design keeps active compute low while preserving broad capacity. In internal benchmarks, North Mini Code delivered up to 2.8 times higher output throughput than a comparable model on identical hardware, alongside a 30% reduction in inter-token latency.
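To make the routing and attention layout concrete, here is a minimal sketch in plain Python. It is not Cohere's implementation: the renormalization of the selected gate scores, the position of the global layer within each group of four, and the names route_token and layer_pattern are all assumptions made for illustration. Only the figures of 128 experts, eight active per token, sigmoid gating, and the 3:1 ratio come from the announcement.

```python
import math
import random

NUM_EXPERTS = 128  # from the announcement
TOP_K = 8          # from the announcement


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def route_token(logits: list[float], top_k: int = TOP_K) -> dict[int, float]:
    """Select top_k experts for one token and return {expert_index: weight}.

    Each expert is scored independently with a sigmoid, rather than with a
    softmax across all experts. Renormalizing the selected scores so they
    sum to 1 is an assumption, not a confirmed detail of the model.
    """
    scores = [sigmoid(z) for z in logits]
    chosen = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
    total = sum(scores[i] for i in chosen)
    return {i: scores[i] / total for i in chosen}


def layer_pattern(num_layers: int, sliding_per_global: int = 3) -> list[str]:
    """Label each layer, placing one global layer after every three sliding ones.

    Putting the global layer last in each group is an assumption.
    """
    period = sliding_per_global + 1
    return ["global" if (i + 1) % period == 0 else "sliding" for i in range(num_layers)]


if __name__ == "__main__":
    random.seed(0)
    # Stand-in router logits; a real router would compute these from the token.
    logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
    weights = route_token(logits)
    print(len(weights), "of", NUM_EXPERTS, "experts active")
    print(layer_pattern(8))
```

Only the eight selected experts run their feed-forward computation for a given token, which is why active compute stays near 3 billion parameters even though all 128 experts exist in the model.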
Weights are released under Apache 2.0 on Hugging Face, with additional access via the Cohere API, Model Vault, and OpenRouter. The minimum hardware requirement is a single H100 GPU operating at FP8 precision, making it feasible for teams to self-host without large clusters. Cohere positions the release within its “sovereign AI” initiative, emphasizing control and autonomy for developers who prefer running models on their own terms.
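A rough back-of-the-envelope calculation shows why FP8 matters for the single-GPU claim. The sketch below assumes the 80 GB H100 variant and counts weights only; the KV cache, activations, and runtime overhead need additional memory, so the headroom shown is an upper bound. The helper name weight_memory_gb is hypothetical.

```python
H100_MEMORY_GB = 80  # assumption: the 80 GB H100 variant


def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Memory needed for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


TOTAL_PARAMS = 30e9   # from the announcement
ACTIVE_PARAMS = 3e9   # from the announcement

fp8_gb = weight_memory_gb(TOTAL_PARAMS, 1)    # FP8 stores one byte per parameter
bf16_gb = weight_memory_gb(TOTAL_PARAMS, 2)   # 16-bit formats store two bytes
headroom_gb = H100_MEMORY_GB - fp8_gb         # left for KV cache and activations
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

if __name__ == "__main__":
    print(f"FP8 weights: {fp8_gb:.0f} GB, 16-bit weights: {bf16_gb:.0f} GB")
    print(f"Headroom at FP8: {headroom_gb:.0f} GB")
    print(f"Active share per token: {active_fraction:.0%}")
```

Note that all 30 billion parameters must be resident in memory even though only about a tenth are used per token: sparsity reduces compute, not the weight footprint.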
Source: MarkTechPost. AI-assisted editorial synthesis — TechnoExpress.

