Cohere’s ‘North Mini Code’ brings efficient AI coding to your own hardware

Cohere AI this week unveiled North Mini Code, its first developer-facing coding model designed to run efficiently on a single H100 GPU. The open-weight mixture-of-experts (MoE) model packs 30 billion total parameters but activates just three billion during each forward pass, striking a balance between capability and resource demands.

A model built for agentic workflows

North Mini Code targets three core tasks: code generation, agentic software engineering, and terminal operations. Unlike broader multimodal models, it processes text only, with a 256K-token context window and a maximum output of 64K tokens. Cohere optimized the architecture for autonomous workflows that can call tools, reason step-by-step, and orchestrate sub-agents—key features for modern AI-driven development pipelines.

Lean design, high throughput

The model uses a decoder-only Transformer with sparse MoE layers, interleaving sliding-window and global attention in a 3:1 ratio. Its feed-forward block contains 128 experts, with eight activated per token using a sigmoid-gated router. This design keeps active compute low while preserving broad capacity. In internal benchmarks, North Mini Code delivered up to 2.8 times higher output throughput than a comparable model on identical hardware, alongside a 30% reduction in inter-token latency.

Weights are released under Apache 2.0 on Hugging Face, with additional access via the Cohere API, Model Vault, and OpenRouter. The minimum hardware requirement is a single H100 GPU operating at FP8 precision, making it feasible for teams to self-host without large clusters. Cohere positions the release within its “sovereign AI” initiative, emphasizing control and autonomy for developers who prefer running models on their own terms.

Source: MarkTechPost. AI-assisted editorial synthesis — TechnoExpress.

Cohere’s ‘North Mini Code’ brings efficient AI coding to your own hardware

A model built for agentic workflows

Lean design, high throughput

Essential tech, every morning