Artificial intelligence · July 4, 2026 · via MarkTechPost

Mistral’s Leanstral 1.5 Tackles Math Proofs with 87% Accuracy

Mistral AI has released Leanstral 1.5, an open-weight code agent model specialized for automated theorem proving in Lean 4. The new release, available under the Apache 2.0 license and accessible via a free API endpoint, updates the earlier Leanstral-2603 model and belongs to the Mistral Small 4 family. It uses a mixture-of-experts architecture that keeps per-token compute low while retaining a large total parameter count.

A New Approach to Proof Engineering

Leanstral 1.5 is designed as a code agent for Lean 4, a formal proof assistant that can express advanced mathematical objects such as perfectoid spaces as well as properties of Rust code fragments. The model is a mixture-of-experts network with 128 experts, of which four are activated per token. It has 119 billion parameters in total, with only 6.5 billion active for any given token. It handles a 256k-token context window and accepts both text and image inputs, though it outputs text only.
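To make the "128 experts, four per token" figure concrete, here is a minimal sketch of top-k expert routing in plain Python. It is an illustration of the general technique, not Mistral's implementation: the function name `top_k_route`, the random router logits, and the choice to apply a softmax over only the selected experts are all assumptions, since the article does not describe the router.

```python
import math
import random

NUM_EXPERTS = 128  # total experts, from the article
TOP_K = 4          # experts activated per token, from the article


def top_k_route(router_logits, k=TOP_K):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are a softmax over the selected logits only. That is one common
    convention and an assumption here, not a documented Leanstral detail.
    """
    if len(router_logits) < k:
        raise ValueError("need at least k router logits")
    # Indices of the k largest logits, highest first.
    chosen = sorted(
        range(len(router_logits)),
        key=lambda i: router_logits[i],
        reverse=True,
    )[:k]
    # Numerically stable softmax restricted to the chosen experts.
    peak = max(router_logits[i] for i in chosen)
    exps = [math.exp(router_logits[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical router scores for a single token.
    logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
    for idx, weight in top_k_route(logits):
        print(f"expert {idx:3d}  weight {weight:.3f}")
    # Share of parameters active per token, using the article's figures.
    print(f"active fraction: {6.5 / 119:.1%}")
```

With the article's numbers, 6.5 billion of 119 billion parameters works out to roughly 5.5% of the model being active for each token, which is where the compute savings come from.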

Three-Stage Training, Two Reinforcement Environments

Training follows three stages: mid-training, supervised fine-tuning, and reinforcement learning with CISPO. The model’s agentic behavior is shaped by two reinforcement-learning environments. In the multiturn environment, Leanstral receives a theorem, submits a proof, reads the compiler feedback, and refines its attempt until it succeeds or exhausts its budget. In the code agent environment, it operates within a raw filesystem, editing files, running bash commands, and querying the Lean language server for real-time goals, errors, and type information. This lets it complete partial proofs and keep working across context compaction.
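The multiturn environment described above amounts to a submit, check, and retry loop. The sketch below shows that control flow under stated assumptions: `refine_proof`, `CheckResult`, and the `propose` and `check` callables are hypothetical names, and the budget is modeled as a simple attempt count, whereas the article does not say how the budget is measured. In a real setup `propose` would call the model and `check` would run the Lean compiler.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class CheckResult:
    ok: bool        # True when the checker accepted the proof
    feedback: str   # compiler messages to feed back on failure


def refine_proof(
    theorem: str,
    propose: Callable[[str, List[str]], str],
    check: Callable[[str, str], CheckResult],
    max_attempts: int = 8,  # assumed budget unit: number of attempts
) -> Optional[str]:
    """Submit a proof, read feedback, and retry until success or budget exhaustion."""
    history: List[str] = []  # feedback from earlier failed attempts
    for _ in range(max_attempts):
        proof = propose(theorem, history)
        result = check(theorem, proof)
        if result.ok:
            return proof
        history.append(result.feedback)
    return None  # budget exhausted without an accepted proof


if __name__ == "__main__":
    # Toy stand-ins: the "model" tries `sorry` first, then `rfl` once it has feedback.
    def toy_propose(theorem: str, history: List[str]) -> str:
        return "rfl" if history else "sorry"

    def toy_check(theorem: str, proof: str) -> CheckResult:
        if proof == "rfl":
            return CheckResult(True, "")
        return CheckResult(False, "declaration uses 'sorry'")

    print(refine_proof("theorem two : 1 + 1 = 2", toy_propose, toy_check))
```

Passing the accumulated feedback back into `propose` is the key design point: each retry sees what the compiler rejected, rather than sampling a fresh proof blindly.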

Benchmarks Show Strong Gains Across the Board

Mistral reports that Leanstral 1.5 saturates miniF2F with 100% on both the validation and test sets, solves 587 of 672 PutnamBench problems, and sets new state-of-the-art results on FATE-H (87%) and FATE-X (34%). On FLTEval, pass@1 improves from 21.9 to 28.9 and pass@8 rises from 31.9 to 43.2, surpassing Opus 4.6’s 39.6 at roughly one-seventh the cost. Mistral estimates that competitors like Seed-Prover 1.5 can cost up to $300 per problem at high settings, while Leanstral 1.5 operates at about $4. The model also benefits from test-time scaling: increasing the token budget per attempt yields steady gains on PutnamBench, with 493 problems solved at 1M tokens and 587 at 4M.
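The reported counts are easier to compare as rates. The short script below redoes the arithmetic using only figures quoted in the article; the helper names `solve_rate` and `cost_ratio` are illustrative. Note that the $300 figure is an upper estimate for competitors at high settings, so the resulting ratio is a ceiling rather than a typical value.

```python
PUTNAM_TOTAL = 672  # PutnamBench problems, from the article
SOLVED_BY_TOKEN_BUDGET = {"1M": 493, "4M": 587}  # solved per token budget


def solve_rate(solved, total=PUTNAM_TOTAL):
    """Fraction of benchmark problems solved."""
    return solved / total


def cost_ratio(other_cost, own_cost):
    """How many times more expensive the other system is."""
    return other_cost / own_cost


if __name__ == "__main__":
    for budget, solved in SOLVED_BY_TOKEN_BUDGET.items():
        rate = solve_rate(solved)
        print(f"{budget} tokens: {solved}/{PUTNAM_TOTAL} = {rate:.1%}")
    gain = SOLVED_BY_TOKEN_BUDGET["4M"] - SOLVED_BY_TOKEN_BUDGET["1M"]
    print(f"gain from 1M to 4M: {gain} problems")
    # Upper-bound comparison using the quoted per-problem cost estimates.
    print(f"cost ratio at the quoted figures: up to {cost_ratio(300, 4):.0f}x")
```

Quadrupling the token budget moves the PutnamBench solve rate from about 73.4% to about 87.4%, a gain of 94 problems.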


Source: MarkTechPost. AI-assisted editorial synthesis — TechnoExpress.
