From Qwen3 to agents: a shift in AI model thinking

In a quiet but significant pivot, Junyang Lin—former technical lead of Alibaba’s Qwen project—has stepped away from the helm to focus on a new direction: agents. In a recent talk and follow-up post, he frames the evolution as a move from “training models → training agents,” signaling a broader shift in how AI systems are designed to reason, act, and adapt.

The Qwen3 era: hybrid thinking under the microscope

Lin’s presentation walks through the Qwen family, from the compact QwQ-32B to the massive Qwen2.5-Omni, with Qwen3 as the star. The model introduced “hybrid thinking”: a thinking mode for step-by-step reasoning and a non-thinking mode for quick responses. It also allowed users to cap reasoning effort via dynamic thinking budgets. Qwen3 expanded multilingual reach to 119 languages and offered models ranging from 0.6B to 235B parameters, including quantized formats under Apache 2.0. Two live demos—a web development assistant and a deep research agent—showcased its versatility.

Why hybrid thinking didn’t stick—and where it’s going

Lin argues that merging thinking and instruct modes damaged both. Strong instruct models thrive on brevity and speed; strong thinking models need depth and token spend. Forcing them together diluted their strengths. After a four-stage post-training pipeline failed to balance them, Qwen later split into separate Instruct and Thinking variants. Lin calls this a data problem more than a model problem, pointing out that Anthropic took a different path with Claude 3.7 Sonnet and 4, using user-set budgets and interleaved reasoning.

From models to agents: the next frontier

The talk closes with a clear direction: future work means training agents. Lin highlights pretraining with richer feedback, reinforcement learning from environment interaction, longer context windows, and expanded modalities. The architecture tables reveal a strategic evolution: small dense models tie embeddings and use 32K context, while larger ones drop tying and extend to 128K, with Mixture-of-Experts models activating just 8 of 128 experts per token. It’s not just about reasoning—it’s about acting.

Source: MarkTechPost. AI-assisted editorial synthesis — TechnoExpress.

From Qwen3 to agents: a shift in AI model thinking

The Qwen3 era: hybrid thinking under the microscope

Why hybrid thinking didn’t stick—and where it’s going

From models to agents: the next frontier

Essential tech, every morning