Effort knobs in AI: When less work is more

The five effort levels (low, medium, high, xhigh, max) do more than tweak tone: they steer how much time and how many tokens a model spends on a task. Benchmarks on three real workloads show that the best setting is not always where you might expect, and in one case a higher setting lowers total cost.

The setup behind the surprise

Three typical workloads were tested: classification, code generation, and a multi-step contract audit. For each one, the same model was run three times at every effort level. Quality was scored against a known answer or by manual review, and tokens and latency were recorded for each run. The results show that cost does not rise in a straight line with effort once multi-turn feedback loops enter the picture.
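The procedure described above can be sketched as a small harness. This is a minimal sketch under stated assumptions, not the benchmark's actual code: `benchmark`, `RunResult`, and the `fake_model` stub are hypothetical names, and the stub returns placeholder values rather than measured results.

```python
import time
from dataclasses import dataclass
from statistics import mean
from typing import Callable

EFFORT_LEVELS = ["low", "medium", "high", "xhigh", "max"]
RUNS_PER_LEVEL = 3  # three runs per level, as in the setup described above


@dataclass
class RunResult:
    quality: float    # score against a known answer or manual review (0..1)
    tokens: int       # total tokens consumed by the run
    latency_s: float  # wall-clock seconds


def benchmark(run_task: Callable[[str], tuple[float, int]]) -> dict[str, RunResult]:
    """Run one task at every effort level and average the measurements.

    `run_task` is a hypothetical callable: it takes an effort level and
    returns (quality, tokens) for a single run.
    """
    summary = {}
    for level in EFFORT_LEVELS:
        runs = []
        for _ in range(RUNS_PER_LEVEL):
            start = time.perf_counter()
            quality, tokens = run_task(level)
            elapsed = time.perf_counter() - start
            runs.append(RunResult(quality, tokens, elapsed))
        summary[level] = RunResult(
            quality=mean(r.quality for r in runs),
            tokens=round(mean(r.tokens for r in runs)),
            latency_s=mean(r.latency_s for r in runs),
        )
    return summary


def fake_model(level: str) -> tuple[float, int]:
    # Stand-in for a real model call; these numbers are placeholders only.
    return 1.0, 100 * (EFFORT_LEVELS.index(level) + 1)


if __name__ == "__main__":
    for level, result in benchmark(fake_model).items():
        print(level, result)
```

Swapping `fake_model` for a function that calls a real model and scores its output would give one summary row per effort level for each workload.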

Classification: high effort buys nothing

Quality stayed flat across all levels for the contract labeling task. The right label was found at low just as reliably as at max, but max used roughly eight times as many tokens. Latency followed tokens. The takeaway is simple: when the task is scoped and unambiguous, setting effort to low cuts cost without risking quality.

Code generation: high hits the plateau

Low missed edge cases in the TypeScript code, high caught them, and xhigh and max found nothing further. Tokens and latency climbed steadily from low to high, then plateaued at xhigh and max. The practical ceiling is high: it catches the edge cases, and nothing in these runs justified going to xhigh or max.
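Both this result and the classification result reduce to one rule: pick the lowest effort level whose quality matches the best observed score. The function below is a minimal sketch of that rule. `cheapest_sufficient_level` and its `tolerance` parameter are hypothetical, and the scores in the usage lines are made-up placeholders, not figures from the benchmark.

```python
EFFORT_LEVELS = ["low", "medium", "high", "xhigh", "max"]  # ordered cheapest first


def cheapest_sufficient_level(
    quality_by_level: dict[str, float], tolerance: float = 0.01
) -> str:
    """Return the lowest effort level whose quality is within `tolerance`
    of the best score observed at any level."""
    if not quality_by_level:
        raise ValueError("no effort levels were scored")
    best = max(quality_by_level.values())
    for level in EFFORT_LEVELS:
        score = quality_by_level.get(level)
        if score is not None and best - score <= tolerance:
            return level
    raise ValueError("scores use unknown effort level names")


# Placeholder scores shaped like the two cases described in the text.
flat_quality = {level: 0.95 for level in EFFORT_LEVELS}
plateau_quality = {"low": 0.70, "medium": 0.85, "high": 0.95, "xhigh": 0.95, "max": 0.95}

print(cheapest_sufficient_level(flat_quality))     # flat curve
print(cheapest_sufficient_level(plateau_quality))  # plateau from high upward
```

A flat quality curve selects low, and a curve that levels off at high selects high. The tolerance is a judgment call: set it to the smallest quality difference that would matter for the task.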

Multi-step audits: xhigh cuts total cost

This is where the data defied expectations. At medium, the model explored less per step, so it took more turns, hit dead ends, and re-derived earlier results, which drove total tokens higher. At xhigh, better up-front planning gave a shorter path to the answer, lowering total tokens and improving quality. For agentic loops in this benchmark, xhigh was not just better but also cheaper overall.
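The arithmetic behind this result is that total cost in a loop is the sum over turns, so a level that spends more per turn can still spend less overall if it needs fewer turns. The sketch below illustrates only that arithmetic. The turn counts and per-turn token figures are invented placeholders, not numbers from the benchmark, and `total_tokens` is a hypothetical helper.

```python
def total_tokens(tokens_per_turn: list[int]) -> int:
    """Total cost of a multi-turn run is the sum of its per-turn token counts."""
    return sum(tokens_per_turn)


# Placeholder values chosen only to illustrate the effect.
medium_run = [2_000] * 12  # many cheap turns, including dead ends and re-derivations
xhigh_run = [5_000] * 4    # fewer, costlier turns after up-front planning

medium_total = total_tokens(medium_run)
xhigh_total = total_tokens(xhigh_run)

print(f"medium: {len(medium_run)} turns, {medium_total} tokens")
print(f"xhigh:  {len(xhigh_run)} turns, {xhigh_total} tokens")
print("xhigh cheaper overall:", xhigh_total < medium_total)
```

With these placeholder values, xhigh costs 2.5 times as much per turn but finishes in a third of the turns, so its total is lower. Whether that holds for a given workload depends on how many turns each level actually needs, which is what a measurement of total tokens per task captures and a per-turn comparison misses.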

Source: DEV Community. AI-assisted editorial synthesis — TechnoExpress.