Cisco’s AI Tool FAPO Automates Prompt Tuning for Reliable LLM Pipelines
Getting prompts right remains a major hurdle in deploying reliable large language model applications. A minor wording tweak can swing accuracy by up to 20 percent, and a prompt that works on a handful of examples can collapse at scale. Cisco AI is tackling this with FAPO, a fully automated prompt optimization system driven by Claude Code agents. FAPO evaluates multi-step pipelines, identifies where failures occur, proposes improved variants, and validates them, repeating this closed loop until the target accuracy is reached or the budget runs out. The project is released under Apache 2.0 and also supports Codex as an optimization agent.
Behind the Optimization Loop
FAPO operates as a multi-tenant framework where each project is isolated in its own directory, holding prompts, datasets, chain definitions, scorers, and configurations. The core engine, named Hephaestus, handles evaluation, chain execution, and scoring across supported providers like OpenAI, Baseten, and SageMaker. Users only need to supply a dataset of paired inputs and expected outputs, which FAPO splits into a validation set for iteration and a held-out test set for final evaluation. From a task description, Claude Code can automatically scaffold the initial prompt, chain, and scorer.
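To make the data requirement concrete, here is a minimal Python sketch of splitting paired examples into a validation set for iteration and a held-out test set for final evaluation. It is an illustration under stated assumptions, not FAPO's actual code: the function name `split_dataset`, the `test_fraction` and `seed` parameters, and the 30 percent default are all hypothetical.

```python
import random
from typing import List, Sequence, Tuple

# One example is a pair: (input text, expected output).
Example = Tuple[str, str]


def split_dataset(
    examples: Sequence[Example],
    test_fraction: float = 0.3,  # assumed default, not taken from FAPO
    seed: int = 0,
) -> Tuple[List[Example], List[Example]]:
    """Return (validation, test) with no example shared between the two."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    if len(examples) < 2:
        raise ValueError("need at least two examples to split")

    # Shuffle a copy with a fixed seed so the split is reproducible.
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)

    # Keep at least one example on each side of the split.
    n_test = min(len(shuffled) - 1, max(1, round(len(shuffled) * test_fraction)))
    test = shuffled[:n_test]
    validation = shuffled[n_test:]
    return validation, test
```

Fixing the seed keeps the held-out set stable across optimization runs, so a final score is always measured on examples the loop never iterated on.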
How FAPO Improves Accuracy Step by Step
The optimization process follows a six-stage loop: evaluate performance on the dataset, attribute failures to root causes using heuristics and LLM analysis, propose targeted variants, review proposals for scope and data leakage, compare against the previous best, and iterate until the target is met or the budget is exhausted. Guided by step-level failure attribution, FAPO escalates through three levels: prompt edits first, then parameter adjustments, and finally structural changes to the pipeline.
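The six stages and three escalation levels can be sketched as a small Python loop. This is a simplified illustration, not FAPO's implementation: every name here (`Variant`, `optimize`, the injected callables, `patience`) is hypothetical, and the rule of escalating after a fixed number of non-improving rounds is an assumption standing in for FAPO's attribution-guided escalation.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Escalation order described in the article.
LEVELS = ["prompt", "parameters", "structure"]


@dataclass(frozen=True)
class Variant:
    name: str
    level: str


def optimize(
    baseline: Variant,
    evaluate: Callable[[Variant], float],              # score on the validation set
    attribute: Callable[[Variant], str],               # root cause of the failures
    propose: Callable[[Variant, str, str], Variant],   # (best, cause, level) -> candidate
    review: Callable[[Variant], bool],                 # scope and data-leakage check
    target: float,
    budget: int,
    patience: int = 2,  # assumed: non-improving rounds before escalating
) -> Tuple[Variant, float]:
    best, best_score = baseline, evaluate(baseline)    # stage 1: evaluate
    level_idx, stalls = 0, 0

    for _ in range(budget):                            # stage 6: iterate
        if best_score >= target:
            break
        cause = attribute(best)                        # stage 2: attribute
        candidate = propose(best, cause, LEVELS[level_idx])  # stage 3: propose
        improved = False
        if review(candidate):                          # stage 4: review
            score = evaluate(candidate)
            if score > best_score:                     # stage 5: compare
                best, best_score, improved = candidate, score, True
        stalls = 0 if improved else stalls + 1

        # Assumed escalation rule: move to the next level after repeated stalls.
        if stalls >= patience and level_idx < len(LEVELS) - 1:
            level_idx, stalls = level_idx + 1, 0

    return best, best_score
```

Passing the stages in as callables keeps the loop independent of any particular model provider or scorer, so each stage can be replaced with a toy function when testing the control flow.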
In Cisco’s evaluation, FAPO outperformed GEPA, a leading prompt optimizer, on 15 of 18 model-benchmark comparisons, with a mean gain of 14.1 percentage points. On benchmarks requiring pipeline changes, it won all six comparisons with a mean gain of 33.8 percentage points, while GEPA’s only win on AIME fell within sampling noise. To prevent overfitting, FAPO enforces strict guardrails such as training-split-only inspection and immutable variant files.
Source: MarkTechPost. AI-assisted editorial synthesis — TechnoExpress.

