Spanlens: Open-source LLM observability with one-line setup

Spanlens transforms how teams track and optimize large language model usage with minimal setup. The open-source platform (MIT license) captures every call your application makes to OpenAI, Anthropic, Gemini, Mistral, OpenRouter, Azure OpenAI, or a local Ollama model. Integration takes just one line of code: replace your client’s baseURL with Spanlens’ proxy, or run the CLI setup to let the tool rewrite your code automatically.
Built-in analytics beyond raw logs
Once connected, Spanlens records model details, token counts, latency, cost, and the full prompt and response—including streaming responses reconstructed in real time. The dashboard converts this data into actionable insights, such as per-request, per-model, and per-user cost breakdowns with cache token parsing for accurate savings. Agent tracing maps multi-step workflows as Gantt waterfalls and dependency graphs, pinpointing bottlenecks in complex chains. Anomaly detection flags outliers in latency, cost, or error rates using a rolling 7-day baseline, while alerts for budget thresholds, error spikes, or p95 latency delays can be routed to Email, Slack, or Discord.
Security and optimization at the proxy
Spanlens doesn’t just monitor—it protects. A regex-based scanner inspects requests and responses for PII leaks or prompt injections, with the option to block malicious payloads at the proxy. A built-in savings engine identifies calls that could run on cheaper models (e.g., a gpt-4o classification task) and estimates potential monthly savings from switching. For prompt refinement, versioning with A/B experiments compares latency, cost, and accuracy using statistical tests, while an LLM-as-judge evaluation scores outputs against rubric anchors. Datasets enable offline evaluations and regression checks, ensuring continuous improvement without manual overhead.
Source: DEV Community. AI-assisted editorial synthesis — TechnoExpress.

