Product analytics for AI agents. Every run is captured end to end: what the user wanted, the steps your agent took, and how it ended. Fix the biggest problem, build what users keep asking for, and your agent gets a little better every week.
Built for teams running a customer-facing agent in production, starting with support and operations, where a failed run means a real ticket or lost revenue.
One line to install. Capturing a call takes about 1 ms, and the data is flushed in the background, so your model response returns before anything is sent. No network round trip is added to the request path. Prefer a walkthrough? Book a demo.
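To picture how capture can stay off the request path, here is a minimal sketch of a background flusher. It is not Intencion's SDK: `BackgroundRecorder`, `capture`, `close`, and the `send` callable are all hypothetical names, and a real client would add batching limits and retries.

```python
import queue
import threading


class BackgroundRecorder:
    """Hypothetical recorder: capture is an in-memory put, sending runs on a worker thread."""

    def __init__(self, send, flush_interval=0.05):
        self._send = send                  # assumed callable that ships a batch, e.g. an HTTP POST
        self._queue = queue.SimpleQueue()
        self._interval = flush_interval
        self._stop = threading.Event()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def capture(self, event):
        # Request path: enqueue only, no network I/O happens here.
        self._queue.put(event)

    def _drain(self):
        batch = []
        while True:
            try:
                batch.append(self._queue.get_nowait())
            except queue.Empty:
                break
        if batch:
            self._send(batch)

    def _run(self):
        # Wake up every interval and ship whatever has accumulated.
        while not self._stop.wait(self._interval):
            self._drain()
        self._drain()  # final flush on shutdown

    def close(self):
        self._stop.set()
        self._worker.join()
```

The caller only ever pays for `capture`, which is a queue put. Everything slower happens on the worker thread after the response has already returned.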
Observability tells you the model ran. Intencion tells you whether the user got what they came for, and what to build so more of them do.
Spans, tokens, and latency tell you what the system did. They do not answer the question that sets your roadmap. Intencion maps the intent behind every run to the path your agent took and how it ended, then ranks what to fix and what to build next.
It sits next to your tracing (Langfuse, LangSmith, Datadog) and your product analytics (Amplitude, PostHog). Keep your stack and add one line. We are not a span format and do not ask you to re-emit traces. An OpenTelemetry bridge for teams that already emit spans is on the roadmap.
Tokens, latency, and traces, one span at a time.
Every user goal, grouped, with how many runs actually resolved.
The asks your agent cannot handle yet, ranked by how often they come up.
The taxonomy compounds. Every run sharpens your intent clusters, so the longer it runs the better it labels your traffic. Your intent map is built from your own runs, which is hard to copy.
Add the SDK once. After that you can see what users want, where your agent succeeds, and where it falls short. Each week you fix the biggest problem and ship what people keep asking for, so the agent keeps getting better. That is the loop.
No model guesses whether your agent worked. A run that returns is a success. A run that throws is a failure, and the error message becomes the reason. Need nuance? Set it yourself. It is deterministic and the same on every replay, so there is no classifier to babysit and no accuracy rate to chase.
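The rule above is simple enough to sketch in a few lines. This is an illustration only: `RunOutcome` and `run_with_outcome` are hypothetical names, not the SDK's API, and the sketch returns the outcome instead of re-raising so the result is easy to inspect.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class RunOutcome:
    status: str               # "success" or "failure"
    reason: Optional[str]     # the error message on failure, otherwise None
    result: Any = None


def run_with_outcome(agent_fn: Callable[..., Any], *args, **kwargs) -> RunOutcome:
    """Returns means success; raises means failure, with the message as the reason."""
    try:
        result = agent_fn(*args, **kwargs)
    except Exception as exc:
        return RunOutcome("failure", str(exc))
    return RunOutcome("success", None, result)
```

Because the outcome depends only on whether the call returned or raised, replaying the same run gives the same label every time.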
Illustrative example. "Change my plan" resolves 61 percent of the time, and "can't verify identity" accounts for 43 percent of its failures, all on the path verify_user → lookup_account → escalate. Fix that one step and the intent can climb into the high 70s. Do that every week, top problem and top request, and it adds up: from the high 70s into the 90s in a month.
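The arithmetic behind those illustrative figures is easy to check. Failures are 39 percent of runs, and 43 percent of those is about 17 percent of all runs, so removing that one cause lifts resolution to roughly 78 percent at best. The sketch assumes every recovered run would go on to resolve, which is an upper bound; `resolution_ceiling` is a hypothetical helper name.

```python
def resolution_ceiling(resolution_rate, failure_share):
    """Best-case resolution rate after removing one failure cause entirely."""
    failure_rate = 1.0 - resolution_rate       # share of runs that fail today
    recovered = failure_rate * failure_share   # share of all runs tied to this cause
    return resolution_rate + recovered


ceiling = resolution_ceiling(0.61, 0.43)
print(f"{ceiling:.1%}")  # prints 77.8%
```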
Patch your model client once and every call is captured: model, tokens, latency, outcome. It patches at the class level, so calls your framework makes internally (LangChain, the OpenAI Agents SDK, LlamaIndex) are captured too. When your agent calls tools, wrap the run and record each step. No decorators sprinkled everywhere. Published for TypeScript and Python.
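Class-level patching is a general Python technique, and a small sketch shows why it reaches calls made inside a framework. Everything here is a stand-in: `FakeModelClient`, `patch_class`, and the captured fields are hypothetical, not a real model SDK or Intencion's implementation.

```python
import functools
import time


class FakeModelClient:
    """Stand-in for a model client class; not a real SDK."""

    def complete(self, prompt):
        return {"model": "fake-1", "text": prompt.upper(), "tokens": len(prompt.split())}


CAPTURED = []


def patch_class(cls, method_name, sink):
    """Replace a method on the class so every instance, old or new, is captured."""
    original = getattr(cls, method_name)

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        start = time.perf_counter()
        outcome = "failure"
        response = None
        try:
            response = original(self, *args, **kwargs)
            outcome = "success"
            return response
        finally:
            # Runs whether the call returned or raised.
            sink.append({
                "model": (response or {}).get("model"),
                "tokens": (response or {}).get("tokens"),
                "latency_ms": (time.perf_counter() - start) * 1000.0,
                "outcome": outcome,
            })

    setattr(cls, method_name, wrapper)


existing = FakeModelClient()                 # created before the patch, as a framework might
patch_class(FakeModelClient, "complete", CAPTURED)
existing.complete("change my plan")          # still captured: lookup goes through the class
```

Because Python resolves methods on the class at call time, an instance a framework created earlier is covered without touching the framework's code.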
Each row is a real part of the product, built to answer one specific question about your agent.
Every goal your users bring to the agent, grouped and counted, with how many runs resolved. One row per intent, not twenty log lines. You declare the intent, or we infer it: we match the run to your existing clusters first, then a small model names anything new, reusing your labels where they fit. Editing the taxonomy from the dashboard is on the roadmap.
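The match-first, name-second order can be sketched with a toy similarity measure. This is an assumption-heavy illustration: word-overlap (Jaccard) scoring, the `threshold` value, and the `name_new` callable standing in for the small model are all hypothetical, and the product's actual matching is not described here.

```python
def tokens(text):
    return set(text.lower().split())


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def assign_intent(message, clusters, name_new, threshold=0.5):
    """clusters maps an intent label to one example phrase for that intent."""
    msg = tokens(message)
    best_label, best_score = None, 0.0
    for label, example in clusters.items():
        score = jaccard(msg, tokens(example))
        if score > best_score:
            best_label, best_score = label, score
    # Step 1: reuse an existing cluster when the match is close enough.
    if best_label is not None and best_score >= threshold:
        return best_label
    # Step 2: otherwise ask the naming function (a stand-in for a small model).
    label = name_new(message)
    clusters.setdefault(label, message)
    return label
```

Checking existing clusters first keeps labels stable, so new traffic adds to counts you already track instead of spawning near-duplicate intents.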
Open any run to see the goal, every tool call it made, the path it took, and how it ended, all on one timeline.
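A run timeline like this needs only a small record per run: the goal, the ordered steps, and the ending. The sketch below uses a context manager to collect them. `Run`, `track_run`, and `step` are hypothetical names for illustration, not the SDK's interface.

```python
from contextlib import contextmanager


class Run:
    def __init__(self, goal):
        self.goal = goal
        self.steps = []       # ordered tool calls
        self.outcome = None   # "success" or "failure"
        self.reason = None    # error message on failure

    def step(self, tool, **details):
        self.steps.append({"tool": tool, **details})

    @property
    def path(self):
        return " → ".join(s["tool"] for s in self.steps)


@contextmanager
def track_run(goal, sink):
    run = Run(goal)
    try:
        yield run
        run.outcome = "success"
    except Exception as exc:
        run.outcome, run.reason = "failure", str(exc)
        raise                 # the caller still sees the original error
    finally:
        sink.append(run)      # recorded whether the run succeeded or failed
```

Usage is one `with track_run("change my plan", sink) as run:` block around the agent's work, with a `run.step("verify_user")` call at each tool invocation.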
Failed and abandoned runs grouped by cause and ranked by how often they happen, so you fix the biggest one first.
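Grouping and ranking of this kind comes down to a frequency count. A minimal sketch, assuming each run is a dict with hypothetical `status` and `reason` fields:

```python
from collections import Counter


def rank_failure_causes(runs):
    """Count failed and abandoned runs by cause, most frequent first."""
    causes = Counter(
        run.get("reason") or "unknown"
        for run in runs
        if run["status"] in ("failure", "abandoned")
    )
    return causes.most_common()
```

The first entry in the returned list is the cause to fix first.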
The things users keep asking for that your agent can't handle yet. Your roadmap, straight from real usage.
Emails, names, and card numbers are stripped before anything is stored, so nothing sensitive lands in our database.
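For a sense of what stripping before storage involves, here is a minimal regex-based sketch. It is an illustration under simplifying assumptions, not the product's redaction logic: the patterns and placeholder tokens are hypothetical, it handles only emails and card-like digit runs, and it does not attempt names, which generally need more than a regex.

```python
import re

# Simplified patterns for illustration; real-world redaction needs broader coverage.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")  # 13 to 19 digits, optional spaces or dashes


def redact(text):
    """Replace emails and card-like numbers with placeholders before storing text."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return CARD_RE.sub("[CARD]", text)


print(redact("Reach me at ana@example.com, card 4111 1111 1111 1111."))
# prints: Reach me at [EMAIL], card [CARD].
```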
Start free in a minute, no call required. Or click through the live demo first.
Grab 20 minutes. We'll walk through Intencion on a real agent's runs and answer anything. Pick a time below.