The Case for End-to-End Observability: From UI to AI Agent to Invoice


Dan Kowalski - 2026-04-14

A user clicks "Place Order." What happens next?

The browser fires a request. The API gateway authenticates it, routes it to the order service. The order service validates inventory, calls the payment service, which talks to Stripe. A recommendation engine suggests related products using an LLM. A support chatbot, running as an AI agent, pulls context from a vector database and invokes tools to check order history before answering a question about delivery. Notifications fan out to email, SMS, and analytics. A shipping label gets generated. The user sees a confirmation.

That's one request. Fifteen services. AI agents and LLM calls that cost real money. And in most organizations today, no single trace captures the full journey with cost visible at every step, because observability tooling was selected team by team, not request by request.

Today's distributed systems

The Three Silos

Observability has fractured into three worlds that don't talk to each other.

Frontend monitoring lives in one tool. You see page load times, Core Web Vitals, JavaScript errors, and user sessions. But the trace stops at the network boundary. What happened on the server? Open another tool.

Backend APM lives in another. Datadog, New Relic, Dynatrace, Grafana: they show you service maps, trace waterfalls, database queries, and queue latencies. They're excellent at showing what happened between your API gateway and your database. But they have no idea what the user was doing in the browser. And when a span says "called AI service," the trace ends there. The LLM call is a black box.

LLM observability lives in a third. Langfuse, LangSmith, Helicone, Arize, Traceloop, Portkey, Galileo. They track prompt templates, token usage, model costs, agent tool calls, and response quality. They're purpose-built for AI workloads. Some, like Langfuse and Arize, support W3C Trace Context propagation, which in principle could bridge the gap. In practice, the receiving APM platform must understand gen_ai.* attributes semantically. When it doesn't, the AI spans appear as opaque HTTP calls: technically in the trace, invisible in meaning. These tools can tell you that a chat completion cost $0.03 and used 1,200 tokens. They can't tell you which user request triggered it, which microservice called it, or what happened downstream.

Three tools. Three partial views. And even where integrations exist between them, cost remains absent as a first-class dimension of the trace.

Traditional APM Wasn't Built for This

The explosion of AI and agentic workloads has exposed something uncomfortable: traditional APM tools were designed for a world that no longer exists.

They were built for request-response. A user makes a call, a service processes it, a database returns data, the response goes back. The trace is a clean waterfall: sequential, predictable, measurable in milliseconds.

AI agents don't work like that.

An agent receives a request and enters a planning loop. It reasons about what tools to call. It fans out to a vector database for context, calls an LLM to synthesize, evaluates the result, decides it's not good enough, and loops again. It might iterate three, four, five times before producing an answer. Each iteration calls a different model, consumes a different number of tokens, and costs a different amount. The trace shape is recursive, branching, and variable-length. Nothing like the neat waterfalls APM tools were designed to render.

Traditional tools handle this poorly. Some collapse agent loops into a single opaque span. You see "AI service: 4,200ms" with no visibility into what happened inside. Others choke on the trace depth, rendering 40-span agent traces as an unreadable wall of nested waterfalls. None of them understand the semantic meaning of an LLM span: that gen_ai.usage.output_tokens isn't just another attribute but money leaving your account.

The fidelity gap matters more than any rendering sluggishness. When your APM tool shows an AI agent call as one flat span, you've lost the ability to answer the questions that matter: Which iteration was expensive? Which tool call failed? Did the agent hit a rate limit and retry, doubling the cost? Was the embedding cached or regenerated? These details exist in the telemetry. The OTel GenAI Semantic Conventions define exactly how to capture them. But if your platform can't render them, they're invisible.
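Once those attributes survive into the trace, "which iteration was expensive?" becomes a trivial query. A minimal sketch, with the agent-loop spans modeled as plain dicts: the attribute names follow the OTel GenAI conventions, but the span data and the GPT-5.4 prices are assumptions for illustration.

```python
# Assumed GPT-5.4 list prices, expressed per token.
PRICE_PER_INPUT_TOKEN = 2.50 / 1_000_000
PRICE_PER_OUTPUT_TOKEN = 15.00 / 1_000_000

# Hypothetical agent-loop spans carrying OTel GenAI semantic-convention
# attributes; the token counts here are invented for the sketch.
spans = [
    {"name": "agent iteration 1",
     "gen_ai.request.model": "gpt-5.4",
     "gen_ai.usage.input_tokens": 1800,
     "gen_ai.usage.output_tokens": 400},
    {"name": "agent iteration 2",
     "gen_ai.request.model": "gpt-5.4",
     "gen_ai.usage.input_tokens": 3200,
     "gen_ai.usage.output_tokens": 900},
    {"name": "agent iteration 3",
     "gen_ai.request.model": "gpt-5.4",
     "gen_ai.usage.input_tokens": 2100,
     "gen_ai.usage.output_tokens": 350},
]

def span_cost(span):
    """Derive dollar cost from the token-usage attributes on one span."""
    return (span["gen_ai.usage.input_tokens"] * PRICE_PER_INPUT_TOKEN
            + span["gen_ai.usage.output_tokens"] * PRICE_PER_OUTPUT_TOKEN)

# The expensive iteration falls out of a one-line aggregation.
most_expensive = max(spans, key=span_cost)
```

The point is not the code but the prerequisite: none of this is answerable if the platform flattened the loop into a single "AI service: 4,200ms" span.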

One trace. Every service. Every AI call. Every cost.

Meanwhile, the specialized LLM observability tools (Langfuse, LangSmith, Arize, and others) have the fidelity for AI workloads but none of the distributed tracing context. They can show you every iteration of an agent loop with token-level detail. But they can't tell you which user request triggered it, what the upstream service was doing, or how the downstream notification service responded. They see the AI in isolation, disconnected from the system it lives in.

Why This Matters Now

This isn't an academic problem. It's a budget problem.

AI calls are expensive. At OpenAI's published pricing (as of Q2 2026), GPT-5.4 charges $2.50 per million input tokens and $15 per million output tokens. A single chat completion consuming 500 input and 500 output tokens costs roughly $0.009. That's modest on its own. But an agent running a plan/act/reflect loop over 3-5 iterations, each consuming thousands of tokens, can accumulate $0.10 to $1.00 in inference cost on a single user request. Compare that to a PostgreSQL query that costs fractions of a cent in compute. Unlike a slow database query, a slow AI call doesn't just cost time. It costs tokens.
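Working the quoted numbers exactly, under the assumed GPT-5.4 pricing above (the loop's token counts are invented, chosen only to land in the stated range):

```python
# Assumed GPT-5.4 list prices quoted above, per million tokens.
INPUT_PRICE = 2.50 / 1_000_000
OUTPUT_PRICE = 15.00 / 1_000_000

def completion_cost(input_tokens, output_tokens):
    """Dollar cost of one chat completion at the assumed prices."""
    return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE

# One modest completion: 500 tokens in, 500 out -> $0.00875, i.e. roughly $0.009.
single = completion_cost(500, 500)

# A 5-iteration agent loop, each pass consuming a few thousand tokens,
# accumulates into the $0.10-$1.00 range on a single user request.
loop = sum(completion_cost(8_000, 2_000) for _ in range(5))
```

The asymmetry is the lesson: the same request shape can vary two orders of magnitude in cost depending on how many times the agent loops.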

When those costs are invisible, when they live in a separate LLM observability tool disconnected from the rest of your traces, you can't answer basic questions:

  • Which user actions trigger the most expensive AI calls?
  • What's the per-request cost of our checkout flow now that we added AI-powered recommendations?
  • When the AI agent retries due to a rate limit, how much does that retry cost, and which upstream timeout caused it?
  • Is the RAG pipeline's vector search hitting cache, or is every request regenerating embeddings?
  • Did the agent loop 5 times because the prompt was bad, or because the vector search returned irrelevant context?

These aren't hypothetical questions. They're the questions your finance team will ask when the AI bill arrives. And your traditional APM tool, the one that shows the agent as a single 4-second span, can't answer any of them.

What a Complete Trace Should Look Like

Imagine a single trace that captures the full journey:

Browser → API Gateway → Order Service → Payment → Recommendations → AI Agent → LLM (GPT-4o) → Vector DB → Tool Calls → Content Moderation → Notification Fan-out → Email Service → Analytics

Every span in one trace. Every service hop with latency. Every AI call with token counts: input tokens, output tokens, model used, finish reason. Every tool invocation the agent made. Every cost calculable from the attributes on the span.

This isn't a fantasy. The building blocks already exist.

OpenTelemetry gives us the standard for distributed traces across traditional services. The OTel GenAI Semantic Conventions (still maturing as of early 2026, with core attributes stabilizing) extend that standard to LLM calls, defining attributes like gen_ai.usage.input_tokens, gen_ai.usage.output_tokens, gen_ai.request.model, and gen_ai.response.finish_reasons. Microsoft Semantic Kernel and Microsoft Agent Framework already emit these spans. In the Python ecosystem, LangChain and LlamaIndex support OTel export via OpenLLMetry and native integrations. Between Semantic Kernel for .NET and OpenLLMetry for Python, instrumentation is available across both major AI stacks.

The conventions are shipping. The instrumentation is arriving. Some platforms have started integrating LLM data into their trace views. What's still missing for most is rendering the full picture spatially: traditional distributed traces and AI agent traces in one view, with cost as a first-class dimension you can see in context.

The Cost Dimension

Cost isn't a nice-to-have annotation. It should be a core observability signal, right alongside latency, errors, and throughput.

Traditional APM cares about latency (how long did this take?) and errors (did it fail?). Those remain essential. But AI workloads add a third axis: cost (how much did this request cost to serve?).

Consider a checkout flow that completes in 200ms but triggers three LLM calls for product recommendations and support. At GPT-5.4 pricing, each call consuming roughly 2,000 input tokens and 500 output tokens costs about $0.0125. Three calls: roughly $0.04 per checkout. At 10,000 checkouts per day, that's about $375/day in AI inference cost for a single feature. A checkout that completes in 400ms with no AI involvement has a fundamentally different operational profile. Latency alone can't tell you this. Error rate alone can't tell you this. You need token counts on the spans.
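The derivation is three multiplications, again under the assumed GPT-5.4 prices:

```python
# Assumed GPT-5.4 list prices, per million tokens.
INPUT_PRICE = 2.50 / 1_000_000
OUTPUT_PRICE = 15.00 / 1_000_000

per_call = 2_000 * INPUT_PRICE + 500 * OUTPUT_PRICE  # $0.0125 per LLM call
per_checkout = 3 * per_call                          # $0.0375 per checkout
per_day = per_checkout * 10_000                      # $375 per day
```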

And you need them in the same trace as the rest of the request, not in a separate tool where some analyst has to manually correlate timestamps to figure out which AI call belongs to which checkout.

When AI costs are embedded in the trace as span attributes, you can:

  • Aggregate per-request AI cost across all services
  • Alert when a single request exceeds a cost threshold
  • Compare cost across deployment versions (did the new prompt template reduce token usage?)
  • Attribute AI spend to specific features, users, or tenants
  • Identify retry loops that multiply cost without delivering value
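The first two items in that list reduce to a group-by over span attributes once cost lives in the trace. A minimal sketch, with exported spans as plain dicts: the trace IDs, token counts, alert threshold, and GPT-5.4 prices are all assumptions for illustration.

```python
from collections import defaultdict

# Assumed GPT-5.4 list prices, per million tokens.
INPUT_PRICE = 2.50 / 1_000_000
OUTPUT_PRICE = 15.00 / 1_000_000
COST_ALERT_THRESHOLD = 0.05  # dollars per request; invented for the sketch

# Hypothetical exported spans, each carrying its trace_id plus the
# OTel GenAI token-usage attributes.
spans = [
    {"trace_id": "req-1", "gen_ai.usage.input_tokens": 2000, "gen_ai.usage.output_tokens": 500},
    {"trace_id": "req-1", "gen_ai.usage.input_tokens": 2000, "gen_ai.usage.output_tokens": 500},
    {"trace_id": "req-2", "gen_ai.usage.input_tokens": 12000, "gen_ai.usage.output_tokens": 3000},
]

def aggregate_cost_per_trace(spans):
    """Sum AI cost per request by grouping spans on trace_id."""
    totals = defaultdict(float)
    for s in spans:
        totals[s["trace_id"]] += (s["gen_ai.usage.input_tokens"] * INPUT_PRICE
                                  + s["gen_ai.usage.output_tokens"] * OUTPUT_PRICE)
    return dict(totals)

costs = aggregate_cost_per_trace(spans)
over_budget = [t for t, c in costs.items() if c > COST_ALERT_THRESHOLD]
```

The same grouping key swapped to a deployment version, feature flag, or tenant attribute yields the remaining items in the list.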

What We're Building

This is the direction we're taking with Immersive APM.

IAPM already renders distributed traces as a 3D force-directed graph: services as nodes in space, connections as visible pathways, data flowing through the topology. You see the shape of your system, not just rows in a table.

The AI layer is already part of this. When a trace includes LLM calls with OTel GenAI attributes, they appear in the same graph. Same trace, same view. The order service calling a recommendation engine, the engine invoking an AI agent, the agent making two LLM calls and three tool invocations, then the trace continuing downstream to notifications and shipping. Token counts, model information, and finish reasons are visible on every AI span as first-class attributes. We're actively building dedicated LLM-specific views and cost aggregation on top of this foundation.

For agentic workloads specifically, spatial rendering has a concrete advantage: the fan-out and reconvergence of an agent loop (one request spawning multiple parallel tool calls that reconverge at a synthesis step) appears as a visible cluster in 3D space, with edge thickness encoding latency and node color encoding error state. In a 2D waterfall, the same structure is an indented list of spans that requires mental reconstruction. In 3D, the shape of the agent's reasoning is directly visible.

Tessa, our AI Assistant, is designed to answer the cost questions above directly within the 3D environment. Ask "which checkout triggered the most expensive AI calls today?" and Tessa identifies the trace, highlights the relevant spans, and shows you the token breakdown. The goal: no more manual correlation between dashboards.

Try It Today

We built an open-source trace generator that demonstrates what end-to-end traces look like when they include AI workloads. It's a single Go binary (no Docker, no infrastructure) that simulates 28 services (20 traditional + 8 AI) producing 40 scenario flows, including 12 AI agentic scenarios with full OTel GenAI semantic conventions.

Every AI span carries token counts, model identifiers, and finish reasons. Every agent span includes tool call tracking. A checkout flow that hits a recommendation agent generates one unified trace spanning the entire journey.

go install github.com/ImmersiveFusion/if-opentelemetry-tracegen/cmd/tracegen@latest
tracegen -endpoint otlp.iapm.app:443 -headers "api-key=YOUR_KEY" -complexity light

Point it at IAPM to explore the traces in 3D, or at any OTLP backend (Jaeger, Tempo, Honeycomb, Datadog) to see them in your own tool. Use -complexity heavy for the full 28-service topology with AI scenarios, or -ai-only to focus on agentic workloads. The AI spans will be there either way.

For a zero-setup demo, visit demo.iapm.app. Our chaos simulator lets you inject failures into a live system and watch the traces propagate in real time.

The Trace Is the Truth

The industry is converging on this direction. Datadog's LLM Observability now tracks token usage alongside distributed traces with end-to-end visibility into AI agent workflows. Honeycomb's AI observability features have similarly begun integrating LLM data into their core platform. This validates the thesis that full-stack AI tracing is where the market is headed. What remains rare is surfacing that cost data spatially, inside the 3D topology of the system that generated it, where the shape of an agent's reasoning is directly visible rather than buried in a flat table. OpenTelemetry gave us a universal standard for distributed tracing. The GenAI semantic conventions extend it to AI workloads. The instrumentation libraries are shipping.

What's needed now is observability platforms that treat the full trace, from the user's click to the AI agent's token budget to the shipping label, as one continuous story. With cost as a dimension you can see, alert on, and optimize.

That's what end-to-end observability means. Not three tools with three partial views. One trace. The whole journey. Every cost visible.

Enter the World of Your Application®

Start Free. Immersive. AI-guided. Full-stack observability. Enter the World of Your Application®.

Dan Kowalski

Father, technology aficionado, gamer, Gridmaster

About Immersive Fusion

Immersive Fusion (immersivefusion.com) is pioneering the next generation of observability by merging spatial computing and AI to make complex systems intuitive, interactive, and intelligent. As the creators of IAPM, we deliver solutions that combine web, 3D/VR, and AI technologies, empowering teams to visualize and troubleshoot their applications in entirely new ways. This approach enables rapid root-cause analysis, reduces downtime, and drives higher productivity, transforming observability from static dashboards into an immersive, intelligent experience. Learn more about Immersive Fusion or join us on LinkedIn, Mastodon, X, YouTube, Facebook, Instagram, GitHub, Discord.
