feat: Cloudflare-native AI tracing (agents/observability + agents/observability/ai)#1860

Draft
mattzcarey wants to merge 9 commits into main from feat/agent-tracing

@mattzcarey mattzcarey commented Jul 2, 2026

Contributor

Cloudflare-native tracing for AI agents: spans are built on the Workers runtime's tracing API (cloudflare:workers), named and attributed according to the OpenTelemetry GenAI semantic conventions, and flow to Workers Observability with zero dependencies. No OTel SDK, no exporter, no collector: traces show up in the dashboard next to your fetch/DO/KV spans.

Instrumenting the AI SDK

Enable traces on the Worker:

// wrangler.jsonc
{
  "observability": { "traces": { "enabled": true } }
}

AI SDK v6 — wrap the namespace once, use it as normal:

import * as ai from "ai";
import { wrapAISDK } from "agents/observability/ai";

const { generateText, streamText } = wrapAISDK(ai);

await streamText({
  model,
  prompt: "book a table for two",
  tools: { searchRestaurants, reserve },
  experimental_telemetry: {
    functionId: "booking-agent", // becomes gen_ai.agent.name
    metadata: { conversationId: "conv-42" }
  }
});

Every call produces one semconv-shaped trace:

invoke_agent booking-agent            gen_ai.operation.name=invoke_agent, tokens, finish reason
├── chat gpt-4o                       per doGenerate/doStream: model, params, usage, time_to_first_chunk
├── execute_tool searchRestaurants    gen_ai.tool.name, gen_ai.tool.call.id, real duration
└── chat gpt-4o

AI SDK v7 — register the telemetry lifecycle adapter instead of wrapping:

import { registerTelemetry } from "ai";
import { createAISDKTelemetry } from "agents/observability/ai";

registerTelemetry(createAISDKTelemetry());

Same spans, driven by the SDK's telemetry callbacks, correlated by cloudflare.agents.call.id / gen_ai.tool.call.id.

Stream spans stay open until the stream is consumed, cancelled, errors, or is returned early — an aborted stream closes as canceled, not as a false success. Streaming tools (async-generator execute) keep their span open until the iterable is drained (bodies run inside the tool span's async context; early termination still runs the generator's own cleanup), so tool durations are real. Untraced invocations take a pristine fast path: the original operation gets the original params — no tool wrapping, no model middleware, no stream patching.
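
The span-lifetime rule above can be sketched with an async generator that ends the span only when iteration actually stops. This is a minimal illustration, not the wrapper in this PR: `SketchSpan`, `SpanOutcome`, and `holdSpanOpen` are hypothetical names, and the three outcomes are a simplification of the consumed / cancelled / errored / returned-early cases described above.

```typescript
// Hypothetical span shape for illustration; not the agents/observability API.
type SpanOutcome = "completed" | "ended_early" | "error";

interface SketchSpan {
  end(outcome: SpanOutcome): void;
}

// Wraps an async iterable so the span ends when iteration stops,
// not when the iterable is handed back to the caller.
async function* holdSpanOpen<T>(
  source: AsyncIterable<T>,
  span: SketchSpan
): AsyncGenerator<T, void, undefined> {
  // Default covers a consumer that breaks or returns before exhaustion.
  let outcome: SpanOutcome = "ended_early";
  try {
    // for-await forwards return() to the source when the consumer stops early.
    for await (const item of source) {
      yield item;
    }
    outcome = "completed";
  } catch (err) {
    outcome = "error";
    throw err;
  } finally {
    // Runs exactly once per started iteration: on exhaustion, error, or early exit.
    span.end(outcome);
  }
}
```

Note that a generator body does not run until the first pull, so in this sketch a wrapper that is never iterated never ends its span; the PR handles abandoned streams separately through its drain finalizer.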

Think: traced out of the box

Think agents emit this exact trace tree per turn with zero configuration and zero new API surface. The turn's streamText call is the invoke_agent root span — named after the agent class, carrying agent/conversation identity plus turn attributes (cloudflare.agents.turn.request_id, .trigger, .admission, .channel, .continuation, .generation) — with inference and tool calls as its only children. No opt-in flag, no setup, nothing exported; on runtimes without the tracing API the tracer is a no-op.

invoke_agent SupportAgent              turn identity + request params + aggregated usage
├── chat gpt-4o                        per inference step
├── execute_tool lookupOrder           gen_ai.tool.call.id, real duration
└── chat gpt-4o

How it works internally: Think merges its identity and current-turn metadata into experimental_telemetry.metadata at the call site (caller-provided metadata wins, and still flows to the AI SDK's own telemetry when enabled), and the wrapper projects those onto root-span attributes — reserved keys to cloudflare.agents.turn.*, userId to semconv user.id, any other scalar to cloudflare.agents.metadata.{key}. Drain loops also finalize the underlying model stream on early exit (in-stream error, stall abort, user abort) so operation spans close instead of leaking — the SDK tees its base stream, and an abandoned tee branch would otherwise leave the span open forever.
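
The metadata projection described above could look roughly like the following. It is a sketch under stated assumptions: the function names are hypothetical, and the reserved metadata keys are assumed to share their names with the `cloudflare.agents.turn.*` suffixes listed earlier (the PR text does not spell out the metadata-side key names).

```typescript
type Scalar = string | number | boolean;

// Assumption: reserved metadata keys match the cloudflare.agents.turn.* suffixes.
const RESERVED_TURN_KEYS: ReadonlySet<string> = new Set([
  "request_id",
  "trigger",
  "admission",
  "channel",
  "continuation",
  "generation",
]);

function isScalar(value: unknown): value is Scalar {
  return (
    typeof value === "string" ||
    typeof value === "number" ||
    typeof value === "boolean"
  );
}

// Caller-provided metadata wins over the framework's turn metadata.
function mergeMetadata(
  turn: Record<string, unknown>,
  caller: Record<string, unknown>
): Record<string, unknown> {
  return { ...turn, ...caller };
}

// Projects telemetry metadata onto root-span attributes.
function projectMetadata(
  metadata: Record<string, unknown>
): Record<string, Scalar> {
  const attributes: Record<string, Scalar> = {};
  for (const [key, value] of Object.entries(metadata)) {
    if (!isScalar(value)) continue; // objects, arrays, null, undefined are dropped
    if (key === "userId") {
      attributes["user.id"] = value;
    } else if (RESERVED_TURN_KEYS.has(key)) {
      attributes[`cloudflare.agents.turn.${key}`] = value;
    } else {
      attributes[`cloudflare.agents.metadata.${key}`] = value;
    }
  }
  return attributes;
}
```

For example, merging `{ trigger: "chat", tenant: "internal" }` with caller metadata `{ tenant: "acme", userId: "u1", nested: { a: 1 } }` and projecting the result yields `cloudflare.agents.turn.trigger`, `cloudflare.agents.metadata.tenant` (with the caller's value), and `user.id`; the nested object is dropped.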

Schema

Span names follow the semconv formula with a bare-operation fallback past 64 UTF-8 bytes. Query on gen_ai.operation.name, never the span name.
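
A minimal sketch of that naming rule, assuming "past 64 UTF-8 bytes" means strictly more than 64 bytes and that the fallback is the operation name alone. `spanName` is a hypothetical helper, not an export of this PR.

```typescript
const MAX_SPAN_NAME_BYTES = 64;
const encoder = new TextEncoder();

// Builds "{operation} {target}" and falls back to the bare operation
// when the encoded name would exceed the byte budget.
function spanName(operation: string, target?: string): string {
  if (!target) return operation;
  const full = `${operation} ${target}`;
  // Measure bytes, not characters: multi-byte code points count more than once.
  return encoder.encode(full).length > MAX_SPAN_NAME_BYTES ? operation : full;
}
```

Because the name can silently collapse to the bare operation, a query keyed on the span name would miss long agent, model, or tool names, which is why `gen_ai.operation.name` is the stable key.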

| Span | gen_ai.operation.name | Carries |
| --- | --- | --- |
| invoke_agent {agent} | invoke_agent | agent/conversation identity, request params, aggregated usage (incl. cache + reasoning tokens), finish reasons |
| chat {model} | chat | per-model-call params, usage, gen_ai.response.id/model, gen_ai.response.time_to_first_chunk |
| execute_tool {tool} | execute_tool | gen_ai.tool.name, gen_ai.tool.call.id, real execution duration |

  • Semconv keys (gen_ai.*) wherever a home exists; vendor extensions under cloudflare.agents.* — never bare keys, never ai.* (that's the Vercel AI SDK's namespace; squatting it would fake compatibility we don't have).
  • Failures: otel.status_code: "ERROR" + error.type (the spec-defined status encoding for status-less backends). Cancellations: cloudflare.agents.canceled: true, status untouched — aborts are not errors.
  • Scalar-only, content-free: no prompts, messages, tool inputs/outputs, schemas, or raw error messages, ever. Semconv content capture is opt-in and stays permanently off here.
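
The failure and cancellation encoding in the bullets above can be illustrated as a small mapping from outcome to attributes. This is a sketch: the `Outcome` type and helper names are hypothetical, and deriving `error.type` from the error's class name (with the semconv `_OTHER` fallback) is an assumption about how the value is chosen.

```typescript
type Outcome =
  | { kind: "ok" }
  | { kind: "canceled" }
  | { kind: "error"; error: unknown };

// Uses the error's class name only; the message is never recorded.
function errorType(error: unknown): string {
  return error instanceof Error ? error.name : "_OTHER";
}

function outcomeAttributes(outcome: Outcome): Record<string, string | boolean> {
  switch (outcome.kind) {
    case "ok":
      return {};
    case "canceled":
      // Aborts are not errors: status is left untouched.
      return { "cloudflare.agents.canceled": true };
    case "error":
      return {
        "otel.status_code": "ERROR",
        "error.type": errorType(outcome.error),
      };
  }
}
```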

Public surface (kept deliberately small)

  • agents/observability: tracer, plus the types AgentTracer / AgentSpan / TraceAttributes / TraceAttributeValue.
  • agents/observability/ai: wrapAISDK, createAISDKTelemetry.

Everything else (span builders, attribute constants, the SpanRuntime seam) is private, so when the runtime gains native OTel support we can converge behind the facade without a breaking change. Names were chosen to avoid @opentelemetry/api collisions; our openSpan is not OTel's startSpan, which creates without activating.

Provenance

Ported from @msmps's feat/ai-tracing branch — commits carry Co-authored-by credit — then folded into the agents package and aligned with the GenAI semantic conventions.

Testing

  • 54 observability tests (tracer, v6 wrapper, v7 adapter): span names + fallback, abort-chunk cancellation, streaming-tool span lifetime, tool call ids, time-to-first-chunk, metadata→attribute passthrough (reserved keys, user.id, scalar passthrough, object dropping, identity consumption)
  • agents workers project: 78 files / 1539 tests green
  • think: workers project 42 files / 894 tests green with instrumentation live, plus generated-entry/vite/cli/react projects green
  • npm run check green across all 114 projects

mattzcarey and others added 3 commits July 2, 2026 15:00
Co-authored-by: msmps <7691252+msmps@users.noreply.github.com>
Co-authored-by: msmps <7691252+msmps@users.noreply.github.com>
- match workspace devDependency versions (sherif)
- oxfmt formatting, remove unused type imports (oxlint)
- bundler moduleResolution with extensionless relative imports
- build with tsdown like sibling packages (cloudflare:workers kept external)
- explicit types field for TS 6 (no automatic @types inclusion)
- start at version 0.0.0 with an initial-release changeset
- update pnpm lockfile
changeset-bot Bot commented Jul 2, 2026


🦋 Changeset detected

Latest commit: 2fcf743

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 2 packages

| Name | Type |
| --- | --- |
| agents | Minor |
| @cloudflare/think | Minor |


pkg-pr-new Bot commented Jul 2, 2026


agents

npm i https://pkg.pr.new/agents@1860

@cloudflare/ai-chat

npm i https://pkg.pr.new/@cloudflare/ai-chat@1860

@cloudflare/codemode

npm i https://pkg.pr.new/@cloudflare/codemode@1860

create-think

npm i https://pkg.pr.new/create-think@1860

hono-agents

npm i https://pkg.pr.new/hono-agents@1860

@cloudflare/shell

npm i https://pkg.pr.new/@cloudflare/shell@1860

@cloudflare/think

npm i https://pkg.pr.new/@cloudflare/think@1860

@cloudflare/voice

npm i https://pkg.pr.new/@cloudflare/voice@1860

@cloudflare/worker-bundler

npm i https://pkg.pr.new/@cloudflare/worker-bundler@1860

commit: 2fcf743

Move the ai-tracing package into the agents package: the tracer core
(createTracer, the cloudflare:workers-bound tracer, span types) is exported
from agents/observability and the AI SDK v6/v7 adapters from the new
agents/observability/ai entry. The cloudflare:workers 'tracing' export is
accessed via the module namespace with a no-op fallback so runtimes that
predate it degrade gracefully instead of failing at module-link time (the
observability module loads with the main agents entry). The hand-rolled
cloudflare:workers type shim is dropped in favor of @cloudflare/workers-types.
Tests run in the agents workers pool.

Co-authored-by: msmps <7691252+msmps@users.noreply.github.com>
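
The graceful-degradation idea in the commit above (read the export off the module namespace, fall back to a no-op) might be sketched as follows. Everything here is hypothetical: `TracingLike` and `runInSpan` are stand-ins, since the real shape of the cloudflare:workers tracing export is not shown in this PR text.

```typescript
// Hypothetical minimal tracing shape; the real cloudflare:workers export may differ.
interface TracingLike {
  runInSpan<T>(name: string, fn: () => T): T;
}

// No-op fallback: runs the callback and records nothing.
const NOOP_TRACING: TracingLike = {
  runInSpan<T>(_name: string, fn: () => T): T {
    return fn();
  },
};

// A named import of a missing export fails when the module graph is linked.
// Reading a property off the namespace object just yields undefined instead.
function resolveTracing(moduleNamespace: { tracing?: unknown }): TracingLike {
  const candidate = moduleNamespace.tracing;
  if (
    typeof candidate === "object" &&
    candidate !== null &&
    typeof (candidate as TracingLike).runInSpan === "function"
  ) {
    return candidate as TracingLike;
  }
  return NOOP_TRACING;
}

// In a Worker this would be fed the module namespace, for example:
//   import * as workers from "cloudflare:workers";
//   const tracing = resolveTracing(workers);
```
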
@mattzcarey mattzcarey changed the title feat: add @cloudflare/ai-tracing — Cloudflare-native tracing for the AI SDK feat: Cloudflare-native AI tracing via agents/observability Jul 2, 2026
…face

Schema (per semconv research; nothing shipped, renames free):
- span names follow the semconv formula with a 64-byte bare-op fallback:
  'invoke_agent {agent}', 'chat {model}', 'execute_tool {tool}' — the stable
  query key is gen_ai.operation.name, never the span name
- vendor keys move to cloudflare.agents.* (ai.* is the Vercel AI SDK's
  de-facto namespace); ai.tool.call_id becomes semconv gen_ai.tool.call.id
- failures record otel.status_code: ERROR + error.type (the spec-defined
  status encoding for status-less backends) instead of a bare error boolean;
  cancellations record cloudflare.agents.canceled and are not errors
- gen_ai.provider.name normalized to the semconv enum; gen_ai.request.stream
  emitted only when true; gen_ai.response.time_to_first_chunk and response
  id/model captured on the stream path

Wrapper fixes surfaced by the trace-content audit:
- AI SDK v6 signals aborts as in-band {type:'abort'} chunks and never rejects
  with AbortError — recognize them so aborted streams close as canceled
  instead of false successes
- streaming tools (async-generator execute) keep their execute_tool span open
  until the iterable is consumed instead of finishing at ~0ms
- tool spans carry gen_ai.tool.call.id from the execute options
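
The abort fix in the first bullet comes down to treating an in-band part as a terminal signal rather than waiting for a rejection. A minimal sketch, assuming parts are plain objects with a string `type` (the `{type:'abort'}` shape quoted above, plus an `error` part); the function names are hypothetical.

```typescript
type StreamVerdict = "ok" | "canceled" | "error";

// Maps a single stream part to a terminal verdict, if it carries one.
function verdictOfPart(part: unknown): StreamVerdict | undefined {
  if (typeof part !== "object" || part === null) return undefined;
  const type = (part as { type?: unknown }).type;
  if (type === "abort") return "canceled"; // aborts arrive in-band, not as AbortError
  if (type === "error") return "error";
  return undefined;
}

// Folds over observed parts; the first terminal signal decides how the span closes.
function verdictOfStream(parts: Iterable<unknown>): StreamVerdict {
  for (const part of parts) {
    const verdict = verdictOfPart(part);
    if (verdict !== undefined) return verdict;
  }
  return "ok";
}
```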

Public surface hardening (runtime will gain native OTel support later):
- types renamed to avoid @opentelemetry/api collisions: AgentTracer,
  AgentSpan, TraceAttributes, TraceAttributeValue; startSpan renamed openSpan
  (OTel's startSpan means create-without-activating — a semantic inversion)
- createTracer, SpanRuntime, SpanWriter, MaybePromise are private:
  SpanRuntime is the OTel-convergence seam and must stay free to change
@mattzcarey mattzcarey changed the title feat: Cloudflare-native AI tracing via agents/observability feat: Cloudflare-native AI tracing (agents/observability + agents/observability/ai) Jul 3, 2026
Zero new public surface. Think's streamText call routes through the
always-on agents/observability/ai wrapper, so every turn emits an
'invoke_agent {agent class}' root span with 'chat {model}' and
'execute_tool {tool}' children in Workers Observability.

- the admittedTurnContext ALS internally carries trigger/admission/channel/
  continuation/generation; _turnTelemetry() injects agent identity and turn
  metadata into experimental_telemetry.metadata (caller values win; inert
  for the AI SDK's own telemetry unless enabled)
- agents adapters (v6 + v7) project telemetry metadata onto root-span
  attributes: reserved keys -> cloudflare.agents.turn.*, userId -> user.id,
  other scalars -> cloudflare.agents.metadata.{key}, objects dropped
- drain loops finalize the underlying model stream on early exit (in-stream
  error break, stall abort, user abort) via a WeakMap finalizer calling
  consumeStream — the SDK tees its base stream, so an abandoned tee branch
  would otherwise leave the operation span open forever
- wrapModel skips middleware for gateway-style string model ids (the root
  span still carries the model)
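
The WeakMap finalizer from the drain-loop bullet can be sketched like this, folding in two details from later commits in this PR (the entry is deleted before invocation, and the drain promise is created before `waitUntil` is called). `registerFinalizer` and `finalizeOnce` are hypothetical names, and `drain` stands in for the call to consumeStream.

```typescript
type Drain = () => Promise<void>;

// Keyed by the stream result object, so entries vanish with the result.
const finalizers = new WeakMap<object, Drain>();

function registerFinalizer(result: object, drain: Drain): void {
  finalizers.set(result, drain);
}

// Runs the drain at most once per result, even if called repeatedly
// or if waitUntil is missing or throws.
function finalizeOnce(
  result: object,
  waitUntil?: (promise: Promise<unknown>) => void
): Promise<void> {
  const drain = finalizers.get(result);
  if (drain === undefined) return Promise.resolve();
  // Delete before invoking so a re-entrant call cannot start a second consumer.
  finalizers.delete(result);
  // Create the promise first; a failing waitUntil must not trigger another drain.
  const pending = drain().catch(() => undefined);
  try {
    waitUntil?.(pending);
  } catch {
    // Ignored: the drain is already running.
  }
  return pending;
}
```
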
Verified against the pinned ai@6.0.208 and fixed:
- stream observation now unwraps the SDK's {part} baseStream envelope —
  previously real spans missed usage, finish reasons, errors, and aborts
  (only look-alike test fixtures passed); added real-SDK integration tests
  (actual streamText + MockLanguageModelV3) covering envelope unwrapping,
  in-band error/abort parts, tool call ids, and time-to-first-chunk
- removed the eager result-getter 'safeguard': steps/totalUsage/finishReason
  getters call consumeStream(), so touching them started hidden stream
  consumption at wrap time; added a laziness regression test
- untraced fast path: when an invocation is not traced the wrapper calls the
  original operation with the original params — no tool wrapping, no model
  middleware, no stream patching (AgentSpan gains readonly isTraced)
- main agents entry no longer initializes tracing: diagnostics-channel events
  moved to observability/events.ts; the public barrel composes events+tracing
- provider doStream now runs inside the chat span's activation so provider
  work nests under it; stream patching fails open on unknown result shapes
- extractors read the public result shapes (inputTokenDetails/
  outputTokenDetails, response.modelId, deprecated flat fields) and string
  gateway model ids
- think: agents peer floor raised to >=0.18.0; the early-exit stream drain is
  idempotent (deleted before invocation) and rides ctx.waitUntil
- v7 tool spans keyed by callId:toolCallId (concurrent id reuse); operation
  wrappers cached for stable identity; tracer attribute writes fail-safe;
  cloudflare.agents.operation.id renamed to .operation.name (values are names)
- untraced calls no longer compute the span spec: roots open with only the
  semconv name (agent name via direct property reads) and empty attributes;
  the full spec — metadata enumeration, request fields, context allowlists —
  is computed after the isTraced check and written through an internal
  writeSpanAttributes seam, so caller getters/proxies are never enumerated
  on untraced calls
- think drains the model stream only on early exits (break or throw), via a
  natural-exhaustion flag — consumeStream is not a no-op (it tees baseStream
  and traverses the buffered branch), so draining every call was per-inference
  overhead; a thrown exit (stall watchdog) still drains
- the finalizer runs exactly once: the drain promise is created before
  ctx.waitUntil, so a missing/throwing waitUntil cannot start a second tee
  consumer
- async-generator tool bodies are re-entered into the tool span's async
  context via AsyncLocalStorage.snapshot() on every pull, so spans created
  inside the body parent under execute_tool (verified in workerd)
- extractors: provider response-metadata stream parts populate response
  id/model on chat spans; v7 reads public usage detail shapes
  (inputTokenDetails/outputTokenDetails + deprecated flat fields) and prefers
  the served response.modelId over the requested event.modelId
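
The envelope fix in the first bullet of this commit, combined with the fail-open rule for unknown shapes, might look like the sketch below. The `{ part }` envelope is the shape named above; accepting bare parts as well, and the names `StreamPart` and `unwrapPart`, are assumptions made for illustration.

```typescript
interface StreamPart {
  type: string;
  [key: string]: unknown;
}

function hasStringType(value: unknown): value is StreamPart {
  return (
    typeof value === "object" &&
    value !== null &&
    typeof (value as { type?: unknown }).type === "string"
  );
}

// Returns the observable part, or undefined when the chunk shape is not
// recognized. Unknown shapes are skipped rather than thrown on (fail open).
function unwrapPart(chunk: unknown): StreamPart | undefined {
  if (typeof chunk === "object" && chunk !== null && "part" in chunk) {
    const inner = (chunk as { part: unknown }).part;
    return hasStringType(inner) ? inner : undefined;
  }
  // Assumption: bare parts are accepted too.
  return hasStringType(chunk) ? chunk : undefined;
}
```
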
The round-2 manual iterator.next() loop dropped for-await's automatic
return() forwarding: a consumer breaking while the wrapper was suspended at
yield closed the span but never ran the tool generator's own finally blocks.
The wrapper now tracks exhaustion and, on early termination, forwards
iterator.return() inside the tool span's context before finishing the span.
Regression test: consumer breaks after the first yield; the tool generator's
cleanup runs (and a span opened in that cleanup parents under execute_tool).
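
The regression and its fix can be sketched as a manual pull loop that re-enters a context on every pull and forwards `return()` on early termination. This is an illustration under assumptions: `enter` is a hypothetical stand-in for the function that `AsyncLocalStorage.snapshot()` would provide, and `finishSpan` stands in for closing the execute_tool span.

```typescript
// Runs a callback inside the tool span's context (hypothetical seam).
type Enter = <R>(fn: () => R) => R;

async function* pullInContext<T>(
  source: AsyncIterable<T>,
  enter: Enter,
  finishSpan: () => void
): AsyncGenerator<T, void, undefined> {
  const iterator = enter(() => source[Symbol.asyncIterator]());
  let exhausted = false;
  try {
    while (true) {
      // Each pull re-enters the span context before resuming the tool body.
      const result = await enter(() => iterator.next());
      if (result.done) {
        exhausted = true;
        return;
      }
      yield result.value;
    }
  } finally {
    try {
      // A manual next() loop does not forward return() the way for-await does,
      // so do it explicitly to let the tool generator run its own finally blocks.
      if (!exhausted) await enter(() => iterator.return?.());
    } finally {
      finishSpan();
    }
  }
}
```

With a consumer that breaks after the first yield, the tool generator's cleanup runs first and the span is finished afterwards, which matches the ordering the regression test above checks.
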