feat: Cloudflare-native AI tracing (agents/observability + agents/observability/ai) #1860
Draft
mattzcarey wants to merge 9 commits into
Co-authored-by: msmps <7691252+msmps@users.noreply.github.com>
- match workspace devDependency versions (sherif)
- oxfmt formatting, remove unused type imports (oxlint)
- bundler moduleResolution with extensionless relative imports
- build with tsdown like sibling packages (cloudflare:workers kept external)
- explicit types field for TS 6 (no automatic @types inclusion)
- start at version 0.0.0 with an initial-release changeset
- update pnpm lockfile
🦋 Changeset detected
Latest commit: 2fcf743
The changes in this PR will be included in the next version bump. This PR includes changesets to release 2 packages.
agents
@cloudflare/ai-chat
@cloudflare/codemode
create-think
hono-agents
@cloudflare/shell
@cloudflare/think
@cloudflare/voice
@cloudflare/worker-bundler
Move the ai-tracing package into the agents package: the tracer core (createTracer, the cloudflare:workers-bound tracer, span types) is exported from agents/observability and the AI SDK v6/v7 adapters from the new agents/observability/ai entry. The cloudflare:workers 'tracing' export is accessed via the module namespace with a no-op fallback, so runtimes that predate it degrade gracefully instead of failing at module-link time (the observability module loads with the main agents entry). The hand-rolled cloudflare:workers type shim is dropped in favor of @cloudflare/workers-types. Tests run in the agents workers pool.
Co-authored-by: msmps <7691252+msmps@users.noreply.github.com>
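A minimal sketch of the namespace-access fallback, runnable outside workerd. A named import of a missing export fails when the module is linked, whereas reading a property off a module namespace object simply yields undefined, so the check can move to runtime. Here a plain object stands in for the cloudflare:workers namespace; TracingLike, SpanLike, enterSpan, and resolveTracing are hypothetical names, not the real runtime API.

```typescript
// Hypothetical shape of a runtime tracing facility; the real
// cloudflare:workers tracing export may look different.
interface SpanLike {
  readonly isTraced: boolean;
  setAttribute(key: string, value: string | number | boolean): void;
}

interface TracingLike {
  enterSpan<T>(name: string, fn: (span: SpanLike) => T): T;
}

// A span that records nothing, used when the runtime lacks tracing support.
const noopSpan: SpanLike = {
  isTraced: false,
  setAttribute() {},
};

const noopTracing: TracingLike = {
  enterSpan: (_name, fn) => fn(noopSpan),
};

// `ns` plays the role of `import * as workers from "cloudflare:workers"`.
// Property access on a namespace returns undefined on older runtimes,
// instead of failing at module-link time like a named import would.
function resolveTracing(ns: Record<string, unknown>): TracingLike {
  const candidate = ns["tracing"] as Partial<TracingLike> | undefined;
  return typeof candidate?.enterSpan === "function"
    ? (candidate as TracingLike)
    : noopTracing;
}
```

In this sketch the fallback span reports isTraced: false, so callers can reuse the same untraced path they take when tracing is simply turned off.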
…face
Schema (per semconv research; nothing shipped, renames free):
- span names follow the semconv formula with a 64-byte bare-op fallback:
'invoke_agent {agent}', 'chat {model}', 'execute_tool {tool}' — the stable
query key is gen_ai.operation.name, never the span name
- vendor keys move to cloudflare.agents.* (ai.* is the Vercel AI SDK's
de-facto namespace); ai.tool.call_id becomes semconv gen_ai.tool.call.id
- failures record otel.status_code: ERROR + error.type (the spec-defined
status encoding for status-less backends) instead of a bare error boolean;
cancellations record cloudflare.agents.canceled and are not errors
- gen_ai.provider.name normalized to the semconv enum; gen_ai.request.stream
emitted only when true; gen_ai.response.time_to_first_chunk and response
id/model captured on the stream path
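The naming rule in the first bullet can be written as a small pure function. This sketch assumes the 64-byte budget is measured in UTF-8 bytes via TextEncoder, which differs from string.length for non-ASCII model or tool names; spanName and GenAIOperation are illustrative names, not the package's exports.

```typescript
// Semconv span names are "{operation} {target}", e.g. "chat gpt-4o".
// Past 64 UTF-8 bytes the name falls back to the bare operation.
const MAX_SPAN_NAME_BYTES = 64;
const encoder = new TextEncoder();

type GenAIOperation = "invoke_agent" | "chat" | "execute_tool";

function spanName(operation: GenAIOperation, target?: string): string {
  if (!target) return operation;
  const full = `${operation} ${target}`;
  // Compare byte length, not UTF-16 code unit count.
  return encoder.encode(full).length <= MAX_SPAN_NAME_BYTES ? full : operation;
}
```

Because the same operation can surface under either name form, queries should still key on gen_ai.operation.name.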
Wrapper fixes surfaced by the trace-content audit:
- AI SDK v6 signals aborts as in-band {type:'abort'} chunks and never rejects
with AbortError — recognize them so aborted streams close as canceled
instead of false successes
- streaming tools (async-generator execute) keep their execute_tool span open
until the iterable is consumed instead of finishing at ~0ms
- tool spans carry gen_ai.tool.call.id from the execute options
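How in-band parts might decide a stream span's outcome, as a hedged sketch: the StreamPart union is a simplified, hypothetical subset of the AI SDK's part types, and observePart and closingAttributes are illustrative names. It treats the first abort, error, or finish part as decisive and encodes failures and cancellations the way the schema bullets describe.

```typescript
// Simplified, hypothetical subset of stream part shapes.
type StreamPart =
  | { type: "text-delta"; text: string }
  | { type: "error"; error: unknown }
  | { type: "abort" }
  | { type: "finish"; finishReason: string };

type Outcome =
  | { kind: "open" }
  | { kind: "ok"; finishReason: string }
  | { kind: "error"; error: unknown }
  | { kind: "canceled" };

type Attributes = Record<string, string | boolean | string[]>;

// Fold one part into the outcome; the first terminal part wins, so an
// abort seen before "finish" never closes the span as a success.
function observePart(outcome: Outcome, part: StreamPart): Outcome {
  if (outcome.kind !== "open") return outcome;
  switch (part.type) {
    case "abort":
      return { kind: "canceled" };
    case "error":
      return { kind: "error", error: part.error };
    case "finish":
      return { kind: "ok", finishReason: part.finishReason };
    default:
      return outcome;
  }
}

// Attributes written when the span closes.
function closingAttributes(outcome: Outcome): Attributes {
  switch (outcome.kind) {
    case "error": {
      const err = outcome.error;
      // "_OTHER" is the semconv fallback when no better error type is known.
      const type = err instanceof Error && err.name ? err.name : "_OTHER";
      return { "otel.status_code": "ERROR", "error.type": type };
    }
    case "canceled":
      // Cancellation is not a failure: the status code is left untouched.
      return { "cloudflare.agents.canceled": true };
    case "ok":
      return { "gen_ai.response.finish_reasons": [outcome.finishReason] };
    default:
      return {};
  }
}
```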
Public surface hardening (runtime will gain native OTel support later):
- types renamed to avoid @opentelemetry/api collisions: AgentTracer,
AgentSpan, TraceAttributes, TraceAttributeValue; startSpan renamed openSpan
(OTel's startSpan means create-without-activating — a semantic inversion)
- createTracer, SpanRuntime, SpanWriter, MaybePromise are private:
SpanRuntime is the OTel-convergence seam and must stay free to change
Zero new public surface. Think's streamText call routes through the
always-on agents/observability/ai wrapper, so every turn emits an
'invoke_agent {agent class}' root span with 'chat {model}' and
'execute_tool {tool}' children in Workers Observability.
- the admittedTurnContext ALS internally carries trigger/admission/channel/
continuation/generation; _turnTelemetry() injects agent identity and turn
metadata into experimental_telemetry.metadata (caller values win; inert
for the AI SDK's own telemetry unless enabled)
- agents adapters (v6 + v7) project telemetry metadata onto root-span
attributes: reserved keys -> cloudflare.agents.turn.*, userId -> user.id,
other scalars -> cloudflare.agents.metadata.{key}, objects dropped
- drain loops finalize the underlying model stream on early exit (in-stream
error break, stall abort, user abort) via a WeakMap finalizer calling
consumeStream — the SDK tees its base stream, so an abandoned tee branch
would otherwise leave the operation span open forever
- wrapModel skips middleware for gateway-style string model ids (the root
span still carries the model)
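A sketch of the metadata projection described in the bullets above. The reserved metadata key names (requestId, trigger, and so on) are assumptions, since only the resulting cloudflare.agents.turn.* attributes are named here; projectMetadata is a hypothetical helper, and the caller-wins merge is a plain object spread.

```typescript
type Scalar = string | number | boolean;
type Attributes = Record<string, Scalar>;

// Assumed metadata keys for turn fields (hypothetical names).
const TURN_KEYS = new Map<string, string>([
  ["requestId", "request_id"],
  ["trigger", "trigger"],
  ["admission", "admission"],
  ["channel", "channel"],
  ["continuation", "continuation"],
  ["generation", "generation"],
]);

function isScalar(value: unknown): value is Scalar {
  return typeof value === "string" || typeof value === "number" || typeof value === "boolean";
}

// Merge framework metadata with caller metadata (caller values win), then
// project scalars onto root-span attributes; objects, arrays, null are dropped.
function projectMetadata(
  frameworkMeta: Record<string, unknown>,
  callerMeta: Record<string, unknown> = {},
): Attributes {
  const merged = { ...frameworkMeta, ...callerMeta };
  const attrs: Attributes = {};
  for (const [key, value] of Object.entries(merged)) {
    if (!isScalar(value)) continue;
    const turnField = TURN_KEYS.get(key);
    if (key === "userId") attrs["user.id"] = value;
    else if (turnField !== undefined) attrs[`cloudflare.agents.turn.${turnField}`] = value;
    else attrs[`cloudflare.agents.metadata.${key}`] = value;
  }
  return attrs;
}
```

Using a Map rather than a plain object for the reserved keys avoids matching inherited names such as toString.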
Verified against the pinned ai@6.0.208 and fixed:
- stream observation now unwraps the SDK's {part} baseStream envelope —
previously real spans missed usage, finish reasons, errors, and aborts
(only look-alike test fixtures passed); added real-SDK integration tests
(actual streamText + MockLanguageModelV3) covering envelope unwrapping,
in-band error/abort parts, tool call ids, and time-to-first-chunk
- removed the eager result-getter 'safeguard': steps/totalUsage/finishReason
getters call consumeStream(), so touching them started hidden stream
consumption at wrap time; added a laziness regression test
- untraced fast path: when an invocation is not traced the wrapper calls the
original operation with the original params — no tool wrapping, no model
middleware, no stream patching (AgentSpan gains readonly isTraced)
- main agents entry no longer initializes tracing: diagnostics-channel events
moved to observability/events.ts; the public barrel composes events+tracing
- provider doStream now runs inside the chat span's activation so provider
work nests under it; stream patching fails open on unknown result shapes
- extractors read the public result shapes (inputTokenDetails/
outputTokenDetails, response.modelId, deprecated flat fields) and string
gateway model ids
- think: agents peer floor raised to >=0.18.0; the early-exit stream drain is
idempotent (deleted before invocation) and rides ctx.waitUntil
- v7 tool spans keyed by callId:toolCallId (concurrent id reuse); operation
wrappers cached for stable identity; tracer attribute writes fail-safe;
cloudflare.agents.operation.id renamed to .operation.name (values are names)
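The untraced fast path, sketched with hypothetical names (traceOperation, SketchSpan, describe, instrument). The span is opened first, and only after isTraced comes back true does the wrapper compute attributes or rewrite params; otherwise the original function receives the caller's exact params object. For simplicity the sketch is synchronous and ends the span when the call returns, whereas the real wrappers keep stream spans open until the stream settles.

```typescript
type Attrs = Record<string, string | number | boolean>;

interface SketchSpan {
  readonly isTraced: boolean;
  setAttributes(attrs: Attrs): void;
  end(): void;
}

function traceOperation<P, R>(
  original: (params: P) => R,
  openSpan: () => SketchSpan,
  describe: (params: P) => Attrs, // the expensive span-spec computation
  instrument: (params: P) => P, // e.g. wrapping tools, adding middleware
): (params: P) => R {
  return (params) => {
    const span = openSpan();
    try {
      if (!span.isTraced) {
        // Pristine fast path: no attribute work, no param rewriting.
        return original(params);
      }
      span.setAttributes(describe(params));
      return original(instrument(params));
    } finally {
      span.end();
    }
  };
}
```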
- untraced calls no longer compute the span spec: roots open with only the semconv name (agent name via direct property reads) and empty attributes; the full spec (metadata enumeration, request fields, context allowlists) is computed after the isTraced check and written through an internal writeSpanAttributes seam, so caller getters/proxies are never enumerated on untraced calls
- think drains the model stream only on early exits (break or throw), via a natural-exhaustion flag, because consumeStream is not a no-op (it tees baseStream and traverses the buffered branch), so draining every call was per-inference overhead; a thrown exit (stall watchdog) still drains
- the finalizer runs exactly once: the drain promise is created before ctx.waitUntil, so a missing/throwing waitUntil cannot start a second tee consumer
- async-generator tool bodies are re-entered into the tool span's async context via AsyncLocalStorage.snapshot() on every pull, so spans created inside the body parent under execute_tool (verified in workerd)
- extractors: provider response-metadata stream parts populate response id/model on chat spans; v7 reads public usage detail shapes (inputTokenDetails/outputTokenDetails + deprecated flat fields) and prefers the served response.modelId over the requested event.modelId
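A minimal sketch of a run-exactly-once finalizer, assuming a WeakMap from stream result to drain function, in the spirit of the earlier WeakMap finalizer bullet. registerFinalizer and finalizeOnce are hypothetical names, and the drain callback stands in for consumeStream. The entry is deleted before it is invoked, and the drain promise exists before waitUntil is called, so neither a repeat call nor a missing or throwing waitUntil can start a second drain.

```typescript
type Drain = () => Promise<void>;

// Hypothetical registry: stream result -> function that drains its
// underlying model stream.
const finalizers = new WeakMap<object, Drain>();

function registerFinalizer(result: object, drain: Drain): void {
  finalizers.set(result, drain);
}

// Call on early exit only (break or throw). Returns the drain promise, or
// undefined if this result has no pending finalizer.
function finalizeOnce(
  result: object,
  waitUntil?: (promise: Promise<unknown>) => void,
): Promise<void> | undefined {
  const drain = finalizers.get(result);
  if (!drain) return undefined;
  // Delete before invoking: a re-entrant or repeated call finds nothing.
  finalizers.delete(result);
  // Create the promise before waitUntil. Async drains reject rather than throw.
  const pending = drain().catch(() => {});
  try {
    waitUntil?.(pending);
  } catch {
    // The drain is already running; just let it finish.
  }
  return pending;
}
```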
The round-2 manual iterator.next() loop dropped for-await's automatic return() forwarding: a consumer breaking while the wrapper was suspended at yield closed the span but never ran the tool generator's own finally blocks. The wrapper now tracks exhaustion and, on early termination, forwards iterator.return() inside the tool span's context before finishing the span. Regression test: consumer breaks after the first yield; the tool generator's cleanup runs (and a span opened in that cleanup parents under execute_tool).
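A sketch of the fixed pattern: a manual next() loop (needed so each pull can re-enter the span's context) that tracks whether the source finished on its own, and otherwise forwards return() before ending the span. The enter parameter is a hypothetical hook; in Node or workerd it could be the function returned by AsyncLocalStorage.snapshot() captured when the span opened. traceToolStream and ToolSpan are illustrative names.

```typescript
interface ToolSpan {
  end(): void;
}

// Runs `fn` inside the tool span's async context.
type Enter = <R>(fn: () => R) => R;

async function* traceToolStream<T>(
  source: AsyncIterable<T>,
  span: ToolSpan,
  enter: Enter,
): AsyncGenerator<T, void, undefined> {
  const iterator = enter(() => source[Symbol.asyncIterator]());
  let settled = false; // true once the source finished or threw by itself
  try {
    while (true) {
      // Re-enter the span's context on every pull so work the tool body
      // does between yields parents under execute_tool.
      const step = await enter(() => iterator.next()).catch((err: unknown) => {
        settled = true;
        throw err;
      });
      if (step.done) {
        settled = true;
        return;
      }
      yield step.value;
    }
  } finally {
    try {
      // A manual next() loop loses for-await's automatic return() forwarding,
      // so close the source explicitly when the consumer stopped early.
      const close = iterator.return;
      if (!settled && close) await enter(() => close.call(iterator));
    } finally {
      span.end();
    }
  }
}
```

Ending the span in an inner finally keeps the span from leaking even if the tool's own cleanup throws.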
Cloudflare-native tracing for AI agents: spans built on the Workers runtime's tracing API (cloudflare:workers), named and attributed according to the OpenTelemetry GenAI semantic conventions, and flowing to Workers Observability with zero dependencies. No OTel SDK, no exporter, no collector: traces show up in the dash next to your fetch/DO/KV spans.
Instrumenting the AI SDK
Enable traces on the Worker:
AI SDK v6 — wrap the namespace once, use it as normal:
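Conceptually, wrapping the namespace means returning an object with the same exports in which the generation entry points are replaced by traced versions. A minimal sketch, assuming a plain object stands in for the ai module namespace; wrapNamespace and its trace callback are hypothetical and are not the signature of wrapAISDK. Wrappers are cached per original function, mirroring the stable-identity caching listed in the commit log.

```typescript
// `any` lets a wrapper accept any operation signature.
type AnyFn = (...args: any[]) => any;
type Trace = (name: string, original: AnyFn) => AnyFn;

// trace function -> original operation -> traced wrapper.
const wrapperCache = new WeakMap<Trace, WeakMap<AnyFn, AnyFn>>();

// Hypothetical helper: copy every export, replacing the named operations.
function wrapNamespace<NS extends Record<string, unknown>>(
  ns: NS,
  operations: readonly (keyof NS & string)[],
  trace: Trace,
): NS {
  let byOriginal = wrapperCache.get(trace);
  if (!byOriginal) {
    byOriginal = new WeakMap();
    wrapperCache.set(trace, byOriginal);
  }
  const wrapped: Record<string, unknown> = { ...ns };
  for (const name of operations) {
    const original: unknown = ns[name];
    if (typeof original !== "function") continue;
    const fn = original as AnyFn;
    let traced = byOriginal.get(fn);
    if (!traced) {
      traced = trace(name, fn);
      byOriginal.set(fn, traced); // same original -> same wrapper identity
    }
    wrapped[name] = traced;
  }
  return wrapped as NS;
}
```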
Every call produces one semconv-shaped trace:
AI SDK v7 — register the telemetry lifecycle adapter instead of wrapping:
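The correlation side of such a lifecycle adapter can be sketched with a Map of open tool spans. The event shapes (ToolStartEvent, ToolEndEvent) are hypothetical stand-ins for the v7 telemetry callbacks, and span-name truncation is omitted. The keying follows the commit log, which keys v7 tool spans by callId:toolCallId so that concurrent calls reusing a tool call id do not collide.

```typescript
// Hypothetical event payloads; the real v7 callbacks may differ.
interface ToolStartEvent {
  callId: string;
  toolCallId: string;
  toolName: string;
}
interface ToolEndEvent {
  callId: string;
  toolCallId: string;
  error?: unknown;
}

interface ToolSpanRecord {
  name: string;
  attributes: Record<string, string>;
  ended: boolean;
}

// Open tool spans keyed by "callId:toolCallId".
const openToolSpans = new Map<string, ToolSpanRecord>();
const spanKey = (callId: string, toolCallId: string) => `${callId}:${toolCallId}`;

function onToolStart(e: ToolStartEvent): ToolSpanRecord {
  const span: ToolSpanRecord = {
    name: `execute_tool ${e.toolName}`, // 64-byte fallback omitted here
    attributes: {
      "gen_ai.operation.name": "execute_tool",
      "gen_ai.tool.name": e.toolName,
      "gen_ai.tool.call.id": e.toolCallId,
      "cloudflare.agents.call.id": e.callId,
    },
    ended: false,
  };
  openToolSpans.set(spanKey(e.callId, e.toolCallId), span);
  return span;
}

function onToolEnd(e: ToolEndEvent): ToolSpanRecord | undefined {
  const key = spanKey(e.callId, e.toolCallId);
  const span = openToolSpans.get(key);
  if (!span) return undefined; // unknown or already finished
  openToolSpans.delete(key);
  if (e.error !== undefined) {
    span.attributes["otel.status_code"] = "ERROR";
    span.attributes["error.type"] = e.error instanceof Error ? e.error.name : "_OTHER";
  }
  span.ended = true;
  return span;
}
```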
Same spans, driven by the SDK's telemetry callbacks, correlated by cloudflare.agents.call.id/gen_ai.tool.call.id.
Stream spans stay open until the stream is fully consumed, is canceled, errors, or returns early; an aborted stream closes as canceled, not as a false success. Streaming tools (async-generator execute) keep their span open until the iterable is drained (bodies run inside the tool span's async context, and early termination still runs the generator's own cleanup), so tool durations are real. Untraced invocations take a pristine fast path: the original operation gets the original params, with no tool wrapping, no model middleware, and no stream patching.
Think: traced out of the box
Think agents emit this exact trace tree per turn with zero configuration and zero new API surface. The turn's streamText call is the invoke_agent root span, named after the agent class and carrying agent/conversation identity plus turn attributes (cloudflare.agents.turn.request_id, .trigger, .admission, .channel, .continuation, .generation), with inference and tool calls as its only children. No opt-in flag, no setup, nothing exported; on runtimes without the tracing API the tracer is a no-op.
How it works internally: Think merges its identity and current-turn metadata into experimental_telemetry.metadata at the call site (caller-provided metadata wins, and still flows to the AI SDK's own telemetry when enabled), and the wrapper projects those values onto root-span attributes: reserved keys to cloudflare.agents.turn.*, userId to semconv user.id, and any other scalar to cloudflare.agents.metadata.{key}. Drain loops also finalize the underlying model stream on early exit (in-stream error, stall abort, user abort) so operation spans close instead of leaking; the SDK tees its base stream, and an abandoned tee branch would otherwise leave the span open forever.
Schema
Span names follow the semconv formula with a bare-operation fallback past 64 UTF-8 bytes. Query on gen_ai.operation.name, never the span name.

| Span name | gen_ai.operation.name | Notable attributes |
| --- | --- | --- |
| invoke_agent {agent} | invoke_agent | |
| chat {model} | chat | gen_ai.response.id/model, gen_ai.response.time_to_first_chunk |
| execute_tool {tool} | execute_tool | gen_ai.tool.name, gen_ai.tool.call.id, real execution duration |

Attributes: semconv keys (gen_ai.*) wherever a home exists; vendor extensions under cloudflare.agents.*, never bare keys and never ai.* (that's the Vercel AI SDK's namespace; squatting it would fake compatibility we don't have).
Failures: otel.status_code: "ERROR" + error.type (the spec-defined status encoding for status-less backends). Cancellations: cloudflare.agents.canceled: true, status untouched; aborts are not errors.
Public surface (kept deliberately small)
- agents/observability: tracer, plus types AgentTracer / AgentSpan / TraceAttributes / TraceAttributeValue.
- agents/observability/ai: wrapAISDK, createAISDKTelemetry.
Everything else (span builders, attribute constants, the SpanRuntime seam) is private, so when the runtime gains native OTel support we can converge behind the facade without a breaking change. Names were chosen to avoid @opentelemetry/api collisions; our openSpan is not OTel's startSpan, which creates without activating.
Provenance
Ported from @msmps's feat/ai-tracing branch (commits carry Co-authored-by credit), then folded into the agents package and aligned with the GenAI semantic conventions.
Testing
user.id, scalar passthrough, object dropping, identity consumption)
npm run check green across all 114 projects