5 changes: 5 additions & 0 deletions .changeset/codemode-runtime-retries.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
---
"@cloudflare/codemode": patch
---

Add durable execution retries to `runtime.tool()`. Connectors can throw `RetryableError` with an optional delay; by default the runtime makes three total attempts, honors that delay or uses bounded exponential backoff, and can be customized or disabled with `retry`. Failed passes restart under the same execution id, replaying applied calls from the log and re-executing only the failure boundary. Dynamic-worker timeouts are surfaced as structured failures but are not retried by default, so applications can conservatively decide which executions are safe to retry. Attempt fencing prevents calls or results from a superseded timed-out sandbox from mutating the replay log, and connector execute contexts receive a pass-scoped `AbortSignal` for cooperatively cancelling old work before the runtime moves on.
13 changes: 8 additions & 5 deletions docs/codemode/connectors.md
Expand Up @@ -60,7 +60,10 @@ export class MyConnector extends CodemodeConnector<Env> {
### Each tool

```ts
type ToolExecuteContext = { executionId: string };
type ToolExecuteContext = {
executionId: string;
signal?: AbortSignal;
};

type ConnectorTool = {
description?: string;
Expand All @@ -79,7 +82,7 @@ type ConnectorTool = {
};
```

`execute`/`revert` receive an optional `ctx` carrying the `executionId` of the run they belong to. The id is stable across a run's pause/resume passes, so it's the key to use for any resource scoped to the whole execution (see [Per-execution resources](#per-execution-resources)).
`execute`/`revert` receive an optional `ctx` carrying the stable `executionId` and an operation-scoped `AbortSignal`. Both fields remain optional for compatibility with AI SDK toolsets and direct connector tests; the Code Mode runtime supplies both on real calls. The runtime aborts the signal when the pass or rollback operation ends. Connectors should pass it to cancellable I/O such as `fetch(..., { signal: ctx?.signal })`. Cancellation is cooperative and cannot roll back an operation already committed by a remote service.
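
As a rough illustration of the cooperative cancellation described above, the sketch below forwards the optional signal to a cancellable operation. It is self-contained and does not use the package: `cancellableWork` is a hypothetical stand-in for real I/O such as `fetch`, and the local `ToolExecuteContext` type only mirrors the shape shown earlier.

```ts
// Local copy of the context shape, so the sketch runs on its own.
type ToolExecuteContext = { executionId: string; signal?: AbortSignal };

// Hypothetical stand-in for cancellable I/O: resolves after `ms`
// unless the signal aborts first.
function cancellableWork(ms: number, signal?: AbortSignal): Promise<string> {
  return new Promise((resolve, reject) => {
    if (signal?.aborted) {
      reject(new Error("aborted"));
      return;
    }
    const timer = setTimeout(() => {
      signal?.removeEventListener("abort", onAbort);
      resolve("done");
    }, ms);
    function onAbort(): void {
      clearTimeout(timer); // stop the old work instead of letting it finish
      reject(new Error("aborted"));
    }
    signal?.addEventListener("abort", onAbort, { once: true });
  });
}

// Hypothetical tool body: `ctx` is optional, so read the signal defensively.
async function execute(
  _args: { url: string },
  ctx?: ToolExecuteContext
): Promise<string> {
  return cancellableWork(50, ctx?.signal);
}
```

Aborting only stops local work that is still in flight; as noted above, it cannot undo anything a remote service has already committed.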

`requiresApproval: true` pauses the run for [approval](./approvals.md). `revert` enables [rollback](./runtime.md#rollback). Everything else executes immediately and is recorded in the durable log.

Expand Down Expand Up @@ -109,7 +112,7 @@ The proxy tool talks to connectors over Workers RPC. The base class derives this

Some connectors own a resource that must live for the lifetime of one run — a browser/CDP session, a database transaction, a temp workspace. Two pieces of the contract make this work:

1. **`execute(args, ctx)`** receives the `executionId`. Use it to lazily acquire (or reconnect to) the resource on first use, keyed by that id. Because the id is stable across pause/resume, the resource is addressable even after a run pauses for approval and resumes in a later Worker invocation. `ctx` is typed optional (so AI SDK toolsets stay shape-compatible), so read it as `ctx?.executionId` — the runtime always provides it on a real call.
1. **`execute(args, ctx)`** receives the stable `executionId` plus a pass-scoped `signal`. Use the id to lazily acquire (or reconnect to) per-execution resources, and use the signal to stop per-pass work when the runtime moves on. Because the id is stable across pause/resume, the resource remains addressable in a later Worker invocation. `ctx` is typed optional (so AI SDK toolsets stay shape-compatible), but the runtime always provides it on a real call.
2. **`disposeExecution(executionId, status)`** is called when the run reaches a **terminal** state, so you can tear the resource down.

```ts
Expand All @@ -124,7 +127,7 @@ export class BrowserConnector extends CodemodeConnector<Env> {
description: "Open a URL in the run's browser session.",
execute: async ({ url }, ctx) => {
const session = await this.sessionFor(ctx?.executionId);
return session.goto(url);
return session.goto(url, { signal: ctx?.signal });
}
}
};
Expand Down Expand Up @@ -171,7 +174,7 @@ override async onPassEnd(executionId: string, _status: PassEndStatus) {

The same implementation rules as `disposeExecution` apply: idempotent, no instance memory, never throws.

> **AI SDK toolsets:** when `tools()` returns an AI SDK `ToolSet`, codemode passes `{ executionId }` as the tool's second `execute` argument — the slot the AI SDK uses for its own call options. Inside codemode those options aren't otherwise populated, but a tool authored against the AI SDK's `toolCallId`/`messages` won't receive them here.
> **AI SDK toolsets:** when `tools()` returns an AI SDK `ToolSet`, codemode passes `{ executionId, signal }` as the tool's second `execute` argument — the slot the AI SDK uses for its own call options. A tool authored against the AI SDK's `toolCallId`/`messages` won't receive them here.

## Replay policy

Expand Down
28 changes: 27 additions & 1 deletion docs/codemode/runtime.md
Expand Up @@ -23,7 +23,7 @@ const runtime = createCodemodeRuntime({

| Handle method | Purpose |
| ---------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------ |
| `runtime.tool(options?)` | The single model-facing AI SDK tool, `codemode({ code })` |
| `runtime.tool(options?)` | The framework-independent model-facing tool, `codemode({ code })` |
| `runtime.pending(executionId?)` | Actions awaiting approval — drives approval UIs; no id aggregates all paused runs |
| `runtime.approve({ executionId })` | Approve the pending action and continue via replay |
| `runtime.reject({ seq, executionId })` | Reject a pending action; ends the execution. Returns `false` if it was a no-op (action no longer pending — approved or rejected elsewhere) |
Expand Down Expand Up @@ -149,6 +149,32 @@ A tool can opt out of result recording with [`replay: "reexecute"`](./connectors

Any single recorded value (a call's arguments, a recorded result, the final result) is capped at 1 MB serialized (`MAX_DURABLE_VALUE_BYTES`). Truncating a logged value is never an option — replay would feed resumed code corrupted data — so an oversized argument or call result **fails the run** with a model-actionable error suggesting the data be written to a file/workspace and passed by reference. An oversized **final** result does not fail the run (replay never needs it): the run completes, the model receives the real value, and the audit trail stores a placeholder note.

## Retrying failed passes

A connector requests a retry by throwing `RetryableError`. By default the runtime makes three total attempts (the initial pass plus up to two retries), honoring `retryAfterMs` when present and otherwise using bounded exponential backoff (500 ms, then 1 s, capped at 10 s). A retry keeps the same execution id and re-runs the stored code: applied calls replay from the durable log, while the call left `executing` at the failure boundary executes again. No configuration is needed for this default.

```ts
const runtime = createCodemodeRuntime({ ctx, executor, connectors });
```
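
The default delay selection can be sketched roughly as follows. This is an illustration only, assuming the delay doubles per retry from a 500 ms base and is capped at 10 s, as the figures above suggest. `retryDelayMs`, `BASE_DELAY_MS`, and `MAX_DELAY_MS` are hypothetical names rather than the runtime's internals, and whether the runtime also bounds a connector-supplied `retryAfterMs` is not stated here.

```ts
// Assumed values, inferred from the documented delays; not the package's constants.
const BASE_DELAY_MS = 500;
const MAX_DELAY_MS = 10_000;

// retryNumber is 1 for the first retry, 2 for the second, and so on.
function retryDelayMs(retryNumber: number, retryAfterMs?: number): number {
  // A delay supplied by the connector takes precedence over backoff.
  if (retryAfterMs !== undefined) return retryAfterMs;
  const exponential = BASE_DELAY_MS * Math.pow(2, retryNumber - 1);
  return Math.min(exponential, MAX_DELAY_MS);
}
```

With the default of three total attempts, only the first two values (500 ms and 1 s) would be used; the cap matters once `maxAttempts` is raised.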

Customize the policy when needed, or pass `retry: false` to disable automatic retries:

```ts
const runtime = createCodemodeRuntime({
  ctx,
  executor,
  connectors,
  retry: {
    maxAttempts: 4,
    shouldRetry: ({ failure }) => failure.kind === "retryable"
  }
});
```

The error message and optional `retryAfterMs` cross the connector RPC and sandbox boundaries as structured metadata. Dynamic-worker timeouts are reported the same way, with `failure.kind === "timeout"`, but are not retried by default: an operation may have succeeded remotely before its response was lost, so the application must decide whether that boundary is safe to repeat.
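
One way an application might encode that decision is a `shouldRetry` predicate that always accepts explicit retryable failures and defers to its own safety check for timeouts. The sketch below is self-contained and uses hypothetical local types (`FailureSketch`, `makeShouldRetry`); the package's real `ExecuteFailure` and retry-context types may carry more fields or more failure kinds.

```ts
// Hypothetical local shape, modeled on the two failure kinds described above.
type FailureSketch =
  | { kind: "retryable"; message: string; retryAfterMs?: number }
  | { kind: "timeout"; message: string };

// Builds a policy predicate. `timeoutIsSafe` is supplied by the application,
// for example returning true only when the run performs idempotent reads.
function makeShouldRetry(timeoutIsSafe: () => boolean) {
  return ({ failure }: { failure: FailureSketch }): boolean => {
    if (failure.kind === "retryable") return true;
    if (failure.kind === "timeout") return timeoutIsSafe();
    return false; // anything unrecognized is treated as not retryable
  };
}
```

Keeping the timeout decision in an application-supplied callback mirrors the conservative default: the policy itself never assumes a timed-out call is safe to repeat.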

Before retry policy or delay callbacks run, the runtime advances a durable attempt fence. Late calls and results from the superseded sandbox are inert and cannot overwrite the newer pass.
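
The fencing idea can be modeled in a few lines. This is an in-memory sketch under stated assumptions, not the runtime's implementation: the real fence is durable, and `AttemptFence` and its methods are hypothetical names chosen for illustration.

```ts
// In-memory model of an attempt fence guarding a replay log.
class AttemptFence {
  private current = 0;
  private readonly log: string[] = [];

  // Start a new pass; every earlier attempt number becomes stale.
  advance(): number {
    this.current += 1;
    return this.current;
  }

  // Accept a write only from the current attempt. A late write from a
  // superseded pass is dropped and reported as not recorded.
  record(attempt: number, entry: string): boolean {
    if (attempt !== this.current) return false;
    this.log.push(entry);
    return true;
  }

  entries(): readonly string[] {
    return this.log;
  }
}
```

Advancing the fence before any policy or delay logic runs means a slow sandbox from the previous pass is already stale by the time the retry decision is made.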

## Rollback

Rollback walks the log backward and calls the `revert` of **every** applied action that has one — independent of `requiresApproval`. A non-approval write with a `revert` is still undone; an approval-gated action without a `revert` is not. `revert` (via `revertAction`) returns whether it actually reverted, and the runtime marks only those entries `reverted`:
Expand Down
43 changes: 43 additions & 0 deletions packages/codemode/README.md
Expand Up @@ -380,6 +380,49 @@ The `tool(name, t)` decoration hook adjusts tools you didn't author inline (used

The agent drives approvals through the runtime: `runtime.pending()`, `runtime.approve({ executionId })`, `runtime.reject({ seq, executionId })`, `runtime.rollback({ executionId })` (see [docs/codemode/approvals.md](../../docs/codemode/approvals.md)).

### Durable retries

A runtime automatically retries a pass that fails with `RetryableError`, under the same execution id, up to three total attempts. Applied connector calls replay from the durable log; the call at the failure boundary executes again. `retryAfterMs` is honored when present; otherwise retries use bounded exponential backoff (500 ms, then 1 s, capped at 10 s).

```ts
import { createCodemodeRuntime, RetryableError } from "@cloudflare/codemode";

// Inside a connector tool's `execute`: signal a transient, safe-to-repeat failure.
throw new RetryableError("Rate limited", { retryAfterMs: 2_000 });

const runtime = createCodemodeRuntime({
  ctx,
  executor,
  connectors
  // No retry configuration needed for RetryableError.
});

// Customize the policy when needed, for example to opt safe reads into
// timeout retries. Timeout retries are off by default because a timed-out
// mutation may already have succeeded remotely. `timeoutIsSafe` here stands
// for an application-defined predicate.
const customized = createCodemodeRuntime({
  ctx,
  executor,
  connectors,
  retry: {
    maxAttempts: 4,
    shouldRetry: ({ failure, execution }) =>
      failure.kind === "retryable" ||
      (failure.kind === "timeout" && timeoutIsSafe(execution))
  }
});

// Or disable automatic retries entirely.
const noRetries = createCodemodeRuntime({
  ctx,
  executor,
  connectors,
  retry: false
});
```

Each pass has a durable attempt fence. If a timed-out old sandbox finishes after its retry started, its later calls and results are ignored rather than corrupting the replay log. Connector `execute` callbacks also receive a pass-scoped `AbortSignal` through their context; pass it to cancellable I/O so the old operation stops when the runtime moves on. Cancellation is cooperative and cannot roll back a remote write that already committed.
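
A connector that calls a rate-limited HTTP API might translate a 429 response into a retry request along these lines. The sketch is self-contained: the `RetryableError` class below is a local stand-in that only mirrors the constructor shape used above (real code would import it from `@cloudflare/codemode`), and `throwIfRateLimited` is a hypothetical helper. It assumes the `Retry-After` header carries a number of seconds and ignores the HTTP-date form.

```ts
// Local stand-in so the sketch runs on its own; not the package's class.
class RetryableError extends Error {
  readonly retryAfterMs: number | undefined;
  constructor(message: string, options?: { retryAfterMs?: number | undefined }) {
    super(message);
    this.name = "RetryableError";
    this.retryAfterMs = options?.retryAfterMs;
  }
}

// Hypothetical helper: turn a rate-limit response into a retry request.
function throwIfRateLimited(status: number, retryAfterHeader: string | null): void {
  if (status !== 429) return;
  const seconds =
    retryAfterHeader === null || retryAfterHeader.trim() === ""
      ? NaN
      : Number(retryAfterHeader);
  // Fall back to the runtime's backoff when the header is missing or not numeric.
  const retryAfterMs =
    Number.isFinite(seconds) && seconds >= 0 ? seconds * 1000 : undefined;
  throw new RetryableError("Rate limited", { retryAfterMs });
}
```

Only failures that are genuinely safe to repeat should be mapped this way, since the call at the failure boundary executes again on retry.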

### Snippets

Snippets are durable, addressable saved scripts. The model writes and runs scripts; the developer promotes the ones worth keeping (`runtime.saveSnippet`), and the model reuses them (`codemode.run`). No authoring step, no skill-source interface.
Expand Down
12 changes: 7 additions & 5 deletions packages/codemode/src/connectors/types.ts
Expand Up @@ -22,14 +22,16 @@ export type ToolAnnotations = {
// ---------------------------------------------------------------------------

/**
* Passed to a tool's `execute`/`revert` so a connector knows which codemode
* execution the call belongs to. The id is stable across a run's pause/resume
* passes, so it's the right key for a per-execution resource (e.g. a browser
* session) that must survive a pause.
* Passed to a tool's `execute` so a connector knows which codemode execution
* and pass the call belongs to. The execution id is stable across pause/resume;
* the signal is scoped to one pass and aborts when that pass stops making
* progress (completion, pause, error, timeout, or retry).
*/
export type ToolExecuteContext = {
/** The codemode execution this call belongs to. Stable across pause/resume. */
executionId: string;
/** Cooperative cancellation for work still running when this pass ends. */
signal?: AbortSignal;
};

/**
Expand Down Expand Up @@ -57,7 +59,7 @@ export type ExecutionEndStatus =
* resources (an open socket, a lease) should be released even though
* per-execution resources (a session) must survive.
*/
export type PassEndStatus = ExecutionEndStatus | "paused";
export type PassEndStatus = ExecutionEndStatus | "paused" | "retrying";

// ---------------------------------------------------------------------------
// Connector description — returned by describe() RPC.
Expand Down
21 changes: 16 additions & 5 deletions packages/codemode/src/executor-types.ts
Expand Up @@ -3,11 +3,22 @@
* code in a sandbox (DynamicWorkerExecutor, IframeSandboxExecutor, ...).
*/

export interface ExecuteResult {
result: unknown;
error?: string;
logs?: string[];
}
import type { ExecuteFailure } from "./retry";

export type ExecuteResult =
| {
result: unknown;
error?: never;
failure?: never;
logs?: string[];
}
| {
result?: undefined;
error: string;
/** Machine-readable failure metadata when the executor can classify it. */
failure?: ExecuteFailure;
logs?: string[];
};

/**
* Internal resolved form of a tool provider, ready for execution.
Expand Down
20 changes: 16 additions & 4 deletions packages/codemode/src/executor.ts
Expand Up @@ -372,6 +372,11 @@ export class DynamicWorkerExecutor implements Executor {
` if (__r && typeof __r === "object") {\n` +
` if (__r.${CONNECTOR_CONTROL_KEY} === "pause") throw new Error("${PAUSE_SENTINEL_LITERAL}");\n` +
` if (__r.${CONNECTOR_CONTROL_KEY} === "error") throw new Error(String(__r.message));\n` +
` if (__r.${CONNECTOR_CONTROL_KEY} === "retryable") {\n` +
` const __e = new Error(String(__r.message));\n` +
` __e.__codemode_failure__ = { kind: "retryable", message: String(__r.message), retryAfterMs: __r.retryAfterMs };\n` +
` throw __e;\n` +
` }\n` +
` }\n` +
` return __r;\n` +
` };\n` +
Expand Down Expand Up @@ -400,13 +405,13 @@ export class DynamicWorkerExecutor implements Executor {
.concat([normalized])
.concat([
")(),",
' new Promise((_, reject) => setTimeout(() => reject(new Error("Execution timed out")), ' +
' new Promise((_, reject) => setTimeout(() => { const e = new Error("Execution timed out"); e.__codemode_failure__ = { kind: "timeout", message: "Execution timed out" }; reject(e); }, ' +
timeoutMs +
"))",
" ]);",
" return { result, logs: __logs };",
" } catch (err) {",
" return { result: undefined, error: err.message, logs: __logs };",
" return { result: undefined, error: err.message, failure: err.__codemode_failure__, logs: __logs };",
" }",
" }",
"}"
Expand Down Expand Up @@ -472,13 +477,20 @@ export class DynamicWorkerExecutor implements Executor {
): Promise<{
result: unknown;
error?: string;
failure?: import("./retry").ExecuteFailure;
logs?: string[];
}>;
};
const response = await entrypoint.evaluate(dispatchers, connectorBindings);

if (response.error) {
return { result: undefined, error: response.error, logs: response.logs };
if (response.error || response.failure) {
return {
result: undefined,
error:
response.error ?? response.failure?.message ?? "Execution failed",
failure: response.failure,
logs: response.logs
};
}

return { result: response.result, logs: response.logs };
Expand Down
7 changes: 7 additions & 0 deletions packages/codemode/src/index.ts
Expand Up @@ -37,6 +37,13 @@ export {
type PendingAction
} from "./runtime";
export { type Snippet, type SaveSnippetOptions } from "./snippet";
export {
RetryableError,
type ExecuteFailure,
type CodemodeRetryContext,
type CodemodeRetryOptions,
type CodemodeRetryPolicy
} from "./retry";
export {
createCodemodeRuntime,
type CreateCodemodeRuntimeOptions,
Expand Down