Support workflow terminate rollback #14465

Open
vaishnav-mk wants to merge 6 commits into main from vaish/terminate-rollback

@vaishnav-mk

@vaishnav-mk vaishnav-mk commented Jun 29, 2026

Contributor

Adds terminate rollback support across Workflows local tooling and the remote Wrangler terminate command.

This wires rollback: true through the terminate path:

  • @cloudflare/workflows-shared binding and local engine
  • Miniflare wrapped Workflows binding
  • Local Explorer API/OpenAPI schema
  • Wrangler workflows instances terminate --rollback
  • Wrangler workflows instances terminate --local --rollback

The production Workflows API already accepts terminate rollback; this PR adds the SDK/Wrangler client surfaces and fixes local rollback recovery so rollback can run after local engine restart/eviction.

Rollback is only valid for terminate and is only serialized when explicitly set to true.
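
The serialization rule above can be sketched as follows. This is a minimal illustration, not the code in this PR: `buildStatusBody`, `TerminateOptions`, `StatusBody`, and the `status` field name are assumptions made for the example.

```typescript
// Hypothetical option and body shapes; not taken from the PR diff.
interface TerminateOptions {
	rollback?: boolean;
}

interface StatusBody {
	status: "pause" | "resume" | "terminate" | "restart";
	rollback?: true;
}

// Build the status-change request body. `rollback` is attached only for
// terminate, and only when the caller explicitly passed `true`.
function buildStatusBody(
	status: StatusBody["status"],
	options: TerminateOptions = {}
): StatusBody {
	const body: StatusBody = { status };
	if (status === "terminate" && options.rollback === true) {
		body.rollback = true;
	}
	return body;
}
```

Under these assumptions, `rollback: false` and an omitted option serialize to the same body, so existing terminate requests are unchanged.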


A picture of a cute animal (not mandatory, but encouraged)

@changeset-bot

changeset-bot Bot commented Jun 29, 2026


🦋 Changeset detected

Latest commit: 6c313a6

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 7 packages

| Name | Type |
| --- | --- |
| @cloudflare/workflows-shared | Minor |
| wrangler | Minor |
| miniflare | Minor |
| @cloudflare/vitest-pool-workers | Patch |
| @cloudflare/vite-plugin | Patch |
| @cloudflare/deploy-helpers | Patch |
| @cloudflare/pages-shared | Patch |

@github-actions

github-actions Bot commented Jun 29, 2026

Contributor

✅ All changesets look good

@vaishnav-mk vaishnav-mk force-pushed the vaish/terminate-rollback branch 2 times, most recently from 4ccbf75 to 01fd44d Compare June 29, 2026 06:36
@pkg-pr-new

pkg-pr-new Bot commented Jun 29, 2026

@cloudflare/autoconfig

npm i https://pkg.pr.new/@cloudflare/autoconfig@14465

create-cloudflare

npm i https://pkg.pr.new/create-cloudflare@14465

@cloudflare/deploy-helpers

npm i https://pkg.pr.new/@cloudflare/deploy-helpers@14465

@cloudflare/kv-asset-handler

npm i https://pkg.pr.new/@cloudflare/kv-asset-handler@14465

miniflare

npm i https://pkg.pr.new/miniflare@14465

@cloudflare/pages-shared

npm i https://pkg.pr.new/@cloudflare/pages-shared@14465

@cloudflare/unenv-preset

npm i https://pkg.pr.new/@cloudflare/unenv-preset@14465

@cloudflare/vite-plugin

npm i https://pkg.pr.new/@cloudflare/vite-plugin@14465

@cloudflare/vitest-pool-workers

npm i https://pkg.pr.new/@cloudflare/vitest-pool-workers@14465

@cloudflare/workers-auth

npm i https://pkg.pr.new/@cloudflare/workers-auth@14465

@cloudflare/workers-editor-shared

npm i https://pkg.pr.new/@cloudflare/workers-editor-shared@14465

@cloudflare/workers-utils

npm i https://pkg.pr.new/@cloudflare/workers-utils@14465

wrangler

npm i https://pkg.pr.new/wrangler@14465

commit: 6c313a6

@vaishnav-mk vaishnav-mk force-pushed the vaish/terminate-rollback branch 5 times, most recently from f9b1b1a to 9ed9edf Compare June 29, 2026 06:59
@ask-bonk

ask-bonk Bot commented Jun 29, 2026

Contributor

@vaishnav-mk Bonk workflow was cancelled.

View workflow run · To retry, trigger Bonk again.

@vaishnav-mk vaishnav-mk force-pushed the vaish/terminate-rollback branch 3 times, most recently from 50bf663 to 2c0a39a Compare June 29, 2026 07:09
Comment thread packages/miniflare/src/workers/local-explorer/openapi.local.json Outdated
Comment thread packages/miniflare/src/workers/workflows/wrapped-binding.worker.ts Outdated
Comment thread packages/workflows-shared/src/binding.ts Outdated
@vaishnav-mk vaishnav-mk force-pushed the vaish/terminate-rollback branch 2 times, most recently from 4bafb21 to ef038d9 Compare June 29, 2026 10:23
@vaishnav-mk vaishnav-mk marked this pull request as ready for review June 29, 2026 10:55
@workers-devprod workers-devprod requested review from a team and james-elicx and removed request for a team June 29, 2026 10:55
@workers-devprod

workers-devprod commented Jun 29, 2026

Contributor

Codeowners approval required for this PR:

  • @cloudflare/wrangler
  • ✅ @cloudflare/workflows
Show detailed file reviewers
  • .changeset/workflows-terminate-rollback.md: [@cloudflare/wrangler]
  • packages/miniflare/scripts/openapi-filter-config.ts: [@cloudflare/wrangler]
  • packages/miniflare/src/workers/local-explorer/generated/types.gen.ts: [@cloudflare/wrangler]
  • packages/miniflare/src/workers/local-explorer/generated/zod.gen.ts: [@cloudflare/wrangler]
  • packages/miniflare/src/workers/local-explorer/openapi.local.json: [@cloudflare/wrangler]
  • packages/miniflare/src/workers/local-explorer/resources/workflows.ts: [@cloudflare/wrangler]
  • packages/miniflare/src/workers/workflows/wrapped-binding.worker.ts: [@cloudflare/wrangler]
  • packages/workflows-shared/src/binding.ts: [@cloudflare/wrangler]
  • packages/workflows-shared/src/context.ts: [@cloudflare/wrangler]
  • packages/workflows-shared/src/engine.ts: [@cloudflare/wrangler]
  • packages/workflows-shared/tests/engine.test.ts: [@cloudflare/wrangler]
  • packages/wrangler/src/tests/workflows.test.ts: [@cloudflare/wrangler]
  • packages/wrangler/src/workflows/commands/instances/terminate.ts: [@cloudflare/wrangler]
  • packages/wrangler/src/workflows/local.ts: [@cloudflare/wrangler]
  • packages/wrangler/src/workflows/utils.ts: [@cloudflare/wrangler]

devin-ai-integration[bot]

This comment was marked as resolved.

@pombosilva pombosilva self-requested a review June 29, 2026 11:29
@vaishnav-mk vaishnav-mk force-pushed the vaish/terminate-rollback branch from ef038d9 to b1be7f1 Compare June 30, 2026 11:18
@vaishnav-mk

Contributor Author

/bonk review

@ask-bonk ask-bonk Bot left a comment

Contributor

Review: Support workflow terminate rollback

Overall this is a well-structured change. The replay-to-rebuild-registry approach correctly addresses the "empty registry after DO restart/pause" limitation that Devin originally flagged, and the persisted hasRollback metadata is a clean way to know which cached steps to replay. Tests cover the live-registry, replayed-registry, and paused-registry cases. A few things worth addressing before merge.

Blocking / needs discussion

1. The remote (production) API path also sends rollback: true — is this intended/supported?

packages/wrangler/src/workflows/utils.ts:131 adds rollback: true to the body of the remote PATCH /accounts/.../instances/:id/status call. The PR description scopes this to "local Workflows tooling," and the engine implementation lives entirely in @cloudflare/workflows-shared (local). If the production Workflows REST API doesn't yet accept a rollback field, this either silently no-ops or could be rejected — yet the CLI unconditionally prints 🥷 The instance "..." was terminated successfully (terminate.ts:65). If the remote API isn't ready, consider gating --rollback to --local only (and erroring otherwise), so users don't get a false success message for a rollback that never ran. If the remote API is ready, the PR description should say so.
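
If the gating option were chosen, the check could look roughly like this. It is a sketch only: `TerminateArgs` and `assertRollbackSupported` are hypothetical names, and the error text is illustrative rather than Wrangler's actual wording.

```typescript
// Hypothetical argument shape for `workflows instances terminate`.
interface TerminateArgs {
	local: boolean;
	rollback: boolean;
}

// Reject `--rollback` without `--local` so the CLI never reports success
// for a rollback that the remote API might have ignored.
function assertRollbackSupported(args: TerminateArgs): void {
	if (args.rollback && !args.local) {
		throw new Error("--rollback is currently only supported with --local");
	}
}
```

Failing before the request is sent keeps the success message truthful; if the remote API does accept the field, the check is unnecessary.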

Non-blocking

2. No user-facing signal when rollback runs vs. is skipped.

This was Devin's original concern. The replay path mitigates the empty-registry case, but if getEligibleRollbackSteps() returns nothing (no eligible steps, or malformed metadata), rollbacks are silently skipped while the CLI still reports plain success. Consider surfacing in the success log whether rollbacks executed (e.g. count of rollback steps run), so the behavior is observable.
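
One possible shape for a more observable success log is sketched below. It assumes the terminate call could report how many rollback steps ran, which the PR does not currently do; `TerminateResult` and `formatTerminateMessage` are hypothetical.

```typescript
// Hypothetical result shape; the PR does not return rollback counts.
interface TerminateResult {
	rollbackRequested: boolean;
	rollbackStepsRun: number;
}

// Extend the plain success message with what actually happened to rollbacks.
function formatTerminateMessage(id: string, result: TerminateResult): string {
	const base = `The instance "${id}" was terminated successfully`;
	if (!result.rollbackRequested) {
		return base;
	}
	if (result.rollbackStepsRun === 0) {
		return `${base} (rollback requested, no eligible steps found)`;
	}
	const noun = result.rollbackStepsRun === 1 ? "step" : "steps";
	return `${base} (${result.rollbackStepsRun} rollback ${noun} run)`;
}
```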

3. replayRollbackRegistry re-runs the user workflow — confirm side-effect safety for uncached steps.

engine.ts:983 calls USER_WORKFLOW.run(...) in replay phase. Cached step.do calls correctly short-circuit to their stored value (context.ts:438-455) and sleep/sleepUntil/waitForEvent return early in replay phase, so a paused/sleeping instance halts replay safely. But if an instance is terminated while a new, uncached step.do would be the next thing the user code reaches before any sleep/wait, that closure would actually execute with real side effects during replay. In practice terminate targets running/paused instances, so this is unlikely — but a comment documenting this assumption (replay relies on all rollback-eligible steps already being cached and execution halting at the next suspension point) would help future maintainers.
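
To make the assumption concrete, here is a simplified replay-phase stub that goes one step further than a comment and refuses to run uncached closures. It is not the engine's implementation: `ReplayStep`, `ReplayHalt`, and the name-keyed cache are hypothetical simplifications.

```typescript
type StepFn<T> = () => Promise<T>;

// Thrown to halt replay at the first step that has no cached result.
class ReplayHalt extends Error {}

// Hypothetical replay-phase stub: cached steps return their stored value
// without running the closure; an uncached step halts replay instead of
// executing user code with real side effects.
class ReplayStep {
	readonly replayed: string[] = [];
	private readonly cache: Map<string, unknown>;

	constructor(cache: Map<string, unknown>) {
		this.cache = cache;
	}

	async do<T>(name: string, _fn: StepFn<T>): Promise<T> {
		if (this.cache.has(name)) {
			this.replayed.push(name);
			return this.cache.get(name) as T;
		}
		throw new ReplayHalt(`uncached step "${name}" reached during replay`);
	}
}
```

With a guard like this, the `catch` around the replay call would absorb `ReplayHalt` in the same way it absorbs other control-flow exits, and rollback would proceed with the handlers registered up to that point.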

4. Swallowed errors in replayRollbackRegistry and userTriggeredTerminate.

engine.ts:987 catches and discards all replay errors, and engine.ts:1015 logs rollback failures only to console.error. That matches the production "best-effort" rollback model, but a malformed user workflow throwing during replay will silently produce zero rollbacks. At minimum, consider a debug-level log so local users can diagnose "why didn't my rollback run?"
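
A best-effort wrapper that still leaves a trace might look like the following sketch. The logger is injected so the example stays self-contained; `replayBestEffort` and `DebugLog` are hypothetical names, not engine APIs.

```typescript
type DebugLog = (message: string, error: unknown) => void;

// Run a best-effort replay: failures never propagate, but they are reported
// through the injected logger so a skipped rollback can be diagnosed.
async function replayBestEffort(
	replay: () => Promise<void>,
	debug: DebugLog
): Promise<boolean> {
	try {
		await replay();
		return true;
	} catch (error) {
		debug("rollback replay stopped early", error);
		return false;
	}
}
```

The boolean return value is an illustrative extra; it would let the caller distinguish a completed replay from one that stopped early without changing the best-effort semantics.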

Minor / nits

  • setRollbackPhase/rollbackPhase (engine.ts:316-319) repeat the string-literal union "replay" | "rollback" | undefined across several signatures. A small exported type alias would reduce duplication.
  • resources/workflows.ts:923 still casts handle as unknown as WorkflowHandle to call terminate(options). Same TODO as the wrapped-binding.worker.ts cast — worth a single shared note/tracking issue referencing the @cloudflare/workers-types terminate-options gap so both casts get removed together.
  • getEligibleRollbackSteps reads all states rows twice and builds two sets. That is fine at local data volumes; just noting that it scans the full log table on every terminate.
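
The type-alias nit in the first bullet could be addressed along these lines. The alias name `RollbackPhase` and the `PhaseHolder` class are hypothetical; only the union members come from the review comment.

```typescript
// Hypothetical alias for the union repeated in the engine signatures.
// In the real code it would be exported from wherever the phase is defined.
type RollbackPhase = "replay" | "rollback" | undefined;

// Minimal holder showing the alias used in both the setter and the getter.
class PhaseHolder {
	private phase: RollbackPhase = undefined;

	setRollbackPhase(phase: RollbackPhase): void {
		this.phase = phase;
	}

	get rollbackPhase(): RollbackPhase {
		return this.phase;
	}
}
```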

Tests / changeset

  • Changeset (minor for workflows-shared/wrangler/miniflare) and wording look correct.
  • Wrangler tests now assert the exact request body for both local and remote --rollback, which is good. If point (1) results in gating --rollback to local-only, the remote test should be updated accordingly.

@ask-bonk

ask-bonk Bot commented Jun 30, 2026

Contributor

I reviewed PR #14465 and posted a review. Summary of what I found:

Strengths

  • The replay-to-rebuild-registry approach (replayRollbackRegistry + persisted hasRollback metadata) cleanly solves the empty-registry-after-DO-restart limitation Devin flagged. Cached steps short-circuit and sleep/waitForEvent return early in replay phase, so the design is sound.
  • Tests cover live, replayed, and paused registry cases; changeset and request-body assertions are correct.

Main concern (blocking/discussion)

  1. The remote production API path (utils.ts:131) also sends rollback: true, but the engine implementation is local-only. If the production REST API doesn't support rollback, the CLI still prints "terminated successfully" — a false signal. Suggested gating --rollback to --local, or confirming the remote API is ready.

Non-blocking

  2. No user-facing signal whether rollbacks actually ran vs. were silently skipped.
  3. replayRollbackRegistry re-runs user code — safe for cached/suspended instances, but worth a comment documenting the assumption that uncached steps won't execute.
  4. Swallowed errors in replay/rollback could hide "why didn't my rollback run" — suggest a debug log.

Nits: repeated phase string-union (extract a type), the remaining as unknown as WorkflowHandle cast mirroring the wrapped-binding.worker.ts TODO, and the double full-table scan in getEligibleRollbackSteps.

@devin-ai-integration devin-ai-integration Bot left a comment

Contributor

Devin Review found 1 new potential issue.

Comment on lines +968 to +994
	private async replayRollbackRegistry(
		metadata: InstanceMetadata
	): Promise<void> {
		if (this.rollbackRegistry.size > 0) {
			return;
		}

		const eligible = this.getEligibleRollbackSteps();
		if (eligible.length === 0) {
			return;
		}

		this.rollbackEligibleCacheKeys = new Set(eligible);
		const stubStep = this.createRollbackContext();
		this.setRollbackPhase("replay");
		try {
			await this.env.USER_WORKFLOW.run(
				metadata.event,
				stubStep as unknown as WorkflowStep
			);
		} catch {
			// Match the production engine: replay may stop on normal workflow control
			// flow; rollback execution uses whatever handlers replay registered.
		} finally {
			this.setRollbackPhase(undefined);
			this.rollbackEligibleCacheKeys = undefined;
		}

Contributor

🚩 Replay of failed steps won't re-register their rollback handlers

During replayRollbackRegistry, if a step previously FAILED (retries exhausted or NonRetryableError), its error is stored in DO storage. When replay hits that step in do() (packages/workflows-shared/src/context.ts:457-463), it throws the cached error immediately WITHOUT calling #registerRollback. This means after a DO restart, terminating with { rollback: true } won't execute rollbacks for steps that originally failed (even though they had rollback handlers registered during the original execution). Steps that SUCCEEDED will replay correctly from cache and re-register their rollbacks. This limitation is acknowledged by the comment at packages/workflows-shared/src/engine.ts:988-989 ("replay may stop on normal workflow control flow") and may be acceptable for the local-dev use case, but it's a behavioral difference from in-memory rollback (where the registry is already populated).
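
If parity with the in-memory behavior were wanted, one option is to register the rollback handler before rethrowing the cached error. The sketch below shows that ordering on a simplified model; `ReplayRegistry`, `CachedStep`, and the synchronous `replay` method are hypothetical and do not mirror the actual `do()` implementation in context.ts.

```typescript
type Rollback = () => Promise<void>;

// Hypothetical persisted record for one step.
interface CachedStep {
	value?: unknown;
	error?: string; // set when the step originally failed
	hasRollback: boolean;
}

class ReplayRegistry {
	readonly rollbacks = new Map<string, Rollback>();
	private readonly cache: Map<string, CachedStep>;

	constructor(cache: Map<string, CachedStep>) {
		this.cache = cache;
	}

	// Replay one cached step. The rollback handler is registered before the
	// cached error is rethrown, so failed steps stay eligible for rollback.
	replay(name: string, rollback?: Rollback): unknown {
		const cached = this.cache.get(name);
		if (cached === undefined) {
			throw new Error(`no cached result for "${name}"`);
		}
		if (cached.hasRollback && rollback !== undefined) {
			this.rollbacks.set(name, rollback);
		}
		if (cached.error !== undefined) {
			throw new Error(cached.error);
		}
		return cached.value;
	}
}
```

Whether a failed step should be rolled back at all is a product decision; this only illustrates how the replay and live registries could be made to agree.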
