Forward termination to steps when a user cancels a resumable run by DRACULA1729 · Pull Request #33925 · dagster-io/dagster

DRACULA1729 · 2026-06-12T17:35:04Z

Summary & Motivation

When run monitoring is enabled with maxResumeRunAttempts > 0, cancelling a run from the UI left the in-flight step running. The logs showed:

Executor received termination signal, not forwarding to steps because run will be resumed

The step-delegating executor decides whether to forward a termination signal by calling run_will_resume, but that method only looked at whether monitoring was enabled and whether resume attempts were left — never the run status. So a user cancel was indistinguishable from a pod crash, and the signal got swallowed.

A cancel puts the run into CANCELING (or CANCELED on force-terminate) before the daemon sends SIGTERM, and monitor_started_run only ever resumes runs still in STARTED. So those statuses were never actually resumable. run_will_resume now returns False for them, which is the same status check _resume_from_failure/the execute_run finally block already does before falling back to run_will_resume.
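A minimal sketch of the guard described above, not the actual diff. The `RunStatus` enum here is a local stand-in and the function takes the status and attempt counts as plain arguments (the real method takes a run id and looks these up on the instance), so every name and parameter below is an assumption made for illustration.

```python
from enum import Enum
from typing import Optional


class RunStatus(Enum):
    # Local stand-in for the run status enum; only the values discussed here.
    STARTED = "STARTED"
    CANCELING = "CANCELING"
    CANCELED = "CANCELED"


def run_will_resume(
    status: Optional[RunStatus],
    monitoring_enabled: bool,
    max_resume_run_attempts: int,
    resume_attempts_so_far: int,
) -> bool:
    """Simplified, hypothetical version of the resumability check."""
    # New guard: a user cancel has already moved the run out of STARTED,
    # and the monitor only resumes runs that are still STARTED.
    if status in (RunStatus.CANCELING, RunStatus.CANCELED):
        return False
    # Pre-existing logic: monitoring enabled and resume attempts remaining.
    return monitoring_enabled and resume_attempts_so_far < max_resume_run_attempts
```

With this shape, a crash while the run is STARTED still reports the run as resumable, while a cancel does not, regardless of how many attempts remain.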

Test Plan

Added two unit tests in test_monitoring_daemon.py:

- a STARTED run with attempts remaining is resumable, but stops being resumable once it transitions to CANCELING
- a CANCELED run is not resumable

Full test_monitoring_daemon.py passes (10 tests); ruff clean.
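For readers without the diff open, the two cases have roughly the following shape. This is a sketch only: `FakeInstance` is a hypothetical in-memory stand-in, not the real instance fixture, and the run ids and string statuses are illustrative.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class FakeInstance:
    """Hypothetical stand-in for an instance with run monitoring enabled."""

    max_resume_run_attempts: int = 3
    statuses: Dict[str, str] = field(default_factory=dict)
    resume_attempts: Dict[str, int] = field(default_factory=dict)

    def run_will_resume(self, run_id: str) -> bool:
        # Cancelled or cancelling runs are never resumable.
        if self.statuses[run_id] in ("CANCELING", "CANCELED"):
            return False
        # Otherwise resumable while attempts remain.
        return self.resume_attempts.get(run_id, 0) < self.max_resume_run_attempts


def test_started_run_stops_being_resumable_when_canceling():
    instance = FakeInstance(statuses={"run-1": "STARTED"})
    assert instance.run_will_resume("run-1")
    instance.statuses["run-1"] = "CANCELING"
    assert not instance.run_will_resume("run-1")


def test_canceled_run_is_not_resumable():
    instance = FakeInstance(statuses={"run-2": "CANCELED"})
    assert not instance.run_will_resume("run-2")
```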

With run monitoring set up and maxResumeRunAttempts > 0, the step-delegating executor treated every termination signal as a pending resume and skipped forwarding it to in-flight steps. A user-initiated cancellation got swallowed the same way a pod crash would, so the running step never stopped. run_will_resume now returns False once a run is CANCELING or CANCELED. The daemon only ever resumes runs that are still in STARTED, so those two statuses were never actually resumable in the first place. Fixes dagster-io#33923.

greptile-apps · 2026-06-12T17:38:47Z

Greptile Summary

This PR fixes a bug where cancelling a resumable run from the UI left in-flight steps running indefinitely. The root cause was that run_will_resume only checked monitoring enablement and remaining attempt count, making a user-initiated cancel indistinguishable from a pod crash, so the step-delegating executor suppressed SIGTERM forwarding.

run_will_resume now fetches the run record and returns False for CANCELING and CANCELED statuses, since the monitoring daemon only ever resumes runs in STARTED status — these two states can never be picked up for resumption.
Two unit tests are added to test_monitoring_daemon.py verifying that a STARTED run with attempts remaining is resumable, that it stops being resumable once it transitions to CANCELING, and that a CANCELED run is not resumable.

Confidence Score: 5/5

Safe to merge — the change is a narrow, targeted guard in run_will_resume with no side effects on existing crash-recovery behaviour.

The fix correctly identifies the two statuses (CANCELING and CANCELED) that the monitoring daemon never actually resumes, and the new early-return precisely mirrors the status check already performed elsewhere in the execute_run finally block. The additional get_run_by_id call is well within acceptable overhead, the existing module-level import pattern is followed, and the two new unit tests exercise both affected branches end-to-end using the real instance fixture.

No files require special attention.

Important Files Changed

Filename	Overview
python_modules/dagster/dagster/_core/instance/methods/run_launcher_methods.py	Adds CANCELING/CANCELED status guard to run_will_resume so user-initiated cancellations are not treated as crash-recovery scenarios, correctly forwarding SIGTERM to in-flight steps.
python_modules/dagster/dagster_tests/daemon_tests/test_monitoring_daemon.py	Adds two focused unit tests covering the CANCELING and CANCELED status branches of the new guard in run_will_resume; both tests correctly use the existing instance fixture with max_resume_run_attempts=3.

Sequence Diagram

sequenceDiagram
    participant UI as User (UI)
    participant Daemon as Monitoring Daemon
    participant Executor as Step-Delegating Executor
    participant Steps as In-flight Steps

    UI->>Daemon: Cancel run request
    Daemon->>Daemon: Set run status → CANCELING
    Daemon->>Executor: Send SIGTERM to pod

    Executor->>Executor: check_for_interrupts() → True
    Executor->>Executor: run_will_resume(run_id)
    Note over Executor: get_run_by_id → status=CANCELING
    Executor-->>Executor: return False (was: True before fix)

    alt Before fix (status check missing)
        Executor->>Executor: Log not forwarding, run will be resumed
        Steps->>Steps: Continue running indefinitely
    else "After fix status == CANCELING → False"
        Executor->>Steps: Forward SIGTERM (terminate_step)
        Steps->>Steps: Terminate cleanly
    end
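The branch in the diagram can be sketched as a small decision function. This is an illustration under assumptions, not the executor's actual code: `handle_termination` and its callback parameters are hypothetical names, and the real executor does more than this around interrupts.

```python
from typing import Callable, List


def handle_termination(
    run_id: str,
    active_steps: List[str],
    run_will_resume: Callable[[str], bool],
    terminate_step: Callable[[str], None],
) -> List[str]:
    """Return the step keys that were sent a termination request."""
    if run_will_resume(run_id):
        # Crash-recovery path: leave steps running so a resumed run
        # worker can pick them back up.
        return []
    # Cancel path: forward the termination to every in-flight step.
    for step_key in active_steps:
        terminate_step(step_key)
    return list(active_steps)
```

Before the fix, `run_will_resume` returned True for a CANCELING run, so the first branch was taken and nothing was forwarded; with the status guard in place it returns False and the steps are terminated.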

Reviews (1): Last reviewed commit: "Forward termination to steps when a user..."

DRACULA1729 · 2026-06-12T17:45:28Z

@gibsondan mind taking a look when you get a chance? It's in the run-monitoring resume path — run_will_resume wasn't checking run status, so a user cancel got treated like a crash and the step kept running. Small fix, tests included.

DRACULA1729 wants to merge 1 commit into dagster-io:master from DRACULA1729:fix/forward-termination-on-user-cancel-with-resume