Skip to content

[dagster-graphql] Tolerate deleted partitions in historical automation evaluations#33939

Open
Vamsi-klu wants to merge 1 commit into
dagster-io:masterfrom
Vamsi-klu:deleted-partition-automation-tab-33910
Open

[dagster-graphql] Tolerate deleted partitions in historical automation evaluations#33939
Vamsi-klu wants to merge 1 commit into
dagster-io:masterfrom
Vamsi-klu:deleted-partition-automation-tab-33910

Conversation

@Vamsi-klu

Copy link
Copy Markdown

Summary & Motivation

When a partition is manually removed (e.g. a dynamic partition is deleted), opening the Automation tab of a downstream asset raises a GraphQL error and the tab fails to load (#33910).

Historical automation-condition evaluation records store partition subsets that were captured at evaluation time. After a partition is deleted, materializing those stored subsets — subset.size and partition_key in subset.subset_value — raises DagsterInvalidInvocationError because the partition no longer exists in the partitions definition. The GraphQL resolvers in asset_condition_evaluations.py call those properties directly while building GrapheneAssetConditionEvaluation / per-partition nodes, so the error propagates out of the query and breaks the whole tab.

Fix

Wrap the two access patterns in small helpers that tolerate stale historical subsets:

  • _get_historical_subset_size(subset) returns subset.size, falling back to 0 if the subset can no longer be materialized.
  • _historical_subset_contains_partition(subset, key) returns subset.is_partitioned and key in subset.subset_value, falling back to False.

A deleted partition is therefore treated as "no longer true / not a candidate" instead of crashing the query. This is a no-op for all non-stale subsets — behaviour only changes on the error path that previously 500'd.

Test Plan

Added two regression tests in dagster_graphql_tests/graphql/test_asset_condition_evaluations.py that build a KeyRangesPartitionsSubset and patch its __len__ / __contains__ to raise DagsterInvalidInvocationError (the deleted-partition shape), then assert the records query and the per-partition query both resolve with numRequested/numTrue == 0 instead of erroring.

Verified locally: both tests pass on the sqlite GraphQL context variants (4 passed, 4 skipped — postgres/code-server variants skip on local Docker/gRPC constraints). ruff (pinned 0.15.15) check + format clean.

closes: #33910


Was generative AI tooling used to author this PR?

Yes — Claude Code (Opus 4.8). Generated-by: Claude Code (Opus 4.8)

@greptile-apps

greptile-apps Bot commented Jun 19, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR adds two small defensive helpers to the GraphQL schema layer that catch DagsterInvalidInvocationError when accessing .size or partition membership on historical automation-condition evaluation subsets that reference deleted partitions, preventing the Automation tab from 500-ing.

  • _get_historical_subset_size and _historical_subset_contains_partition are introduced and wired into all six call-sites in asset_condition_evaluations.py that previously accessed subset.size or partition_key in subset.subset_value directly.
  • Two regression tests are added that patch KeyRangesPartitionsSubset.__len__ / __contains__ to raise DagsterInvalidInvocationError and assert the query resolves cleanly with zero counts.

Confidence Score: 5/5

The change is narrowly scoped to the error-handling path for stale historical subsets; all non-stale call-sites are unaffected and the helpers introduce no new state.

Both helpers correctly catch the exception raised by get_partition_keys_in_range when a dynamic partition no longer exists (DagsterInvalidInvocationError), confirmed by reading partitions_definition.py line 109. All six call-sites in the resolver are updated consistently, and the regression tests exercise both the size-access and membership-check paths end-to-end.

No files require special attention.

Important Files Changed

Filename Overview
python_modules/dagster-graphql/dagster_graphql/schema/asset_condition_evaluations.py Adds two fallback helpers for stale historical partition subsets and replaces all direct .size / contains call-sites; logic is correct for the deleted-partition case.
python_modules/dagster-graphql/dagster_graphql_tests/graphql/test_asset_condition_evaluations.py Adds two regression tests covering the deleted-partition error path via mock; helper _get_key_range_condition_evaluation correctly builds a KeyRangesPartitionsSubset-backed evaluation record.

Reviews (2): Last reviewed commit: "[dagster-graphql] Tolerate stale partiti..." | Re-trigger Greptile

Comment on lines +393 to +397
def _get_historical_subset_size(subset: SerializableEntitySubset) -> int:
try:
return subset.size
except DagsterInvalidInvocationError:
return 0

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Narrow exception scope may miss the loading-context error

KeyRangesPartitionsSubset.__len__ and __contains__ are both decorated with @require_full_partition_loading_context, which calls check.invariant(...) and raises DagsterInvariantViolationError — not DagsterInvalidInvocationError — when no partition loading context is active. If a GraphQL code path ever reaches these helpers without a context (e.g. in a test that bypasses the normal request setup), the DagsterInvariantViolationError would propagate unhandled and re-expose the same 500 crash. In normal production paths the context is always set, so this is low risk today, but worth documenting or catching the broader DagsterError base class if hardening is desired.

Comment on lines +934 to +936
assert not results.errors
evaluation = results.data["assetConditionEvaluationForPartition"]
assert evaluation["evaluationNodes"][0]["status"] == "FALSE"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 "FALSE" status assertion may surprise future readers

When __contains__ is patched to throw for true_subset, the code falls through to check candidate_subset. Because candidate_subset is HistoricalAllPartitionsSubsetSentinel() — which is not a SerializableEntitySubset — the not isinstance(...) branch fires and yields FALSE. This is correct given the current logic, but it means a deleted partition shows up as FALSE (evaluated-and-rejected) rather than SKIPPED (not evaluated). A brief comment in the test explaining why FALSE (not SKIPPED) is expected would help future maintainers understand the assertion isn't a mistake.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

@Vamsi-klu Vamsi-klu force-pushed the deleted-partition-automation-tab-33910 branch from acbbf02 to 5fed730 Compare June 20, 2026 16:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

manually removed partition causes a graphql error on downstream assets automation tab

1 participant