
fix: apply recursive CTE column-list aliases to the static term#23098

Open
tomsanbear wants to merge 3 commits into
apache:mainfrom
tomsanbear:recursive-cte-column-list-alias


@tomsanbear


Which issue does this PR close?

Rationale for this change

`WITH RECURSIVE t(n) AS (...)` failed to plan because the CTE's declared column-list names (the `t(n)` part) were never applied to the recursive working relation. They were applied (via `apply_table_alias`) only after the whole CTE plan was built, but the working table is derived from the static term's schema before that, so the self-reference couldn't resolve the declared names and planning failed with `Schema error: No field named n. Valid fields are t."Int64(1)".` PostgreSQL and DuckDB accept the query; aliasing inside the static SELECT (`SELECT 1 AS n`) was the only workaround.
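The ordering problem described above can be illustrated with a small toy model. This is only a sketch, not DataFusion code: `Schema`, `resolve`, and `plan_recursive_term_before_fix` are hypothetical names, and a schema is reduced to a plain list of column names.

```rust
/// Toy stand-in for a relation schema: an ordered list of column names.
/// (Hypothetical; the real planner uses a much richer schema type.)
type Schema = Vec<String>;

/// Resolve a column reference against a schema, returning its index or an
/// error message loosely shaped like the one quoted in the description.
fn resolve(schema: &Schema, name: &str) -> Result<usize, String> {
    schema.iter().position(|c| c == name).ok_or_else(|| {
        format!(
            "No field named {name}. Valid fields are {}.",
            schema.join(", ")
        )
    })
}

/// Pre-fix ordering (assumed for illustration): the work table copies the
/// static term's schema as-is, and the recursive term is resolved against it
/// before the declared column list is applied.
fn plan_recursive_term_before_fix(
    static_schema: &Schema,
    referenced: &str,
) -> Result<usize, String> {
    let work_table: Schema = static_schema.clone(); // declared names not applied yet
    resolve(&work_table, referenced)
}
```

With a static term of `SELECT 1`, the only column in this model is named `Int64(1)`, so a reference to `n` from the recursive term has nothing to bind to.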

What changes are included in this PR?

Apply the column-list aliases to the static term inside `recursive_cte()`, before the work table is created, so the working relation and the self-reference carry the declared names. The caller now applies only the relation-name alias on the recursive path (the column aliases are already applied), avoiding a redundant projection on top of the `RecursiveQuery` node. The non-UNION fallback applies the aliases directly; non-recursive CTEs are unchanged. A column/alias-count mismatch is now reported at the static term, which gives a clearer error than the previous "No field named …".
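A minimal sketch of the reordering, again as a toy model rather than the actual patch: `Schema`, `apply_column_aliases`, and `work_table_schema` are hypothetical names, and the mismatch error text is invented for illustration.

```rust
/// Toy stand-in for a relation schema: an ordered list of column names.
type Schema = Vec<String>;

/// Rename the static term's columns to the declared column list.
/// A count mismatch is reported here, at the static term.
/// (The error wording is an assumption, not the real message.)
fn apply_column_aliases(static_schema: &Schema, aliases: &[&str]) -> Result<Schema, String> {
    if aliases.is_empty() {
        return Ok(static_schema.clone()); // no column list declared
    }
    if aliases.len() != static_schema.len() {
        return Err(format!(
            "column list has {} names but the static term has {} columns",
            aliases.len(),
            static_schema.len()
        ));
    }
    Ok(aliases.iter().map(|a| a.to_string()).collect())
}

/// Post-fix ordering: alias the static term first, then derive the work
/// table from the aliased schema so the self-reference sees the declared names.
fn work_table_schema(static_schema: &Schema, aliases: &[&str]) -> Result<Schema, String> {
    let aliased = apply_column_aliases(static_schema, aliases)?;
    Ok(aliased)
}
```

In this model, `t(n)` over `SELECT 1` yields a work table whose single column is `n`, while `t(a, b)` over the same static term fails immediately with the arity error instead of a later name-resolution error.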

Are these changes tested?

Yes. Added `cte.slt` cases for single- and multi-column column-list recursive CTEs (asserting the recursion produces the expected rows), UNION (DISTINCT), the arity-mismatch error, and an EXPLAIN that locks the plan shape (no extra projection over `RecursiveQuery`).

Are there any user-facing changes?

WITH RECURSIVE t(n) AS (...) and multi-column column lists now plan and execute correctly, matching PostgreSQL/DuckDB. No API changes.

Note: this pull request was created together with AI tools (Claude Code); I reviewed the full diff prior to submission.

`WITH RECURSIVE t(n) AS (SELECT 1 UNION ALL SELECT n + 1 FROM t WHERE n < 10)`
failed to plan with `Schema error: No field named n. Valid fields are
t."Int64(1)".`. The CTE's declared column-list names were applied (via
`apply_table_alias`) only after the whole CTE plan was built, but the recursive
working relation is derived from the schema of the static term before that, so
the self-reference could never resolve the declared names.

Apply the column-list aliases to the static term inside `recursive_cte`, before
the work table is created, so the working relation and the self-reference expose
the declared names. The relation-name alias is still added by the caller; on the
recursive path the column re-alias is skipped to avoid a redundant projection on
top of the `RecursiveQuery` node. The non-UNION fallback applies the aliases
directly. Non-recursive CTEs are unchanged.

Adds sqllogictest coverage for single- and multi-column column-list recursive
CTEs, UNION (DISTINCT), and the column/alias-count mismatch error.
@github-actions


Thank you for opening this pull request!

Reviewer note: cargo-semver-checks reported the current version number is not SemVer-compatible with the changes in this pull request (compared against the base branch).

Details
     Cloning apache/main
    Building datafusion-sql v54.0.0 (current)
       Built [  40.321s] (current)
     Parsing datafusion-sql v54.0.0 (current)
      Parsed [   0.031s] (current)
    Building datafusion-sql v54.0.0 (baseline)
       Built [  40.178s] (baseline)
     Parsing datafusion-sql v54.0.0 (baseline)
      Parsed [   0.034s] (baseline)
    Checking datafusion-sql v54.0.0 -> v54.0.0 (no change; assume patch)
     Checked [   0.248s] 223 checks: 222 pass, 1 fail, 0 warn, 30 skip

--- failure inherent_method_missing: pub method removed or renamed ---

Description:
A publicly-visible method or associated fn is no longer available under its prior name. It may have been renamed or removed entirely.
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#item-remove
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.48.0/src/lints/inherent_method_missing.ron

Failed in:
  RelationBuilder::nested_join, previously in file /home/runner/work/datafusion/datafusion/target/semver-checks/git-apache_main/c519b008fd84f8f6bb5b5219ec0f7d2212d6fa60/datafusion/sql/src/unparser/ast.rs:508
  SelectBuilder::has_selection, previously in file /home/runner/work/datafusion/datafusion/target/semver-checks/git-apache_main/c519b008fd84f8f6bb5b5219ec0f7d2212d6fa60/datafusion/sql/src/unparser/ast.rs:267

     Summary semver requires new major version: 1 major and 0 minor checks failed
    Finished [  81.961s] datafusion-sql
    Building datafusion-sqllogictest v54.0.0 (current)
       Built [ 171.969s] (current)
     Parsing datafusion-sqllogictest v54.0.0 (current)
      Parsed [   0.024s] (current)
    Building datafusion-sqllogictest v54.0.0 (baseline)
       Built [ 171.948s] (baseline)
     Parsing datafusion-sqllogictest v54.0.0 (baseline)
      Parsed [   0.023s] (baseline)
    Checking datafusion-sqllogictest v54.0.0 -> v54.0.0 (no change; assume patch)
     Checked [   0.086s] 223 checks: 223 pass, 30 skip
     Summary no semver update required
    Finished [ 346.879s] datafusion-sqllogictest

@github-actions github-actions bot added the `auto detected api change` label Jun 25, 2026
kosiew previously approved these changes Jun 25, 2026

@kosiew kosiew left a comment

Contributor


@tomsanbear
Thanks for the update! The changes look good to me. I just have one small suggestion that could help strengthen the test coverage.

# recursive CTE with a column-list alias (e.g. `t(n)`): the declared names must be
# applied to the static term so the recursive self-reference can resolve them
query I rowsort
WITH RECURSIVE t(n) AS (

Contributor


Nice improvement! One small suggestion: could we add a regression test with a quoted recursive CTE column-list alias, for example WITH RECURSIVE t("N") AS (...) SELECT "N" FROM t? I think it would be helpful to document that quoted and case-sensitive aliases are preserved in the recursive work table as well. This is not blocking since the implementation already goes through the existing alias normalization path.
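The distinction this suggestion relies on, that unquoted identifiers are folded to lowercase while quoted ones keep their exact case, can be sketched as follows. The `Ident` struct and `normalize` function here are hypothetical simplifications for illustration and are not the planner's actual types.

```rust
/// A parsed identifier: its text and whether it was written in double quotes.
/// (Hypothetical simplification of a SQL parser's identifier type.)
struct Ident {
    value: String,
    quoted: bool,
}

/// Unquoted identifiers fold to lowercase; quoted ones are preserved verbatim.
fn normalize(id: &Ident) -> String {
    if id.quoted {
        id.value.clone()
    } else {
        id.value.to_lowercase()
    }
}
```

Under this model, `t(N)` declares a column named `n`, whereas `t("N")` declares `N`, so `SELECT "N" FROM t` only resolves if the quoted name survives into the recursive work table.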

@kosiew kosiew left a comment

Contributor


Can you investigate and resolve the CI error?

@kosiew kosiew dismissed their stale review June 25, 2026 11:45

CI error

@tomsanbear

Author

> @tomsanbear Thanks for the update! The changes look good to me. I just have one small suggestion that could help strengthen the test coverage.

For sure, thanks for the feedback! I'll have some time a bit later today and will push a fix to the branch

@github-actions github-actions bot added the `optimizer` label Jun 25, 2026
@tomsanbear tomsanbear requested a review from kosiew June 25, 2026 16:52
@tomsanbear

Author

@kosiew I've updated the snapshot and added the new test you recommended 👍


Labels

auto detected api change (Auto detected API change), optimizer (Optimizer rules), sql (SQL Planner), sqllogictest (SQL Logic Tests (.slt))

Development

Successfully merging this pull request may close these issues.

Recursive CTE column-list alias t(n) is ignored, fails to plan with "No field named n"

2 participants