feat(sql): support read_parquet ignore_corrupt_files #7133
Greptile Summary
This PR wires
Confidence Score: 4/5
Safe to merge: the Rust changes are minimal, mechanical, and low-risk, only wiring a pre-existing field through a previously unused path. The one concern is in the test: tests/sql/test_sql_table_functions.py, the
Important Files Changed
Sequence Diagram
%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
participant User as SQL caller
participant Planner as SQLPlanner
participant TF as ReadParquetFunction
participant Builder as ParquetScanBuilder
participant Config as ParquetSourceConfig
participant Scan as GlobScanOperator
User->>Planner: "SELECT * FROM read_parquet(..., ignore_corrupt_files => true)"
Planner->>TF: plan(args)
TF->>Builder: "TryFrom SQLFunctionArguments: ignore_corrupt_files=true"
Builder->>Builder: finish()
Builder->>Config: "ParquetSourceConfig { ignore_corrupt_files: true }"
Config->>Scan: "GlobScanOperator::try_new(FileFormatConfig::Parquet(cfg))"
Scan-->>Builder: ScanOperatorRef
Builder-->>TF: LogicalPlanBuilder
TF-->>User: "DataFrame: .collect() skips corrupt files"
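A minimal Python sketch of the planning path in the diagram above, assuming the flag simply travels from the SQL named argument through a builder into the scan config. ParquetScanBuilder, ParquetSourceConfig and builder_from_sql_args are hypothetical Python stand-ins named after the diagram's participants; they are not the project's actual Rust types or any real API, and the argument-validation rules are assumptions.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Sequence

# Named arguments this sketch accepts alongside the positional path (assumption).
_ALLOWED_NAMED_ARGS = frozenset({"ignore_corrupt_files"})


@dataclass(frozen=True)
class ParquetSourceConfig:
    """Hypothetical stand-in for the Parquet scan config (not the real type)."""
    ignore_corrupt_files: bool = False


@dataclass
class ParquetScanBuilder:
    """Hypothetical builder: collects options, then emits a config in finish()."""
    glob_paths: list[str]
    ignore_corrupt_files: bool = False

    def finish(self) -> ParquetSourceConfig:
        # The crux of the change: copy the flag into the config here.
        # If this step drops it, the scan silently falls back to the default (False).
        return ParquetSourceConfig(ignore_corrupt_files=self.ignore_corrupt_files)


def builder_from_sql_args(
    positional: Sequence[Any], named: Mapping[str, Any]
) -> ParquetScanBuilder:
    """Mimic a TryFrom<SQLFunctionArguments> step: validate, then copy named args over."""
    if len(positional) != 1 or not isinstance(positional[0], str):
        raise ValueError("read_parquet expects exactly one path argument")
    unknown = set(named) - _ALLOWED_NAMED_ARGS
    if unknown:
        raise ValueError(f"unexpected named arguments: {sorted(unknown)}")
    flag = named.get("ignore_corrupt_files", False)
    if not isinstance(flag, bool):
        raise TypeError("ignore_corrupt_files must be a boolean")
    return ParquetScanBuilder(glob_paths=[positional[0]], ignore_corrupt_files=flag)


# Roughly models: read_parquet('data/*.parquet', ignore_corrupt_files => true)
cfg = builder_from_sql_args(["data/*.parquet"], {"ignore_corrupt_files": True}).finish()
print(cfg)  # ParquetSourceConfig(ignore_corrupt_files=True)
```

The design point the sketch isolates is that a builder field only has an effect if finish() copies it into the config; per the summary, the PR wires a pre-existing field through a previously unused path.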
Reviews (1): Last reviewed commit: "feat(sql): support read_parquet ignore_c..."
1a9b7d5 to 75becad
75becad to e9a2c6c
33e51e3 to 56aaf14
…uet-ignore-corrupt-files
166bf18 to 647d58c
…uet-ignore-corrupt-files
647d58c to 20d9590
Changes Made
Adds ignore_corrupt_files support to SQL read_parquet, forwarding the named argument into the Parquet scan config.
Related Issues
Closes #7132
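The Changes Made note says the named argument is forwarded into the Parquet scan config, and the diagram's last step says .collect() then skips corrupt files. The hypothetical sketch below illustrates that skip-versus-fail behaviour on in-memory bytes. scan_files, check_parquet_magic and CorruptFileError are invented for illustration, and the corruption check only looks for the 4-byte "PAR1" magic that starts and ends every Parquet file; a real Parquet reader also decodes the footer metadata and may classify errors differently.

```python
from typing import Mapping

PARQUET_MAGIC = b"PAR1"  # Parquet files begin and end with these 4 bytes


class CorruptFileError(Exception):
    """Hypothetical error for a file that does not look like valid Parquet."""


def check_parquet_magic(path: str, data: bytes) -> None:
    # Cheap structural check only: leading/trailing magic plus room for the
    # 4-byte footer length. A real reader would also parse the footer.
    if (
        len(data) < 12
        or not data.startswith(PARQUET_MAGIC)
        or not data.endswith(PARQUET_MAGIC)
    ):
        raise CorruptFileError(f"{path}: missing Parquet magic bytes")


def scan_files(files: Mapping[str, bytes], ignore_corrupt_files: bool = False) -> list[str]:
    """Return the paths that would be read, skipping or failing on corrupt ones."""
    readable = []
    for path in sorted(files):
        try:
            check_parquet_magic(path, files[path])
        except CorruptFileError:
            if not ignore_corrupt_files:
                raise  # default: one bad file fails the whole scan
            continue  # flag set: drop this file and keep scanning
        readable.append(path)
    return readable


files = {
    "a.parquet": PARQUET_MAGIC + b"\x00" * 8 + PARQUET_MAGIC,
    "b.parquet": b"not a parquet file",
}
print(scan_files(files, ignore_corrupt_files=True))  # ['a.parquet']
```

With the default ignore_corrupt_files=False, the same call raises CorruptFileError on b.parquet instead of returning a partial result.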