-
Notifications
You must be signed in to change notification settings - Fork 2.2k
Handle nulls in type coercion of higher-order UDFs, map_extract, spark repeat #23071
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -105,6 +105,11 @@ impl ScalarUDFImpl for MapExtract { | |
|
|
||
| fn return_type(&self, arg_types: &[DataType]) -> Result<DataType> { | ||
| let [map_type, _] = take_function_args(self.name(), arg_types)?; | ||
|
|
||
| if map_type.is_null() { | ||
| return Ok(DataType::Null); | ||
| } | ||
|
|
||
| let map_fields = get_map_entry_field(map_type)?; | ||
| Ok(DataType::List(Arc::new(Field::new_list_field( | ||
| map_fields.last().unwrap().data_type().clone(), | ||
|
|
@@ -123,6 +128,10 @@ impl ScalarUDFImpl for MapExtract { | |
| fn coerce_types(&self, arg_types: &[DataType]) -> Result<Vec<DataType>> { | ||
| let [map_type, _] = take_function_args(self.name(), arg_types)?; | ||
|
|
||
| if map_type.is_null() { | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Null is accepted in coercion, but We should also add a test for this.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You are right, thank you. The only tested function |
||
| return Ok(arg_types.to_vec()); | ||
| } | ||
|
|
||
| let field = get_map_entry_field(map_type)?; | ||
| Ok(vec![ | ||
| map_type.clone(), | ||
|
|
@@ -185,6 +194,7 @@ fn map_extract_inner(args: &[ArrayRef]) -> Result<ArrayRef> { | |
|
|
||
| let map_array = match map_arg.data_type() { | ||
| DataType::Map(_, _) => as_map_array(&map_arg)?, | ||
| DataType::Null => return Ok(Arc::clone(map_arg)), | ||
| _ => return exec_err!("The first argument in map_extract must be a map"), | ||
| }; | ||
|
|
||
|
|
||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The code change teaches the Spark array-repeat function to accept a null count. But the test calls the two-argument repeat form, and in this test context that name resolves to the core string-repeat function, which already returned null for a null count long before this PR. Add the null-count case to the file that actually tests the array-repeat function. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test builds the column expression and asserts that construction doesn't error. That does cover the issue, which is good. But the issue's real goal is a correct end-to-end result, and the test never resolves the lambda variables, never builds a physical plan, and never executes, so it can't catch a regression where planning succeeds but the resolved type or the executed output is wrong.