feat(tesseract): Support boolean and numeric filter values#11135
Conversation
Tesseract deserialized filter `values` as `Vec<Option<String>>`, so a
boolean (or numeric) value in a query filter — e.g.
`{operator: 'equals', values: [true]}` — failed with
"invalid type: boolean `true`, expected a string". The legacy JS planner
never hit this since JS is untyped.
Introduce a typed `FilterValue { Str, Bool, Num, Null }` enum and thread it
end-to-end (deserialization → TypedFilter → scalar operators → param
rendering). Values are normalized to their canonical string form only at
the `allocate_param` boundary, so the param channel stays `Vec<String>`.
|
Claude finished @ovr's task in 2m 46s —— View job Review — feat(tesseract): typed
|
| # | Sev | File:Line | Concern |
|---|---|---|---|
| 1 | medium | planner/query_tools.rs:201 |
unwrap_or_default() silently maps FilterValue::Null allocated params to "" instead of preserving SQL NULL |
| 2 | low | cubenativeutils/.../deserializer.rs:42-50 |
deserialize_any int/float split affects all native deserialization, not just FilterValue — worth a dedicated unit test |
| 3 | low | base_query_options.rs:32-43 |
format_number cutoff < 1e15 is non-standard; Display for f64 won't match JS stringification exactly |
| 4 | low | filter/filter_debug.rs:12-17 |
Debug output now wraps booleans/numbers in single quotes, looking like strings |
| 5 | low | (test coverage) | No new tests exercise the actual motivating case (values: [true] / [42]) end-to-end or FilterValue serde round-trips |
Other observations
FilterValue::to_param_stringclones theStringfor theStrvariant on every call — minor perf nit on hot paths (allocate_and_cast_values);Cow<str>would be a free upgrade later but not worth blocking on.- Replacing the
todo!()indeserialize_enumwith a clear error is a nice incidental fix. first_non_null_stringcorrectly delegates tofirst_non_null_valuethento_param_string; the trailingok_or_elseis unreachable (the value was already verified non-null) but harmless.- Test fixtures and
change_operatoruse_raw_values tests were updated consistently; no diff drift spotted.
| } else if let Ok(val) = self.input.to_number() { | ||
| visitor.visit_i64(val.value().unwrap() as i64) //We deserialize float value in | ||
| //different methods | ||
| let num = val.value().unwrap(); | ||
| // Preserve fractional numbers as floats; only integral values are | ||
| // narrowed to i64 (whole-number JS values are the common case, and | ||
| // self-describing consumers like FilterValue expect them as ints). | ||
| if num.fract() == 0.0 && num.is_finite() { | ||
| visitor.visit_i64(num as i64) | ||
| } else { | ||
| visitor.visit_f64(num) |
There was a problem hiding this comment.
This narrowing applies to every call to the native deserialize_any (not just FilterValue). Previously every JS number was forwarded as i64 and serde coerced it to whatever the target type expected; now integral values go to visit_i64 and fractional values to visit_f64.
Two latent risks worth a quick check:
- Any existing struct field deserialized via this path that was previously receiving truncated integer values from fractional JS numbers will now see the original
f64. That's strictly an improvement for correct callers but could surface previously-silent data loss in callers that targetedi64/u64and relied on the truncation. f64::NANis!is_finite(), so it goes tovisit_f64(NaN)— most serde targets reject this, which is probably desired but worth noting.
Adding a unit test in deserializer.rs covering the four shapes (int, float, NaN, very large integral) would lock the contract down.
|
|
||
| pub fn as_str(&self) -> Option<&str> { | ||
| match self { | ||
| FilterValue::Str(s) => Some(s.as_str()), | ||
| _ => None, | ||
| } | ||
| } | ||
|
|
||
| /// Canonical string representation bound as a SQL parameter. `Null` yields | ||
| /// `None` (the value is dropped / handled as `IS NULL`). Whole numbers are | ||
| /// rendered without a trailing `.0` (`42.0` → `"42"`). | ||
| pub fn to_param_string(&self) -> Option<String> { |
There was a problem hiding this comment.
Two small things on format_number:
format!("{}", n as i64)for the integral fast path silently saturates ifnis outsidei64::MIN..=i64::MAX. Then.abs() < 1e15guard keeps us well inside that range — good — but it also excludes values like1e15itself even though they're exactly representable and fit ini64. Usingn.abs() <= (1_i64 << 53) as f64is the conventional cutoff (the largest f64 with unique integer round-trip) and matches what JSNumber.isSafeIntegerenforces upstream.- For the fallback
format!("{}", n), the defaultDisplayforf64uses minimum digits for round-tripping. That's fine for most values but won't match how the JS side originally stringified the number (e.g. JS may emit1e-7while Rust emits0.0000001). If anything downstream compares the rendered param against an exact string, this can diverge. Probably out-of-scope for this PR, but worth a comment noting the format is "Rust default, not JS-compatible".
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #11135 +/- ##
===========================================
+ Coverage 58.74% 79.12% +20.37%
===========================================
Files 217 473 +256
Lines 17495 96514 +79019
Branches 3558 3558
===========================================
+ Hits 10278 76364 +66086
- Misses 6699 19632 +12933
Partials 518 518
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
|
Claude finished @ovr's task in 1m 9s —— View job Re-review —
|
| Area | Change |
|---|---|
params_allocator.rs |
params: Vec<FilterValue>; build_sql_and_params returns Vec<FilterValue> |
query_tools.rs |
get_allocated_params returns Vec<FilterValue>; no more unwrap_or_default() collapse |
physical_plan/filter/operators/like.rs |
Uses new allocate_and_cast_str helper — avoids the String → FilterValue::Str → param_string round-trip |
physical_plan/filter/operators/filter_sql_context.rs |
New allocate_and_cast_str(value: &str, …) factored out of allocate_and_cast_value |
planner/filter/typed_filter.rs |
first_non_null_string documents the unreachable fallback; switches to unwrap_or_default() with a comment |
test_fixtures/test_utils/test_context.rs |
inline_params now inlines Null as bare NULL (rather than '') — correct |
tests/filter/mod.rs |
assert_filter renders Null as the literal string "NULL" for comparison |
base_query_options.rs |
Removed unused as_str() |
Threads status
| Thread | Status |
|---|---|
1. NULL-safety in build_sql_and_params (medium) |
Resolved — fix landed in 0ee60a3 |
2. deserialize_any int/float split (low) |
Still applies — no unit test added |
3. format_number integral cutoff / JS-compat (low) |
Still applies — unchanged |
4. filter_debug.rs quoting bool/num (low) |
Still applies — unchanged |
5. No end-to-end tests for values: [true] / [42] (low) |
Still applies — no new tests on the motivating case |
Notes on the fix shape
- The new
Vec<FilterValue>param channel is the right move — it keeps the type info all the way to the driver boundary, so the driver can choose betweenIS NULLand boundNULLrendering itself instead of receiving a stringly-typed sentinel. first_non_null_string'sunwrap_or_default()is now annotated as unreachable — fair, sincefirst_non_null_valuefilters out nulls and non-nullFilterValue::to_param_stringalways returnsSome. Anexpect("non-null FilterValue renders to Some")would surface invariant violations more loudly thanunwrap_or_default(), but this is style preference, not a defect.- The new
allocate_and_cast_stris a clean factoring; the previousLikeOpround-trip throughFilterValue::Str(v.clone())only to immediately callto_param_string()is gone.
0ee60a3 to
5f72289
Compare
|
Claude finished @ovr's task in 1m 17s —— View job Re-review —
|
| Thread | Status |
|---|---|
1. NULL-safety in build_sql_and_params (medium) |
Resolved — fix landed; unit-test coverage added in this commit |
2. deserialize_any int/float split (low) |
Still applies — no test added for deserializer.rs |
3. format_number integral cutoff / JS-compat (low) |
Still applies — unchanged |
4. filter_debug.rs quoting bool/num (low) |
Still applies — unchanged |
5. No end-to-end tests for values: [true] / [42] (low) |
Partially addressed at the unit level (param-channel boundary). A higher-level test that builds a FilterItem with a boolean/numeric value and asserts rendered SQL + params would still be valuable for the motivating case |
Notes
native_params_keep_natural_type_including_nullasserts the exact behavior that previously regressed (Nullarriving as"") — exactly the right pin.allocate_and_cast_strfactoring infilter_sql_context.rsis retained andLikeOpuses it; the priorString → FilterValue::Str → param_stringround-trip is gone.first_non_null_string's comment +unwrap_or_default()continues to be a stylistic choice rather than a defect;expect(...)would surface invariant violations more loudly, but this is fine as-is.
|
Claude finished @ovr's task in 41s —— View job Re-review —
|
| Thread | Status |
|---|---|
1. NULL-safety in build_sql_and_params (medium) |
Resolved earlier in 0ee60a3 |
2. deserialize_any int/float split test (low) |
Still applies — no test added in deserializer.rs |
3. format_number integral cutoff / JS-compat (low) |
Still applies — unchanged |
4. filter_debug.rs quoting bool/num (low) |
Resolved in 527a8c9 — to_debug_string + tests |
5. End-to-end values: [true] / [42] test (low) |
Partially addressed at the debug-render level; a FilterItem → SQL+params round-trip test would still be valuable |
FilterValue::to_debug_string and to_param_string are close cousins; consider whether the param-rendering path could share the integral/float formatting logic (currently only to_debug_string reuses format_number). Minor — not blocking.
|
Claude finished @ovr's task in 47s —— View job Re-review —
|
| Thread | Status |
|---|---|
1. NULL-safety in build_sql_and_params (medium) |
Resolved in 0ee60a3 |
2. deserialize_any int/float split test in deserializer.rs (low) |
Still applies — the new tests live at the FilterValue serde layer, not the native deserialize_any path. NaN / very-large-integral coverage in cubenativeutils/src/wrappers/serializer/deserializer.rs would still lock that contract |
3. format_number integral cutoff / JS-compat (low) |
Still applies — unchanged |
4. filter_debug.rs quoting bool/num (low) |
Resolved in 527a8c9 |
5. End-to-end values: [true] / [42] test (low) |
Resolved in a39a58b — serde tests in base_query_options.rs + 7 SQL render tests in to_sql.rs |
Notes
test_not_equals_boolean_nativecorrectly verifies the null-safe<> ... OR ... IS NULLrendering fornotEqualson a boolean column — good coverage of an easy-to-miss edge.to_param_string_renders_canonical_formincludes the nice contract pin:FilterValue::Num(42.0).to_param_string() == Some("42")(not"42.0") — confirms theformat_numberintegral-fast-path is exercised. Thread ResultSet pivot methods documentation #3's cutoff concern would now be straightforward to add a unit case for if desired.- Tests at both layers (serde + rendered SQL) is the right shape — catches regressions either above or below the typed-filter boundary.
Tesseract deserialized filter
valuesasVec<Option<String>>, so a boolean (or numeric) value in a query filter — e.g.{operator: 'equals', values: [true]}— failed with "invalid type: booleantrue, expected a string". The legacy JS planner never hit this since JS is untyped.Introduce a typed
FilterValue { Str, Bool, Num, Null }enum and thread it end-to-end (deserialization → TypedFilter → scalar operators → param rendering). Values are normalized to their canonical string form only at theallocate_paramboundary, so the param channel staysVec<String>.