fix(transaction): don't let user properties override computed summary metrics#2726
fix(transaction): don't let user properties override computed summary metrics#2726viirya wants to merge 2 commits into
Conversation
… metrics Building a snapshot summary merged the user's `snapshot_properties` on top of the computed metrics, so a caller could overwrite a metric key such as `added-data-files` with an arbitrary value, corrupting the summary. A non-integer value then panicked `update_totals`, which parsed the added/removed deltas with `.expect()`. This matches iceberg-java's behavior on both points (verified against `SnapshotProducer.java` and `SnapshotSummary.java` on `main`): - `SnapshotSummary.Builder.build()` adds custom properties first, then `metrics.addTo(builder)`, so computed metrics win on a key collision. We now apply `snapshot_properties` first and let the computed metrics overwrite them. - `SnapshotProducer.updateTotal` wraps parsing in try/catch and skips the total on a `NumberFormatException`. `update_totals` now tolerates an unparseable added/removed value by skipping that total instead of panicking. Closes apache#2725
dannycjones
left a comment
There was a problem hiding this comment.
one nit, a duplicated comment, for this change
otherwise LGTM
I think we should consider merging to get this in the 0.10.0 release?
| let added = summary | ||
| .additional_properties | ||
| .get(added_property) | ||
| .map_or(0, |value| { | ||
| value | ||
| .parse::<u64>() | ||
| .expect("must be parsable as it was just serialized") | ||
| }); | ||
| let removed = summary | ||
| .additional_properties | ||
| .get(removed_property) | ||
| .map_or(0, |value| { | ||
| value | ||
| .parse::<u64>() | ||
| .expect("must be parsable as it was just serialized") | ||
| }); | ||
| // Parse the added/removed deltas, tolerating an unparsable value by skipping | ||
| // the total entirely rather than panicking. Computed metrics always overwrite | ||
| // user-supplied summary properties (see `SnapshotProducer::summary`), so a bad | ||
| // value should only ever come from a previous snapshot's summary; matching | ||
| // iceberg-java's `updateTotal`, we ignore it instead of failing the commit. | ||
| let parse_delta = |property: &str| -> Option<u64> { | ||
| match summary.additional_properties.get(property) { | ||
| None => Some(0), | ||
| Some(value) => match value.parse::<u64>() { | ||
| Ok(v) => Some(v), | ||
| Err(parse_err) => { | ||
| tracing::warn!( | ||
| "Property '{property}' could not be parsed when computing '{total_property}': {parse_err}. \ | ||
| Skipping total computation.", | ||
| ); | ||
| None | ||
| } | ||
| }, | ||
| } | ||
| }; | ||
|
|
||
| let (Some(added), Some(removed)) = (parse_delta(added_property), parse_delta(removed_property)) | ||
| else { | ||
| return; | ||
| }; |
There was a problem hiding this comment.
My bad, I introduced this bug. I wasn't aware that users can overwrite the computed deltas.
The original intent was that at this point in the code, we should be the ones who wrote these values, hence the expect rather than parsing and error handling.
I think we're still open to bad values being passed in so avoiding expect sounds reasonable, although I wonder if we should be just avoiding the possibility of passing values matching totals/added/removed all together.
Related: one of the things I wanted to do is move total updates into the same part of the code as delta calculations - then we wouldn't need deserialize values we just wrote in the first place. Not for this PR though!
| // User-supplied snapshot properties are applied first, then the computed | ||
| // metrics overwrite any colliding keys. This matches iceberg-java | ||
| // (`SnapshotProducer.summary`), where computed `added-*`/`total-*` values | ||
| // are written after user properties so a user cannot shadow them with a | ||
| // bad (or merely wrong) value that would corrupt the snapshot summary. | ||
| // User-supplied snapshot properties are applied first, then the computed | ||
| // metrics overwrite any colliding keys. This matches iceberg-java | ||
| // (`SnapshotProducer.summary`), where computed `added-*`/`total-*` values | ||
| // are written after user properties so a user cannot shadow them with a | ||
| // bad (or merely wrong) value that would corrupt the snapshot summary. | ||
| let mut additional_properties = self.snapshot_properties.clone(); | ||
| additional_properties.extend(summary_collector.build()); |
There was a problem hiding this comment.
The comment here is duplicated, maybe a bad copy/paste or rebase.
| // User-supplied snapshot properties are applied first, then the computed | ||
| // metrics overwrite any colliding keys. This matches iceberg-java | ||
| // (`SnapshotProducer.summary`), where computed `added-*`/`total-*` values | ||
| // are written after user properties so a user cannot shadow them with a | ||
| // bad (or merely wrong) value that would corrupt the snapshot summary. | ||
| let mut additional_properties = self.snapshot_properties.clone(); | ||
| additional_properties.extend(summary_collector.build()); |
There was a problem hiding this comment.
It is still possible for a user to supply bad values, where the new snapshots value would be 0 as it won't be populated. Should we try and avoid this, for instance by dropping those values?
As a fix, I think this change is fine, but I am wondering if we need to follow-up here.
Which issue does this PR close?
What changes are included in this PR?
Building a snapshot summary let a user-supplied snapshot property overwrite a computed metric key, and a non-integer value then panicked total computation. Both are fixed to match iceberg-java.
1. Computed metrics now win over user properties (
transaction/snapshot.rs). Previously:Now the user properties are applied first and the computed metrics overwrite them:
This matches
SnapshotSummary.Builder.build()in iceberg-java, which doesbuilder.putAll(properties)(custom props) first and thenmetrics.addTo(builder), so computed metrics win on a key collision.2.
update_totalstolerates an unparseable delta instead of panicking (spec/snapshot_summary.rs). Theadded-*/removed-*parses used.expect("must be parsable as it was just serialized"). A non-integer value (which can also legitimately come from a previous snapshot's summary) panicked the commit. It now skips that total, matching iceberg-java'sSnapshotProducer.updateTotal, which wraps theLong.parseLongcalls intry { … } catch (NumberFormatException e) { // ignore and do not add total }.Both Java behaviors verified against
apache/icebergmain(core/src/main/java/org/apache/iceberg/SnapshotProducer.java,SnapshotSummary.java).Are these changes tested?
Two new regression tests:
transaction::append::test_snapshot_properties_cannot_override_computed_metrics— a fast-append whosesnapshot_propertiessetadded-data-files=9999andadded-records=<non-integer>commits without panicking, and the resulting summary reports the real computed counts (1), not the user values.spec::snapshot_summary::test_update_totals_tolerates_unparseable_added_value— an unparseableadded-*value skips that one total while sibling totals still compute, with no panic.Both fail if the respective fix is reverted (verified). Full
iceberglib suite (1373 tests) passes, clippy + rustfmt clean.