fix: seal FromBytes trait to prevent unsound external implementations (#10164)#10166
Draft
sandy-sachin7 wants to merge 1 commit into
Draft
fix: seal FromBytes trait to prevent unsound external implementations (#10164)#10166sandy-sachin7 wants to merge 1 commit into
sandy-sachin7 wants to merge 1 commit into
Conversation
…external implementations (apache#10164) Seal the public unsafe trait `FromBytes` (and by extension `FromBitpacked`) with a private supertrait so that it cannot be implemented outside the `parquet` crate. ```rust pub unsafe trait FromBytes: Sized + private::Sealed { ... } ``` This is necessary because `BitReader::get_batch` casts `*mut T` pointers to `*mut u16`/u32/u64 and calls `std::slice::from_raw_parts_mut` on them, which requires the pointer to be aligned for those primitive types. The old `# Safety` contract only mentioned bit-pattern validity and did not document the alignment requirement, so a downstream crate could soundly implement `FromBytes` for a `#[repr(packed)]` type with size 2 but align 1 and trigger undefined behavior. All in-crate implementations (`u8`-`u64`, `i8`-`i64`, `f32`, `f64`, `bool`, `Int96`, `ByteArray`, `FixedLenByteArray`) already satisfy the alignment requirement because they all have natural alignment matching their size.
Contributor
alamb
pushed a commit
that referenced
this pull request
Jun 23, 2026
… in BitReader::get_batch (#10172) # Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. --> - Closes #9324 # Rationale for this change The `FromBitpacked` trait was introduced in #9662 as a marker for types that could be decoded from bitpacked data, but the implementation still mostly used methods from its `FromBytes` super-trait. The implementation of `BitReader::get_batch` that dispatched on the size of the type then required a complex safety contract, which meant that `FromBytes` had to be an `unsafe` trait. Instead of that size dispatch, we can move the unsafe code into a method of the `FromBitpacked` trait, which makes `BitReader::get_batch` completely safe. This also removes the need to make any of those traits sealed as proposed in #10166. <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> # What changes are included in this PR? The responsibilities of the `FromBytes` and `FromBitpacked` traits should be much clearer now: - `FromBytes` is used for plain encoded data - `FromBitpacked` is used for bitpacked data Unsafe code is moved from `BitReader::get_batch` into some of the `FromBitpacked` implementations. <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> # Are these changes tested? All existing tests pass. <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? If this PR claims a performance improvement, please include evidence such as benchmark results. --> # Are there any user-facing changes? `FromBitpacked` gains a new method and is no longer a sub-trait of `FromBytes`, so this might be breaking change. Code that previously implemented `FromBytes` will need to change to remove the `unsafe` keyword and `BIT_CAPACITY` constant. <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. If there are any breaking changes to public APIs, please call them out. --> --------- Co-authored-by: Jeffrey Vo <jeffrey.vo.australia@gmail.com>
Contributor
|
Given we have gone with
I am not sure this one is needed anymore @sandy-sachin7 what do you think? |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Which issue does this PR close?
Closes #10164.
Rationale for this change
FromBytesis apub unsafe traitwhose# Safetycontract only documented bit-pattern validity. Internally,BitReader::get_batchfast-paths cast*mut Tto*mut u16/u32/u64 and callstd::slice::from_raw_parts_mut, which requires the pointer to be at least as aligned as the target primitive type. Because the documented contract did not mention alignment, a downstream crate could soundly implementFromBytesfor a#[repr(packed)]type of size 2 (align 1) and trigger undefined behavior when passed toget_batch.What changes are included in this PR?
Sealed the trait: Added
private::Sealedas a supertrait ofFromBytes, preventing external implementations entirely — matching the existing pattern used byParquetValueTypeindata_type.rs.Updated the
# Safetycontract: Now documents both the bit-pattern and alignment requirements.impl private::Sealedfor every type implementingFromBytes:u8–u64,i8–i64,f32,f64,bool,Int96,ByteArray,FixedLenByteArray.Are these changes tested?
All 22 existing
bit_utiltests pass. The change is purely structural — existing implementations are unaffected. External code that attempted to implementFromBytes(unlikely — none found in the wild) will get a compile error about the sealed supertrait, which is the desired outcome.Are there any user-facing changes?
No API changes for existing users. External implementations of
FromBytesorFromBitpackedare no longer possible (but were not intended to be extensible — the trait was onlypubfor visibility, not for extension).