Skip to content

fix: seal FromBytes trait to prevent unsound external implementations (#10164)#10166

Draft
sandy-sachin7 wants to merge 1 commit into
apache:mainfrom
sandy-sachin7:fix/seal-frombytes-trait
Draft

fix: seal FromBytes trait to prevent unsound external implementations (#10164)#10166
sandy-sachin7 wants to merge 1 commit into
apache:mainfrom
sandy-sachin7:fix/seal-frombytes-trait

Conversation

@sandy-sachin7

Copy link
Copy Markdown

Which issue does this PR close?

Closes #10164.

Rationale for this change

FromBytes is a pub unsafe trait whose # Safety contract only documented bit-pattern validity. Internally, BitReader::get_batch fast-paths cast *mut T to *mut u16/u32/u64 and call std::slice::from_raw_parts_mut, which requires the pointer to be at least as aligned as the target primitive type. Because the documented contract did not mention alignment, a downstream crate could soundly implement FromBytes for a #[repr(packed)] type of size 2 (align 1) and trigger undefined behavior when passed to get_batch.

What changes are included in this PR?

  1. Sealed the trait: Added private::Sealed as a supertrait of FromBytes, preventing external implementations entirely — matching the existing pattern used by ParquetValueType in data_type.rs.

  2. Updated the # Safety contract: Now documents both the bit-pattern and alignment requirements.

  3. impl private::Sealed for every type implementing FromBytes: u8u64, i8i64, f32, f64, bool, Int96, ByteArray, FixedLenByteArray.

Are these changes tested?

All 22 existing bit_util tests pass. The change is purely structural — existing implementations are unaffected. External code that attempted to implement FromBytes (unlikely — none found in the wild) will get a compile error about the sealed supertrait, which is the desired outcome.

Are there any user-facing changes?

No API changes for existing users. External implementations of FromBytes or FromBitpacked are no longer possible (but were not intended to be extensible — the trait was only pub for visibility, not for extension).

…external implementations (apache#10164)

Seal the public unsafe trait `FromBytes` (and by extension `FromBitpacked`)
with a private supertrait so that it cannot be implemented outside the
`parquet` crate.

```rust
pub unsafe trait FromBytes: Sized + private::Sealed { ... }
```

This is necessary because `BitReader::get_batch` casts `*mut T` pointers to
`*mut u16`/u32/u64 and calls `std::slice::from_raw_parts_mut` on them,
which requires the pointer to be aligned for those primitive types.
The old `# Safety` contract only mentioned bit-pattern validity and did
not document the alignment requirement, so a downstream crate could
soundly implement `FromBytes` for a `#[repr(packed)]` type with size 2
but align 1 and trigger undefined behavior.

All in-crate implementations (`u8`-`u64`, `i8`-`i64`, `f32`, `f64`,
`bool`, `Int96`, `ByteArray`, `FixedLenByteArray`) already satisfy the
alignment requirement because they all have natural alignment matching
their size.
@alamb

alamb commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

alamb pushed a commit that referenced this pull request Jun 23, 2026
… in BitReader::get_batch (#10172)

# Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax.
-->

- Closes #9324

# Rationale for this change

The `FromBitpacked` trait was introduced in #9662 as a marker for types
that could be decoded from bitpacked data, but the implementation still
mostly used methods from its `FromBytes` super-trait. The implementation
of `BitReader::get_batch` that dispatched on the size of the type then
required a complex safety contract, which meant that `FromBytes` had to
be an `unsafe` trait.

Instead of that size dispatch, we can move the unsafe code into a method
of the `FromBitpacked` trait, which makes `BitReader::get_batch`
completely safe. This also removes the need to make any of those traits
sealed as proposed in #10166.

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

# What changes are included in this PR?

The responsibilities of the `FromBytes` and `FromBitpacked` traits
should be much clearer now:

 - `FromBytes` is used for plain encoded data
 - `FromBitpacked` is used for bitpacked data

Unsafe code is moved from `BitReader::get_batch` into some of the
`FromBitpacked` implementations.

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

# Are these changes tested?

All existing tests pass.

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?

If this PR claims a performance improvement, please include evidence
such as benchmark results.
-->

# Are there any user-facing changes?

`FromBitpacked` gains a new method and is no longer a sub-trait of
`FromBytes`, so this might be breaking change. Code that previously
implemented `FromBytes` will need to change to remove the `unsafe`
keyword and `BIT_CAPACITY` constant.

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.

If there are any breaking changes to public APIs, please call them out.
-->

---------

Co-authored-by: Jeffrey Vo <jeffrey.vo.australia@gmail.com>
@alamb

alamb commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

Given we have gone with

I am not sure this one is needed anymore

@sandy-sachin7 what do you think?

@alamb alamb marked this pull request as draft June 23, 2026 19:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

parquet Changes to the parquet crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Soundness: Unsound alignment contract in public FromBytes trait and BitReader::get_batch

2 participants