[10125] arrow-flight decode path optimizations #10206
Open
Rich-T-kid wants to merge 10 commits into apache:main from Rich-T-kid:rich-T-kid/arrow-flight-decode-opt-impl
+54
−13
Changes from 4 commits
080628d allow arrow-flight users to skip validation in arrow-ipc decoding (Rich-T-kid)
c9f66a1 avoid extracting header bytes twice (Rich-T-kid)
316b8bd pre-allocate vectors (Rich-T-kid)
0335ff4 re-align buffers if tonic passes up a mis-aligned buffer (Rich-T-kid)
a2c436b make use of unsafe code clear to callers (Rich-T-kid)
1878721 resolve FFI issues (Rich-T-kid)
4053b0a revert clone() to as_ref() (Rich-T-kid)
f2fcc0c re-introduce .clone() under if condition (Rich-T-kid)
2d22728 avoid public API changes (Rich-T-kid)
e7b3994 fix clippy errors (Rich-T-kid)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -15,9 +15,10 @@ | |
| // specific language governing permissions and limitations | ||
| // under the License. | ||
|
|
||
| use crate::{FlightData, trailers::LazyTrailers, utils::flight_data_to_arrow_batch}; | ||
| use crate::{FlightData, trailers::LazyTrailers}; | ||
| use arrow_array::{ArrayRef, RecordBatch}; | ||
| use arrow_buffer::Buffer; | ||
| use arrow_buffer::{Buffer, MutableBuffer}; | ||
| use arrow_ipc::reader; | ||
| use arrow_schema::{Schema, SchemaRef}; | ||
| use bytes::Bytes; | ||
| use futures::{Stream, StreamExt, ready, stream::BoxStream}; | ||
|
|
@@ -228,6 +229,8 @@ pub struct FlightDataDecoder { | |
| state: Option<FlightStreamState>, | ||
| /// Seen the end of the inner stream? | ||
| done: bool, | ||
| /// Skip validation of decoded arrays (UTF-8, offset bounds, null counts). | ||
| skip_validation: bool, | ||
| } | ||
|
|
||
| impl Debug for FlightDataDecoder { | ||
|
|
@@ -236,6 +239,7 @@ impl Debug for FlightDataDecoder { | |
| .field("response", &"<stream>") | ||
| .field("state", &self.state) | ||
| .field("done", &self.done) | ||
| .field("skip_validation", &self.skip_validation) | ||
| .finish() | ||
| } | ||
| } | ||
|
|
@@ -250,9 +254,17 @@ impl FlightDataDecoder { | |
| state: None, | ||
| response: response.boxed(), | ||
| done: false, | ||
| skip_validation: false, | ||
| } | ||
| } | ||
|
|
||
| /// Only set for trusted senders, invalid data may cause undefined behavior. | ||
| /// Can improve performance by skipping validation | ||
| pub fn with_skip_validation(mut self, skip_validation: bool) -> Self { | ||
| self.skip_validation = skip_validation; | ||
| self | ||
| } | ||
|
|
||
| /// Returns the current schema for this stream | ||
| pub fn schema(&self) -> Option<&SchemaRef> { | ||
| self.state.as_ref().map(|state| &state.schema) | ||
|
|
@@ -319,14 +331,35 @@ impl FlightDataDecoder { | |
| )); | ||
| }; | ||
|
|
||
| let batch = flight_data_to_arrow_batch( | ||
| &data, | ||
| Arc::clone(&state.schema), | ||
| &state.dictionaries_by_field, | ||
| ) | ||
| .map_err(|e| { | ||
| FlightError::DecodeError(format!("Error decoding ipc RecordBatch: {e}")) | ||
| })?; | ||
| let data_buffer = if data.data_body.as_ptr() as usize % 64 != 0 { | ||
|
Contributor (Author): see context here
||
| let mut buf = MutableBuffer::with_capacity(data.data_body.len()); | ||
| buf.extend_from_slice(&data.data_body); | ||
| Buffer::from(buf) | ||
| } else { | ||
| Buffer::from(data.data_body.clone()) | ||
| }; | ||
|
|
||
| let batch = message | ||
| .header_as_record_batch() | ||
| .ok_or_else(|| { | ||
| FlightError::DecodeError( | ||
| "Unable to convert flight data header to a record batch".to_string(), | ||
| ) | ||
| }) | ||
| .and_then(|record_batch| { | ||
| reader::read_record_batch( | ||
| &data_buffer, | ||
| record_batch, | ||
| Arc::clone(&state.schema), | ||
| &state.dictionaries_by_field, | ||
| None, | ||
| &message.version(), | ||
| self.skip_validation, | ||
| ) | ||
| .map_err(|e| { | ||
| FlightError::DecodeError(format!("Error decoding ipc RecordBatch: {e}")) | ||
| }) | ||
| })?; | ||
|
|
||
| Ok(Some(DecodedFlightData::new_record_batch(data, batch))) | ||
| } | ||
|
|
||
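The alignment branch in the hunk above copies `data_body` into a freshly allocated buffer whenever its pointer is not a multiple of 64, and otherwise reuses the bytes as they are. A rough, standard-library-only sketch of that idea follows. The names `Block`, `AlignedBytes`, and `is_aligned` are hypothetical and are not part of arrow-rs; the PR itself uses `MutableBuffer` and `Buffer` for this.

```rust
/// Hypothetical 64-byte aligned block used as backing storage in this sketch.
#[repr(C, align(64))]
#[derive(Clone, Copy)]
struct Block([u8; 64]);

/// Hypothetical owned byte buffer whose start address is 64-byte aligned.
struct AlignedBytes {
    blocks: Vec<Block>,
    len: usize,
}

impl AlignedBytes {
    /// Copy `src` into freshly allocated, 64-byte aligned storage.
    fn copy_from(src: &[u8]) -> Self {
        let n_blocks = src.len().div_ceil(64);
        let mut blocks = vec![Block([0u8; 64]); n_blocks];
        for (block, chunk) in blocks.iter_mut().zip(src.chunks(64)) {
            block.0[..chunk.len()].copy_from_slice(chunk);
        }
        Self { blocks, len: src.len() }
    }

    /// View the stored bytes as a plain slice.
    fn as_slice(&self) -> &[u8] {
        // SAFETY: `Block` is `repr(C)` over `[u8; 64]` with no padding, so the
        // vector is one contiguous, initialized run of at least `len` bytes.
        unsafe { std::slice::from_raw_parts(self.blocks.as_ptr() as *const u8, self.len) }
    }
}

/// Same test as in the diff: is the first byte at a multiple of `align`?
fn is_aligned(bytes: &[u8], align: usize) -> bool {
    (bytes.as_ptr() as usize) % align == 0
}

/// Reuse `bytes` when already aligned, otherwise pay for one copy.
fn realign_if_needed(bytes: &[u8]) -> Option<AlignedBytes> {
    if is_aligned(bytes, 64) {
        None // caller keeps using the original bytes, no copy
    } else {
        Some(AlignedBytes::copy_from(bytes))
    }
}
```

The copy is only paid on the misaligned path, which matches the intent stated in the commit "re-align buffers if tonic passes up a mis-aligned buffer".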
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -45,7 +45,7 @@ pub fn flight_data_to_batches(flight_data: &[FlightData]) -> Result<Vec<RecordBa | |
| let mut batches = vec![]; | ||
| let dictionaries_by_id = HashMap::new(); | ||
| for datum in flight_data[1..].iter() { | ||
| let batch = flight_data_to_arrow_batch(datum, schema.clone(), &dictionaries_by_id)?; | ||
| let batch = flight_data_to_arrow_batch(datum, schema.clone(), &dictionaries_by_id, false)?; | ||
| batches.push(batch); | ||
| } | ||
| Ok(batches) | ||
|
|
@@ -56,6 +56,7 @@ pub fn flight_data_to_arrow_batch( | |
| data: &FlightData, | ||
| schema: SchemaRef, | ||
| dictionaries_by_id: &HashMap<i64, ArrayRef>, | ||
| skip_validation: bool, | ||
| ) -> Result<RecordBatch, ArrowError> { | ||
| // check that the data_header is a record batch message | ||
| let message = arrow_ipc::root_as_message(&data.data_header[..]) | ||
|
|
@@ -70,12 +71,13 @@ pub fn flight_data_to_arrow_batch( | |
| }) | ||
| .map(|batch| { | ||
| reader::read_record_batch( | ||
| &Buffer::from(data.data_body.as_ref()), | ||
| &Buffer::from(data.data_body.clone()), | ||
|
Contributor: Nice! I think this is a sneaky one, but indeed this is avoiding a full clone
||
| batch, | ||
| schema, | ||
| dictionaries_by_id, | ||
| None, | ||
| &message.version(), | ||
| skip_validation, | ||
| ) | ||
| })? | ||
| } | ||
|
|
||
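The change from `data.data_body.as_ref()` to `data.data_body.clone()` discussed in the comment above relies on the difference between building a buffer from a borrowed slice, which has to copy the bytes, and building it from a reference-counted handle, which can share the existing allocation. The sketch below illustrates that difference using only the standard library. `SharedBytes` and `Buf` are hypothetical stand-ins for `bytes::Bytes` and `arrow_buffer::Buffer`, and the assumption that the real conversions behave the same way is based on the review comment, not verified here.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a reference-counted byte container.
#[derive(Clone)]
struct SharedBytes(Arc<[u8]>);

/// Hypothetical stand-in for a buffer type that can own or share bytes.
struct Buf(Arc<[u8]>);

impl Buf {
    /// Copying path: a borrowed slice carries no ownership, so every byte
    /// is copied into a new allocation.
    fn from_slice(src: &[u8]) -> Self {
        Buf(Arc::from(src))
    }

    /// Sharing path: the handle is moved in, so only the reference count
    /// was touched when the caller cloned it.
    fn from_shared(src: SharedBytes) -> Self {
        Buf(src.0)
    }

    /// Address of the first byte, used to tell shared from copied storage.
    fn as_ptr(&self) -> *const u8 {
        self.0.as_ptr()
    }
}
```

With a body created as `SharedBytes(Arc::from(vec![1u8, 2, 3, 4]))`, `Buf::from_shared(body.clone())` points at the same allocation as `body`, while `Buf::from_slice(&body.0)` points at a new one.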
Rather than exposing this as a plain `bool` flag, I think we should be requiring an `UnsafeFlag` here.

By requiring an `UnsafeFlag`, we force consumers to explicitly have an `unsafe` block in their codebase, making sure they are aware that what they are doing is not safe, and that they are responsible for ensuring memory safety there.
Makes sense to me! Pushed up a revision.
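For readers unfamiliar with the pattern requested in the review above, here is a minimal sketch of a flag that is safe to read but can only be switched on from an `unsafe` context. `SkipValidationFlag` and `Decoder` are hypothetical names written for illustration. This is not the arrow-rs `UnsafeFlag` type and not the code from the pushed revision.

```rust
/// Hypothetical flag: reading is safe, changing it requires `unsafe`.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct SkipValidationFlag(bool);

impl SkipValidationFlag {
    /// Starts disabled, so validation runs unless a caller opts out.
    pub const fn new() -> Self {
        Self(false)
    }

    /// # Safety
    /// Setting this to `true` means the caller guarantees that all decoded
    /// data is valid (for example UTF-8 contents and offset bounds).
    pub unsafe fn set(&mut self, value: bool) {
        self.0 = value;
    }

    /// Reading the flag is always safe.
    pub fn get(&self) -> bool {
        self.0
    }
}

/// Hypothetical decoder that carries the flag instead of a plain `bool`.
#[derive(Debug, Default)]
pub struct Decoder {
    skip_validation: SkipValidationFlag,
}

impl Decoder {
    pub fn new() -> Self {
        Self::default()
    }

    /// # Safety
    /// Only call with `true` for trusted senders; invalid data may cause
    /// undefined behavior once validation is skipped.
    pub unsafe fn with_skip_validation(mut self, skip: bool) -> Self {
        // The inner `unsafe` block makes the obligation explicit.
        unsafe { self.skip_validation.set(skip) };
        self
    }

    pub fn skips_validation(&self) -> bool {
        self.skip_validation.get()
    }
}
```

A caller then has to write `unsafe { Decoder::new().with_skip_validation(true) }`, so the opt-out shows up in any audit of `unsafe` blocks in their codebase, which is the awareness the reviewer asked for.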