Improve performance of binary and string concatenation operator#23057
Improve performance of binary and string concatenation operator#23057pepijnve wants to merge 3 commits into
Conversation
comphead
left a comment
There was a problem hiding this comment.
Thanks @pepijnve
Lets wait apache/arrow-rs#10161 to be merged, the PR might get some changes down the road and we can port changes here.
It would be useful to have some TODO to use the arrow-rs code instead, once it is released
|
I'll mark this one as draft for now and add the TODO already. |
|
Marking this for consideration again since the arrow-rs PR has been merged. @alamb since you reviewed the arrow-rs work, wdyt? Is it worth temporarily porting this to DataFusion or do we wait for the next arrow-rs release? |
I think the next arrow-release will come out before the next Datafusion release, so I don't think we need to add the same optimization in both places |
|
Closing this one then |
|
And just as I pressed the button, I thought it might be useful to keep track of the fact that we should replace the custom DataFusion kernel with the arrow-rw one. I'll make an issue to track that. |
Bonus points if you make the issue and then make a PR to add the refernce to the code as comment (so fugure coders can easily find the todo) |
|
Issue logged, PR incoming #23210 |
Which issue does this PR close?
Rationale for this change
During recent profiling work string concatenation proved to be a hotspot. Investigation of the current kernel implementation for string views showed that there was still some room for improvement.
Together this can result in ~30% improvement per the string concat benchmark
Note that this work is a port of apache/arrow-rs#10161. Ideally the implementation from Arrow is used by DataFusion once the PR in that project is merged and released. Since DataFusion currently uses a custom kernel it seemed to make sense to temporarily port the proposed PR from Arrow.
What changes are included in this PR?
Are these changes tested?
Covered by existing tests
Are there any user-facing changes?
No