PERF: vectorize is_range_indexer and is_sequence_range scans#65922
Open
jbrockmendel wants to merge 2 commits into
4x-unroll the inner scan loops of `is_range_indexer` and `is_sequence_range` in `lib.pyx`, combining lanes with bitwise `|` (not `or`) so the compiler can emit vectorized comparisons, with a scalar tail for the remainder. Mirrors the existing `has_nans`/`all_nans` unrolling (GH-65192).

On the common full-scan path (array is a range / sequence) this is up to ~1.9x faster for `is_range_indexer` and ~1.7x for `is_sequence_range` at 100k+ elements, with no regression on the early-exit short-circuit path.

Split out from GH-65298.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Split out from GH-65298 (the `has_sentinel` half stays there).

4x-unrolls the inner scan loops of `is_range_indexer` and `is_sequence_range` in `lib.pyx`, combining lanes with bitwise `|` (not `or`) so the compiler can emit vectorized comparisons, with a scalar tail loop for the remainder. This mirrors the existing `has_nans`/`all_nans` unrolling (GH-65192).

`has_infs` was also unrolled in GH-65298 but is dropped here: benchmarking showed it flat (clang already vectorizes the float-compare loop, and it's memory-bound at scale), so the extra code wasn't earning its keep.

Benchmarks

Both the old (scalar) and new (unrolled) variants were compiled into one extension with pandas' build flags (`-O3 -std=c17 -DNDEBUG`, baseline NEON on Apple Silicon) and timed in-process, best-of-9.

Common full-scan path (array is a range / sequence):

`is_range_indexer` (int64; the no-op-indexer fast path in `take`/`reset_index`/`merge`/`reindex`): [timing table not preserved]

`is_sequence_range` (int64, step=3; Index construction): [timing table not preserved]

Early-exit (short-circuit) path: 0.99–1.02x across all sizes for both, so the unroll doesn't cost anything when the scan breaks early.
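For reviewers who want the shape of the change without opening the diff, the unrolling pattern described above can be sketched in plain C roughly as follows. This is an illustration only, not the code in `lib.pyx`: the function names `is_range_indexer_scalar` and `is_range_indexer_unrolled` and the pointer-plus-length signature are hypothetical stand-ins for the Cython memoryview version, and whether a given compiler actually vectorizes the unrolled body depends on the compiler and flags.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar reference (hypothetical name): true iff len == n and
 * left[i] == i for every i in [0, n). Exits at the first mismatch. */
static bool is_range_indexer_scalar(const int64_t *left, ptrdiff_t len,
                                    ptrdiff_t n)
{
    if (len != n) {
        return false;
    }
    for (ptrdiff_t i = 0; i < n; i++) {
        if (left[i] != (int64_t)i) {
            return false;
        }
    }
    return true;
}

/* 4x-unrolled variant (hypothetical name). The four comparisons are
 * combined with bitwise | rather than ||, so there is no short-circuit
 * branch between lanes; the early exit is checked once per block of 4. */
static bool is_range_indexer_unrolled(const int64_t *left, ptrdiff_t len,
                                      ptrdiff_t n)
{
    if (len != n) {
        return false;
    }
    ptrdiff_t i = 0;
    ptrdiff_t n4 = n - (n % 4); /* largest multiple of 4 that is <= n */

    for (; i < n4; i += 4) {
        int mismatch = (left[i]     != (int64_t)i)
                     | (left[i + 1] != (int64_t)(i + 1))
                     | (left[i + 2] != (int64_t)(i + 2))
                     | (left[i + 3] != (int64_t)(i + 3));
        if (mismatch) {
            return false;
        }
    }
    /* Scalar tail for the remaining n % 4 elements. */
    for (; i < n; i++) {
        if (left[i] != (int64_t)i) {
            return false;
        }
    }
    return true;
}
```

Both functions return the same result for every input. The unrolled one checks for a mismatch once per block of four instead of once per element, which is consistent with the early-exit timings staying flat.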
Test plan
- {1, 2, -1, 3}, plus length-mismatch / empty edge cases
- `pandas/tests/libs/test_lib.py` + downstream suites (ranges, sort_values, reset_index, merge, strings)

No whatsnew entry, following the `has_nans`/`all_nans` precedent (GH-65192): these are internal scan helpers and the per-op effect on end-user timings is below the noise floor.

🤖 Generated with Claude Code