PERF: vectorize is_range_indexer and is_sequence_range scans by jbrockmendel · Pull Request #65922 · pandas-dev/pandas

jbrockmendel · 2026-06-21T16:40:53Z

Split out from GH-65298 (the has_sentinel half stays there).

4x-unrolls the inner scan loops of is_range_indexer and is_sequence_range in lib.pyx, combining lanes with bitwise | (not or) so the compiler can emit vectorized comparisons, with a scalar tail loop for the remainder. This mirrors the existing has_nans/all_nans unrolling (GH-65192).
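For readers unfamiliar with the pattern, here is a minimal pure-Python model of the control flow described above. It is a sketch, not the code in lib.pyx: the function names are hypothetical, and it assumes is_range_indexer checks element-by-element equality against range(n), returning False on a length mismatch and True for an empty input with n == 0. Pure Python gains no speed from this shape; the benefit only appears once a C compiler can vectorize the branch-free block.

```python
def is_range_indexer_scalar(left, n):
    """Reference scan: one comparison and one branch per element."""
    if len(left) != n:
        return False
    for i in range(n):
        if left[i] != i:
            return False
    return True


def is_range_indexer_unrolled(left, n):
    """Same result as the scalar scan, checking four elements per iteration."""
    if len(left) != n:
        return False
    i = 0
    # Main loop: four independent comparisons ("lanes") combined with
    # bitwise |, so there is one branch per block of four instead of the
    # per-element branches that short-circuiting `or` would introduce.
    while i + 4 <= n:
        mismatch = (
            (left[i] != i)
            | (left[i + 1] != i + 1)
            | (left[i + 2] != i + 2)
            | (left[i + 3] != i + 3)
        )
        if mismatch:
            return False
        i += 4
    # Scalar tail: the remaining n % 4 elements.
    while i < n:
        if left[i] != i:
            return False
        i += 1
    return True
```

The early exit is kept at block granularity, so a mismatch is still detected within at most four elements of where the scalar loop would stop.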

has_infs was also unrolled in GH-65298 but is dropped here — benchmarking showed it flat (clang already vectorizes the float-compare loop, and it's memory-bound at scale), so the extra code wasn't earning its keep.

Benchmarks

Both the old (scalar) and new (unrolled) variants were compiled into one extension with pandas' build flags (-O3 -std=c17 -DNDEBUG, baseline NEON on Apple Silicon) and timed in-process, best-of-9. Results for the common full-scan path (array is a range / sequence):

is_range_indexer (int64 — the no-op-indexer fast path in take/reset_index/merge/reindex):

n	old	new	speedup
1,000	642 ns	503 ns	1.28x
10,000	2.96 µs	1.75 µs	1.69x
100,000	26.3 µs	14.0 µs	1.87x
1,000,000	260 µs	139 µs	1.87x
10,000,000	2.63 ms	1.40 ms	1.87x

is_sequence_range (int64, step=3 — Index construction):

n	old	new	speedup
1,000	505 ns	379 ns	1.33x
10,000	2.94 µs	1.78 µs	1.65x
100,000	27.2 µs	15.7 µs	1.74x
1,000,000	272 µs	157 µs	1.74x
10,000,000	2.71 ms	1.56 ms	1.74x

Early-exit (short-circuit) path: 0.99–1.02x across all sizes for both — the unroll doesn't cost anything when the scan breaks early.
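A rough sketch of what "timed in-process, best-of-9" can look like, using only the standard library. This is not the harness used for the numbers above: `best_of` and `scan` are hypothetical names, and `scan` is a pure-Python stand-in for the compiled helpers, so its absolute timings say nothing about the PR. It only illustrates how the full-scan and early-exit inputs differ.

```python
import timeit


def best_of(func, *args, repeat=9, number=50):
    """Fastest per-call time in seconds over `repeat` runs of `number` calls."""
    runs = timeit.repeat(lambda: func(*args), repeat=repeat, number=number)
    return min(runs) / number


def scan(values):
    """Stand-in scan: True if values[i] == i for every i, exiting early otherwise."""
    for i, v in enumerate(values):
        if v != i:
            return False
    return True


if __name__ == "__main__":
    n = 10_000
    full = list(range(n))      # full-scan path: every element is checked
    early = [-1] + full[1:]    # early-exit path: mismatch at position 0
    print(f"full scan : {best_of(scan, full) * 1e6:.2f} us")
    print(f"early exit: {best_of(scan, early) * 1e6:.2f} us")
```

Taking the minimum rather than the mean is a common choice for microbenchmarks, since slower runs mostly reflect interference from the rest of the system.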

Test plan

Exhaustive correctness across int8/16/32/64, every break position, steps {1, 2, -1, 3}, plus length-mismatch / empty edge cases
pandas/tests/libs/test_lib.py + downstream suites (ranges, sort_values, reset_index, merge, strings)
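The "every break position" check in the plan above can be sketched as a brute-force comparison of an unrolled scan against a scalar reference. This is an illustrative pure-Python model, not the test in pandas/tests/libs/test_lib.py: the function names, the start value 5, and the maximum length 17 are arbitrary choices, the model assumes is_sequence_range compares each element against first + i * step and returns True for an empty input, and Python integers do not model the int8/16/32/64 widths.

```python
def is_sequence_range_scalar(seq, step):
    """Reference scan: seq[i] == seq[0] + i * step for every i."""
    n = len(seq)
    if n == 0:
        return True  # assumption: an empty sequence counts as a range
    first = seq[0]
    for i in range(1, n):
        if seq[i] != first + i * step:
            return False
    return True


def is_sequence_range_unrolled(seq, step):
    """Same check, four elements per iteration plus a scalar tail."""
    n = len(seq)
    if n == 0:
        return True
    first = seq[0]
    i = 1
    while i + 4 <= n:
        mismatch = (
            (seq[i] != first + i * step)
            | (seq[i + 1] != first + (i + 1) * step)
            | (seq[i + 2] != first + (i + 2) * step)
            | (seq[i + 3] != first + (i + 3) * step)
        )
        if mismatch:
            return False
        i += 4
    while i < n:  # scalar tail
        if seq[i] != first + i * step:
            return False
        i += 1
    return True


def check_every_break_position(max_len=17, steps=(1, 2, -1, 3), start=5):
    """Compare both scans on valid inputs and on one corruption per position."""
    checked = 0
    for step in steps:
        for n in range(max_len + 1):
            good = [start + k * step for k in range(n)]
            assert is_sequence_range_scalar(good, step)
            assert is_sequence_range_unrolled(good, step)
            checked += 1
            for pos in range(n):
                bad = list(good)
                bad[pos] += 1  # break the sequence at exactly one position
                expected = is_sequence_range_scalar(bad, step)
                assert is_sequence_range_unrolled(bad, step) == expected
                if n >= 2:
                    assert expected is False
                checked += 1
    return checked


if __name__ == "__main__":
    print(check_every_break_position(), "cases checked")
```

Sweeping lengths past a few multiples of four matters here, because it exercises breaks in each lane of the unrolled block as well as in the scalar tail.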

No whatsnew entry, following the has_nans/all_nans precedent (GH-65192) — these are internal scan helpers and the per-op effect on end-user timings is below the noise floor.

🤖 Generated with Claude Code

4x-unroll the inner scan loops of is_range_indexer and is_sequence_range in lib.pyx, combining lanes with bitwise | (not `or`) so the compiler can emit vectorized comparisons, with a scalar tail for the remainder. Mirrors the existing has_nans/all_nans unrolling (GH-65192).

On the common full-scan path (array is a range / sequence) this is up to ~1.9x faster for is_range_indexer and ~1.7x for is_sequence_range at 100k+ elements, with no regression on the early-exit short-circuit path.

Split out from GH-65298.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

jbrockmendel added the Performance (memory or execution speed performance) label Jun 21, 2026

jbrockmendel mentioned this pull request Jun 21, 2026

PERF: short-circuit sentinel scans on integer indexers #65298 (Closed)

Merge branch 'main' into perf-unroll-2

d191e18

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

jbrockmendel wants to merge 2 commits into pandas-dev:main from jbrockmendel:perf-unroll-2
