
PERF: add lib.has_sentinel for integer-indexer sentinel scans #65923

Open
jbrockmendel wants to merge 2 commits into pandas-dev:main from jbrockmendel:perf-sentinels


@jbrockmendel

Member

Split out from GH-65298. The other piece — unrolling the pre-existing has_infs / is_range_indexer / is_sequence_range helpers — is independent and will be its own PR.

Summary

Adds lib.has_sentinel(arr, sentinel), a fused-type (int8/16/32/64) Cython helper that is a short-circuiting, allocation-free equivalent of (arr == sentinel).any() on integer indexers, with an 8x-unrolled inner loop. Wires it up at the clear-cut (indexer == -1).any() / (ilocs < 0).any() call sites:

  • _MergeOperation._maybe_add_join_keys (non-inner merges)
  • _Unstacker.new_index (single-level unstack)
  • sorting.get_group_index / decons_obs_group_ids (key lifting)
  • MultiIndex._get_indexer_strict NaN-key path
  • DataFrame.__setitem__ non-unique-columns path
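
For readers who want the semantics spelled out, here is a minimal pure-Python sketch of what a short-circuiting, 8x-unrolled sentinel scan looks like. It is an illustration only: the name has_sentinel_sketch is hypothetical, and the actual helper in this PR is a fused-type Cython function whose source is not reproduced here.

```python
def has_sentinel_sketch(arr, sentinel):
    """Return True if any element of `arr` equals `sentinel`.

    Hypothetical pure-Python stand-in, not the Cython helper from this PR.
    It never builds an intermediate boolean array and stops at the first
    block that contains a match.
    """
    n = len(arr)
    unrolled_end = n - (n % 8)  # largest multiple of 8 not exceeding n
    i = 0
    while i < unrolled_end:
        # One combined check per block of eight elements.
        if (
            arr[i] == sentinel
            or arr[i + 1] == sentinel
            or arr[i + 2] == sentinel
            or arr[i + 3] == sentinel
            or arr[i + 4] == sentinel
            or arr[i + 5] == sentinel
            or arr[i + 6] == sentinel
            or arr[i + 7] == sentinel
        ):
            return True
        i += 8
    # Tail loop for the remaining 0-7 elements.
    while i < n:
        if arr[i] == sentinel:
            return True
        i += 1
    return False
```

The tail loop is what the "unroll boundary" tests need to exercise: lengths that are not a multiple of 8, and matches that fall on either side of the last full block.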

Benchmarks

Micro, has_sentinel(arr, -1) vs (arr == -1).any(), int64, best-of-9:

| n | sentinel pos | (==-1).any() | has_sentinel | speedup |
|---|---|---|---|---|
| 1 000 | first / absent | 1.1 µs | 0.26 / 0.38 µs | 4.3x / 3.0x |
| 100 000 | first | 8.9 µs | 0.25 µs | 35.9x |
| 100 000 | absent | 9.9 µs | 11.5 µs | 0.9x |
| 1 000 000 | first | 78 µs | 0.23 µs | 338x |
| 1 000 000 | absent | 88 µs | 115 µs | 0.8x |

The gains come from short-circuiting (large when a sentinel sits early) and from never allocating the N-element boolean array (dominant for small / overhead-bound arrays). On a large full scan with no early match the 8-wide scalar unroll trails numpy's SIMD comparison — most visible on narrow dtypes.
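
The two effects can be seen from plain NumPy, without the new helper. The sketch below is an assumption-laden illustration rather than the PR's approach: any_equal_chunked is a hypothetical function that gets early exit by scanning in chunks, and the chunk size of 4096 is arbitrary.

```python
import numpy as np


def any_equal_chunked(arr, sentinel, chunk=4096):
    """Early-exit scan in NumPy-sized chunks (hypothetical helper).

    Each chunk still allocates a small boolean array, but the scan stops
    at the first chunk that contains a match.
    """
    for start in range(0, len(arr), chunk):
        if (arr[start:start + chunk] == sentinel).any():
            return True
    return False


n = 1_000_000
arr = np.zeros(n, dtype=np.int64)

# The baseline materializes a full boolean mask before reducing it:
mask = arr == -1
mask_nbytes = mask.nbytes  # one byte per element, so n bytes here

# With the sentinel at position 0, only the first chunk is inspected.
arr[0] = -1
found_first = any_equal_chunked(arr, -1)
```

When no sentinel is present, every chunk is scanned, which corresponds to the "absent" rows in the table where short-circuiting cannot help.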

End-to-end this is roughly neutral on the operations that call it: each calls the helper O(1) times, so a single scan is a negligible fraction of the operation. No whatsnew entry for that reason. (Plain GroupBy aggregations don't reach the helper — they go through get_group_index(..., xnull=True).)

Test plan

  • pandas/tests/libs/test_lib.py: correctness across int8/16/32/64, every-position coverage around the unroll boundary, and empty input
  • Call-site suites (reshape/merge, frame/indexing/test_setitem, indexes/multi/test_indexing, test_sorting, frame/test_stack_unstack, reshape/test_crosstab, groupby/) — all green
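
A rough sketch of the every-position check described above, assuming a callable with the signature func(arr, sentinel). The names reference_has_sentinel and check_every_position are hypothetical and are not the tests added in this PR; the reference implementation is used as a stand-in because the new helper only exists on this branch.

```python
import numpy as np


def reference_has_sentinel(arr, sentinel):
    """Baseline expression the helper is meant to match."""
    return bool((arr == sentinel).any())


def check_every_position(func, max_len=20, sentinel=-1):
    """Exercise `func` on every length 0..max_len and every sentinel position.

    Lengths up to 20 cross the 8- and 16-element block boundaries of an
    8x-unrolled loop. Returns the number of assertions performed.
    """
    checked = 0
    for dtype in (np.int8, np.int16, np.int32, np.int64):
        for n in range(max_len + 1):
            arr = np.zeros(n, dtype=dtype)
            # No sentinel anywhere (includes the empty array when n == 0).
            assert not func(arr, sentinel)
            checked += 1
            for pos in range(n):
                with_sentinel = arr.copy()
                with_sentinel[pos] = sentinel
                assert func(with_sentinel, sentinel)
                checked += 1
    return checked


total_checks = check_every_position(reference_has_sentinel)
```

On a build that includes this branch, the same loop could be pointed at the new helper instead of the reference function.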

Add a fused-type (int8/16/32/64) Cython helper has_sentinel(arr, sentinel),
a short-circuiting, allocation-free equivalent of (arr == sentinel).any()
with an 8x-unrolled inner loop, and wire up the clear (indexer == -1).any()
call sites in merge (non-inner), single-level unstack, groupby key lifting,
MultiIndex NaN-key indexing, and DataFrame.__setitem__ with non-unique
columns.

Split out from GH#65298. The unrolling of the pre-existing has_infs /
is_range_indexer / is_sequence_range helpers is a separate, independent PR.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@jbrockmendel jbrockmendel added the Performance Memory or execution speed performance label Jun 21, 2026