Fix double max.free.per.topology cap in EvenScheduler idle rebalance#8798
Open
mwkang wants to merge 1 commit into
Open
Fix double max.free.per.topology cap in EvenScheduler idle rebalance#8798mwkang wants to merge 1 commit into
mwkang wants to merge 1 commit into
Conversation
DefaultScheduler.defaultSchedule ran the idle-supervisor redistribute twice per scheduling round: once over the full topology set at the top (the round-robin fairness pass), then again inside the per-leftover EvenScheduler.scheduleTopologiesEvenly call, which also redistributes on entry. The full-set pass consumes one idle supervisor, leaving any other idle supervisor for the per-topology pass to fill again, so the per-topology max.free.per.topology cap was applied twice. With the feature enabled, a positive cap and two or more idle supervisors, an under-assigned topology could relocate up to 2 * cap existing workers in a single round -- throttle-exceeding churn, not a crash or loss. Scoped to the DefaultScheduler (and IsolationScheduler leftover) path; EvenScheduler used directly schedules each topology once and is unaffected. The redistribute was added at both entry points by apache#8778 without a re-entry guard. Add a redistributeOntoIdle toggle to scheduleTopologiesEvenly: the public two-arg entry point keeps redistribute=true (EvenScheduler's own single full-set pass is unchanged), while DefaultScheduler runs the single full-set redistribute itself and delegates per leftover topology with the toggle off. The cap then binds once per round and the full-set round-robin fairness across topologies is preserved -- simply dropping the first call would break that fairness. The redistribute pass is gated off by default, so there is no behavior change for existing clusters. Add a regression test exercising the DefaultScheduler path with two idle supervisors, max.free.per.topology=1 and an under-assigned topology: before the fix it relocates two workers (one onto each idle supervisor), after the fix exactly one.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What this PR does
DefaultScheduler.defaultScheduleruns the idle-supervisor redistribute pass twice per scheduling round — once over the full topology set (DefaultScheduler.java:75), then again inside the per-leftoverEvenScheduler.scheduleTopologiesEvenlycall (EvenScheduler.java:336).So the per-topology
nimbus.even.rebalance.max.free.per.topologycap is applied twice, and an under-assigned topology can relocate up to2 × capworkers in a single round.The fix adds a
redistributeOntoIdletoggle toscheduleTopologiesEvenly:EvenSchedulerdirectly is unchanged;DefaultSchedulerruns the single full-set redistribute itself and delegates per leftover topology with the toggle off.The cap now binds once per round, while the full-set round-robin fairness that lets multiple topologies share idle slots is preserved (simply dropping the full-set call would break it).
Follow-up to #8778. The feature is off by default, so there is no behavior change unless
nimbus.even.rebalance.idle.supervisor.enabled=true.Closes #8797.
How it was tested
Added
defaultSchedulerAppliesMaxFreeCapOncePerRound: the DefaultScheduler path with two idle supervisors,max.free.per.topology=1, and an under-assigned topology — it relocates two workers before the fix and exactly one after. ExistingTestEvenSchedulerIdleSupervisorand the other scheduler tests still pass.