Fix double max.free.per.topology cap in EvenScheduler idle rebalance by mwkang · Pull Request #8798 · apache/storm

mwkang · 2026-06-22T03:50:17Z

What this PR does

DefaultScheduler.defaultSchedule runs the idle-supervisor redistribute pass twice per scheduling round — once over the full topology set (DefaultScheduler.java:75), then again inside the per-leftover EvenScheduler.scheduleTopologiesEvenly call (EvenScheduler.java:336).
So the per-topology nimbus.even.rebalance.max.free.per.topology cap is applied twice, and an under-assigned topology can relocate up to 2 × cap workers in a single round.
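
The arithmetic behind the 2 × cap bound can be illustrated with a toy model. This is only a sketch: `DoubleCapModel`, `redistributePass`, and `relocatedInRound` are hypothetical names, not Storm code, and the model assumes a single under-assigned topology whose every pass may move at most `cap` workers onto idle supervisors.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model only: none of these names come from Storm's source.
final class DoubleCapModel {
    /** One redistribute pass: moves at most `cap` workers, one per idle supervisor. */
    static int redistributePass(Deque<String> idleSupervisors, int cap) {
        int moved = 0;
        while (moved < cap && !idleSupervisors.isEmpty()) {
            idleSupervisors.poll(); // this supervisor is no longer idle
            moved++;
        }
        return moved;
    }

    /** Workers relocated in one scheduling round when the pass runs `passes` times. */
    static int relocatedInRound(int idleCount, int cap, int passes) {
        Deque<String> idle = new ArrayDeque<>();
        for (int i = 0; i < idleCount; i++) {
            idle.add("sup-" + i);
        }
        int total = 0;
        for (int p = 0; p < passes; p++) {
            total += redistributePass(idle, cap); // the cap resets on every pass
        }
        return total;
    }
}
```

In this model, `relocatedInRound(2, 1, 2)` yields 2 (the cap applied twice), while `relocatedInRound(2, 1, 1)` yields 1.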

The fix adds a redistributeOntoIdle toggle to scheduleTopologiesEvenly:

the public 2-arg entry point keeps redistribute on, so using EvenScheduler directly is unchanged;
DefaultScheduler runs the single full-set redistribute itself and delegates per leftover topology with the toggle off.

The cap now binds once per round, while the full-set round-robin fairness that lets multiple topologies share idle slots is preserved (simply dropping the full-set call would break it).
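
The shape of the toggle can be sketched as an overload pair. The signatures below are simplified stand-ins (the real methods operate on Storm's topology and cluster objects, which are not shown in this PR text), and `Round`, `redistribute`, and the `toggleOffForLeftovers` parameter are hypothetical names introduced only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins; not the actual Storm signatures.
final class ToggleSketch {
    /** Mutable round state: idle slots left and workers moved per topology. */
    static final class Round {
        int idleSlots;
        final int cap;
        final Map<String, Integer> moved = new LinkedHashMap<>();
        Round(int idleSlots, int cap) { this.idleSlots = idleSlots; this.cap = cap; }
    }

    /** Round-robin pass: each topology takes one idle slot per turn, up to `cap` in this pass. */
    static void redistribute(List<String> topologies, Round r) {
        Map<String, Integer> thisPass = new LinkedHashMap<>();
        boolean progressed = true;
        while (r.idleSlots > 0 && progressed) {
            progressed = false;
            for (String t : topologies) {
                if (r.idleSlots == 0) break;
                if (thisPass.getOrDefault(t, 0) < r.cap) {
                    thisPass.merge(t, 1, Integer::sum);
                    r.moved.merge(t, 1, Integer::sum);
                    r.idleSlots--;
                    progressed = true;
                }
            }
        }
    }

    /** Two-argument entry point: redistribute stays on for direct callers. */
    static void scheduleTopologiesEvenly(List<String> topologies, Round r) {
        scheduleTopologiesEvenly(topologies, r, true);
    }

    /** Overload with the toggle; regular slot assignment is omitted from this sketch. */
    static void scheduleTopologiesEvenly(List<String> topologies, Round r,
                                         boolean redistributeOntoIdle) {
        if (redistributeOntoIdle) {
            redistribute(topologies, r);
        }
    }

    /** DefaultScheduler-style driver: one full-set pass, then per-topology delegation. */
    static Map<String, Integer> defaultSchedule(List<String> topologies, Round r,
                                                boolean toggleOffForLeftovers) {
        redistribute(topologies, r); // the single full-set round-robin pass
        for (String t : topologies) {
            scheduleTopologiesEvenly(List.of(t), r, !toggleOffForLeftovers);
        }
        return r.moved;
    }
}
```

In this model, with four idle slots, a cap of 1, and topologies A and B, passing `false` for `toggleOffForLeftovers` moves two workers per topology, while passing `true` moves one each. The model also hints at why the full-set pass is the one worth keeping: with two idle slots and a cap of 2, a single full-set call gives A and B one slot each, whereas calling per topology lets A take both.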

Follow-up to #8778. The feature is off by default, so there is no behavior change unless nimbus.even.rebalance.idle.supervisor.enabled=true.
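
As a rough illustration of the off-by-default gate, the check below treats a missing key as disabled. It is a minimal sketch over a plain `Map`; `IdleRebalanceConfig` and `isEnabled` are hypothetical names, and Storm's own config parsing may accept other value forms.

```java
import java.util.Map;

// Hypothetical helper; only the config key itself is taken from the PR text.
final class IdleRebalanceConfig {
    static final String ENABLED_KEY = "nimbus.even.rebalance.idle.supervisor.enabled";

    /** True only when the key is explicitly set to Boolean true; absent means off. */
    static boolean isEnabled(Map<String, Object> conf) {
        Object value = conf.get(ENABLED_KEY);
        return value instanceof Boolean && (Boolean) value;
    }
}
```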

Closes #8797.

How it was tested

Added defaultSchedulerAppliesMaxFreeCapOncePerRound, which exercises the DefaultScheduler path with two idle supervisors, max.free.per.topology=1, and an under-assigned topology. It relocates two workers before the fix and exactly one after. The existing TestEvenSchedulerIdleSupervisor and the other scheduler tests still pass.

DefaultScheduler.defaultSchedule ran the idle-supervisor redistribute twice per scheduling round: once over the full topology set at the top (the round-robin fairness pass), then again inside the per-leftover EvenScheduler.scheduleTopologiesEvenly call, which also redistributes on entry. The full-set pass consumes one idle supervisor, leaving any other idle supervisor for the per-topology pass to fill again, so the per-topology max.free.per.topology cap was applied twice.

With the feature enabled, a positive cap and two or more idle supervisors, an under-assigned topology could relocate up to 2 * cap existing workers in a single round -- throttle-exceeding churn, not a crash or loss. Scoped to the DefaultScheduler (and IsolationScheduler leftover) path; EvenScheduler used directly schedules each topology once and is unaffected. The redistribute was added at both entry points by apache#8778 without a re-entry guard.

Add a redistributeOntoIdle toggle to scheduleTopologiesEvenly: the public two-arg entry point keeps redistribute=true (EvenScheduler's own single full-set pass is unchanged), while DefaultScheduler runs the single full-set redistribute itself and delegates per leftover topology with the toggle off. The cap then binds once per round and the full-set round-robin fairness across topologies is preserved -- simply dropping the first call would break that fairness. The redistribute pass is gated off by default, so there is no behavior change for existing clusters.

Add a regression test exercising the DefaultScheduler path with two idle supervisors, max.free.per.topology=1 and an under-assigned topology: before the fix it relocates two workers (one onto each idle supervisor), after the fix exactly one.

mwkang wants to merge 1 commit into apache:master from mwkang:8797-even-rebalance-double-cap
