fix(deployment): render readiness probe for scheduler and controller-manager by Jakob3xD · Pull Request #1193 · clastix/kamaji

Jakob3xD · 2026-06-22T14:30:28Z

What

Render an actual readiness probe for kube-scheduler and kube-controller-manager, so that probes.scheduler.readiness, probes.controllerManager.readiness and the global probes.readiness stop being silently ignored for these two components.

Fixes #1192.

Why

The CRD schema exposes a readiness block under apiServer, controllerManager and scheduler (introduced in #1086), but the deployment builder only ever wired a readiness probe for kube-apiserver. For scheduler/CM the field was accepted by the API server and then had no effect — a silent no-op since it was introduced. This makes the documented configuration actually work.

How

internal/builders/controlplane/deployment.go:

- Add a probe == nil guard to applyProbeOverrides, so it is now a no-op on a nil probe as well as on a nil spec.
- Introduce two helpers and rewire all three build* functions onto them:
  - defaultProbe(path, port): the standard HTTPS probe shared by every component. Only path and port differ; all use 0/1/10/1/3 (initial delay / timeout / period / success threshold / failure threshold).
  - applyProbeSetOverrides(container, globalProbes, componentSet): applies global overrides, then per-component overrides, to all three probe types.
- Scheduler and controller-manager now get a ReadinessProbe on GET /healthz over HTTPS (ports 10259 and 10257 respectively), the same endpoint their existing liveness/startup probes already target.
- The global probes.readiness now cascades to all three components, with per-component fields overriding it, matching how liveness/startup already cascade.
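
For reviewers who want the cascade spelled out, here is a minimal, dependency-free sketch of the two helpers. It is not the code from deployment.go: Probe, ProbeSpec, ProbeSet and Container are simplified stand-ins for the real corev1 and Kamaji API types, the same ProbeSet type is used for both the global and the per-component argument, and the mapping of 0/1/10/1/3 onto the five timing fields is an assumption based on the field order of corev1.Probe.

```go
package main

import "fmt"

// Probe is a simplified stand-in for corev1.Probe, reduced to the fields
// discussed in this PR so that the sketch has no external dependencies.
type Probe struct {
	Path, Scheme        string
	Port                int32
	InitialDelaySeconds int32
	TimeoutSeconds      int32
	PeriodSeconds       int32
	SuccessThreshold    int32
	FailureThreshold    int32
}

// ProbeSpec is a hypothetical override type: a nil field keeps the current value.
type ProbeSpec struct {
	InitialDelaySeconds, TimeoutSeconds, PeriodSeconds *int32
	SuccessThreshold, FailureThreshold                 *int32
}

// ProbeSet groups the overrides for the three probe types (hypothetical shape).
type ProbeSet struct {
	Liveness, Readiness, Startup *ProbeSpec
}

// Container is a stand-in for corev1.Container, probes only.
type Container struct {
	LivenessProbe, ReadinessProbe, StartupProbe *Probe
}

// defaultProbe builds the shared HTTPS probe; only path and port vary.
// Assumption: 0/1/10/1/3 maps onto the fields in the order listed below.
func defaultProbe(path string, port int32) *Probe {
	return &Probe{
		Path: path, Port: port, Scheme: "HTTPS",
		InitialDelaySeconds: 0, TimeoutSeconds: 1, PeriodSeconds: 10,
		SuccessThreshold: 1, FailureThreshold: 3,
	}
}

// applyProbeOverrides copies every non-nil override onto the probe.
// It is a no-op when either the probe or the spec is nil.
func applyProbeOverrides(probe *Probe, spec *ProbeSpec) {
	if probe == nil || spec == nil {
		return
	}
	set := func(dst, src *int32) {
		if src != nil {
			*dst = *src
		}
	}
	set(&probe.InitialDelaySeconds, spec.InitialDelaySeconds)
	set(&probe.TimeoutSeconds, spec.TimeoutSeconds)
	set(&probe.PeriodSeconds, spec.PeriodSeconds)
	set(&probe.SuccessThreshold, spec.SuccessThreshold)
	set(&probe.FailureThreshold, spec.FailureThreshold)
}

// applyProbeSetOverrides applies global overrides first and component
// overrides second, so the component value wins when both are set.
func applyProbeSetOverrides(c *Container, global, component *ProbeSet) {
	for _, s := range []*ProbeSet{global, component} {
		if s == nil {
			continue
		}
		applyProbeOverrides(c.LivenessProbe, s.Liveness)
		applyProbeOverrides(c.ReadinessProbe, s.Readiness)
		applyProbeOverrides(c.StartupProbe, s.Startup)
	}
}

func main() {
	scheduler := &Container{
		LivenessProbe:  defaultProbe("/healthz", 10259),
		ReadinessProbe: defaultProbe("/healthz", 10259),
		StartupProbe:   defaultProbe("/healthz", 10259),
	}
	globalPeriod, globalFailures, schedulerPeriod := int32(20), int32(5), int32(5)
	global := &ProbeSet{Readiness: &ProbeSpec{PeriodSeconds: &globalPeriod, FailureThreshold: &globalFailures}}
	component := &ProbeSet{Readiness: &ProbeSpec{PeriodSeconds: &schedulerPeriod}}

	applyProbeSetOverrides(scheduler, global, component)
	r := scheduler.ReadinessProbe
	fmt.Printf("period=%d failureThreshold=%d\n", r.PeriodSeconds, r.FailureThreshold)
}
```

In this sketch the scheduler readiness probe ends up with the component-level period and the global failure threshold, while its liveness and startup probes keep the defaults because no override targets them.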

The kube-apiserver path is a behaviour-preserving refactor: defaultProbe reproduces the previous apiserver probe literals byte-for-byte (/livez, /readyz, /livez on NetworkProfile.Port), guarded by a regression test.

No CRD schema change — the readiness fields were already exposed. The only API-package change is a godoc correction: ControlPlaneProbes.Readiness previously said "readiness probe of kube-apiserver", which is now inaccurate since the global field cascades to all three components; the comment and the regenerated CRD description (make manifests) are updated to match.

Behavioural consequence (please review)

After this change the control-plane pod is not Ready until kube-scheduler and kube-controller-manager /healthz also return 200, in addition to the apiserver. Two notes:

- Service-endpoint coupling. All control-plane containers share one pod, and the kube-apiserver Service does not set publishNotReadyAddresses. A sustained scheduler/CM /healthz failure (≈ FailureThreshold 3 × PeriodSeconds 10 = 30s) will therefore remove an otherwise-healthy kube-apiserver from its Service endpoints. This is the intended readiness semantics, but it is a real change in availability behaviour, so I am flagging it for maintainers to decide whether any publishNotReadyAddresses expectation needs adjusting. Leader election is unaffected: scheduler/CM /healthz returns 200 on non-leaders, so standby replicas stay Ready.
- No regression of #1178 (fix: kube-scheduler and kube-controller-manager CrashLoopBackoff on every TCP creation delays cluster provisioning). Kubernetes does not run the readiness probe until the startup probe has succeeded, so initialDelaySeconds: 0 on readiness cannot reintroduce the startup race from #1178.

Testing

go test ./internal/builders/controlplane/... — all specs pass, including new specs for scheduler/CM readiness rendering, the global-then-component override cascade, and a regression guard asserting the kube-apiserver probes are unchanged. go build ./... and go vet are clean.

netlify · 2026-06-22T14:30:34Z

✅ Deploy Preview for kamaji-documentation ready!

Name	Link
🔨 Latest commit	`14a2364`
🔍 Latest deploy log	https://app.netlify.com/projects/kamaji-documentation/deploys/6a3950e1859c8500080824c3
😎 Deploy Preview	https://deploy-preview-1193--kamaji-documentation.netlify.app

prometherion

Please commit make apidoc changes since we need to reflect the API changes in the documentation too.

Jakob3xD added 2 commits June 22, 2026 16:29

fix(deployment): guard applyProbeOverrides against nil probe

0ee5f21

fix(deployment): render readiness probe for scheduler and controller-…

7f9332b

…manager (clastix#1192)

prometherion requested changes Jun 22, 2026

docs(api): readiness probe defaults apply to all control plane compon…

14a2364

…ents The global probes.readiness now cascades to scheduler and controller-manager as well as kube-apiserver, matching liveness/startup. Update the godoc and regenerate the CRD descriptions accordingly.

Jakob3xD force-pushed the fix/scheduler-cm-readiness-probe branch from 6c8a883 to 14a2364 Compare June 22, 2026 15:12

Jakob3xD wants to merge 3 commits into clastix:master from Jakob3xD:fix/scheduler-cm-readiness-probe
