Skip to content

fix(deployment): render readiness probe for scheduler and controller-manager#1193

Open
Jakob3xD wants to merge 3 commits into
clastix:masterfrom
Jakob3xD:fix/scheduler-cm-readiness-probe
Open

fix(deployment): render readiness probe for scheduler and controller-manager#1193
Jakob3xD wants to merge 3 commits into
clastix:masterfrom
Jakob3xD:fix/scheduler-cm-readiness-probe

Conversation

@Jakob3xD

@Jakob3xD Jakob3xD commented Jun 22, 2026

Copy link
Copy Markdown

What

Render an actual readiness probe for kube-scheduler and kube-controller-manager, so that probes.scheduler.readiness, probes.controllerManager.readiness and the global probes.readiness stop being silently ignored for these two components.

Fixes #1192.

Why

The CRD schema exposes a readiness block under apiServer, controllerManager and scheduler (introduced in #1086), but the deployment builder only ever wired a readiness probe for kube-apiserver. For scheduler/CM the field was accepted by the API server and then had no effect — a silent no-op since it was introduced. This makes the documented configuration actually work.

How

internal/builders/controlplane/deployment.go:

  • Add a probe == nil guard to applyProbeOverrides (it now no-ops on a nil probe as well as a nil spec).
  • Introduce two helpers and rewire all three build* functions onto them:
    • defaultProbe(path, port) — the standard HTTPS probe shared by every component (only path/port differ; all use 0/1/10/1/3).
    • applyProbeSetOverrides(container, globalProbes, componentSet) — applies global overrides, then per-component overrides, to all three probe types.
  • Scheduler and controller-manager now get a ReadinessProbe on GET /healthz over HTTPS (port 10259 / 10257) — the same endpoint their existing liveness/startup probes already target.
  • The global probes.readiness now cascades to all three components (then per-component fields override), matching how liveness/startup already cascade.

The kube-apiserver path is a behaviour-preserving refactor: defaultProbe reproduces the previous apiserver probe literals byte-for-byte (/livez, /readyz, /livez on NetworkProfile.Port), guarded by a regression test.

No CRD schema change — the readiness fields were already exposed. The only API-package change is a godoc correction: ControlPlaneProbes.Readiness previously said "readiness probe of kube-apiserver", which is now inaccurate since the global field cascades to all three components; the comment and the regenerated CRD description (make manifests) are updated to match.

Behavioural consequence (please review)

After this change the control-plane pod is not Ready until kube-scheduler and kube-controller-manager /healthz also return 200, in addition to the apiserver. Two notes:

  1. Service-endpoint coupling. All control-plane containers share one pod, and the kube-apiserver Service does not set publishNotReadyAddresses. So a sustained scheduler/CM /healthz failure (≈ FailureThreshold 3 × PeriodSeconds 10 = 30s) will now remove an otherwise-healthy kube-apiserver from its Service endpoints. This is the intended readiness semantics, but it is a real availability-semantics change — flagging it so maintainers can decide whether any publishNotReadyAddresses expectation needs adjusting. (Leader election is unaffected: scheduler/CM /healthz returns 200 on non-leaders, so standby replicas stay Ready.)
  2. No fix: kube-scheduler and kube-controller-manager CrashLoopBackoff on every TCP creation delays cluster provisioning #1178 regression. Kubernetes does not run the readiness probe until the startup probe has succeeded, so initialDelaySeconds: 0 on readiness cannot reintroduce the startup race from fix: kube-scheduler and kube-controller-manager CrashLoopBackoff on every TCP creation delays cluster provisioning #1178.

Testing

go test ./internal/builders/controlplane/... — all specs pass, including new specs for scheduler/CM readiness rendering, the global-then-component override cascade, and a regression guard asserting the kube-apiserver probes are unchanged. go build ./... and go vet are clean.

@netlify

netlify Bot commented Jun 22, 2026

Copy link
Copy Markdown

Deploy Preview for kamaji-documentation ready!

Name Link
🔨 Latest commit 14a2364
🔍 Latest deploy log https://app.netlify.com/projects/kamaji-documentation/deploys/6a3950e1859c8500080824c3
😎 Deploy Preview https://deploy-preview-1193--kamaji-documentation.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

@prometherion prometherion left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please commit make apidoc changes since we need to reflect the API changes in the documentation too.

…ents

The global probes.readiness now cascades to scheduler and controller-manager
as well as kube-apiserver, matching liveness/startup. Update the godoc and
regenerate the CRD descriptions accordingly.
@Jakob3xD Jakob3xD force-pushed the fix/scheduler-cm-readiness-probe branch from 6c8a883 to 14a2364 Compare June 22, 2026 15:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

kube-scheduler / kube-controller-manager readiness probe config is accepted but silently ignored

2 participants