fix(deployment): render readiness probe for scheduler and controller-manager#1193
Open
Jakob3xD wants to merge 3 commits into
Open
fix(deployment): render readiness probe for scheduler and controller-manager#1193Jakob3xD wants to merge 3 commits into
Jakob3xD wants to merge 3 commits into
Conversation
✅ Deploy Preview for kamaji-documentation ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
prometherion
requested changes
Jun 22, 2026
prometherion
left a comment
Member
There was a problem hiding this comment.
Please commit make apidoc changes since we need to reflect the API changes in the documentation too.
…ents The global probes.readiness now cascades to scheduler and controller-manager as well as kube-apiserver, matching liveness/startup. Update the godoc and regenerate the CRD descriptions accordingly.
6c8a883 to
14a2364
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
Render an actual readiness probe for kube-scheduler and kube-controller-manager, so that
probes.scheduler.readiness,probes.controllerManager.readinessand the globalprobes.readinessstop being silently ignored for these two components.Fixes #1192.
Why
The CRD schema exposes a
readinessblock underapiServer,controllerManagerandscheduler(introduced in #1086), but the deployment builder only ever wired a readiness probe for kube-apiserver. For scheduler/CM the field was accepted by the API server and then had no effect — a silent no-op since it was introduced. This makes the documented configuration actually work.How
internal/builders/controlplane/deployment.go:probe == nilguard toapplyProbeOverrides(it now no-ops on a nil probe as well as a nil spec).build*functions onto them:defaultProbe(path, port)— the standard HTTPS probe shared by every component (only path/port differ; all use0/1/10/1/3).applyProbeSetOverrides(container, globalProbes, componentSet)— applies global overrides, then per-component overrides, to all three probe types.ReadinessProbeonGET /healthzover HTTPS (port10259/10257) — the same endpoint their existing liveness/startup probes already target.probes.readinessnow cascades to all three components (then per-component fields override), matching howliveness/startupalready cascade.The kube-apiserver path is a behaviour-preserving refactor:
defaultProbereproduces the previous apiserver probe literals byte-for-byte (/livez,/readyz,/livezonNetworkProfile.Port), guarded by a regression test.No CRD schema change — the
readinessfields were already exposed. The only API-package change is a godoc correction:ControlPlaneProbes.Readinesspreviously said "readiness probe of kube-apiserver", which is now inaccurate since the global field cascades to all three components; the comment and the regenerated CRDdescription(make manifests) are updated to match.Behavioural consequence (please review)
After this change the control-plane pod is not
Readyuntil kube-scheduler and kube-controller-manager/healthzalso return 200, in addition to the apiserver. Two notes:Servicedoes not setpublishNotReadyAddresses. So a sustained scheduler/CM/healthzfailure (≈FailureThreshold 3 × PeriodSeconds 10= 30s) will now remove an otherwise-healthy kube-apiserver from its Service endpoints. This is the intended readiness semantics, but it is a real availability-semantics change — flagging it so maintainers can decide whether anypublishNotReadyAddressesexpectation needs adjusting. (Leader election is unaffected: scheduler/CM/healthzreturns 200 on non-leaders, so standby replicas stay Ready.)initialDelaySeconds: 0on readiness cannot reintroduce the startup race from fix: kube-scheduler and kube-controller-manager CrashLoopBackoff on every TCP creation delays cluster provisioning #1178.Testing
go test ./internal/builders/controlplane/...— all specs pass, including new specs for scheduler/CM readiness rendering, the global-then-component override cascade, and a regression guard asserting the kube-apiserver probes are unchanged.go build ./...andgo vetare clean.