fix: add kubeadmRbac to soot triggers slice#1169
Conversation
PhaseClusterAdminRBAC was the only soot controller whose TriggerChannel was not included in the sootItem triggers slice. Every other controller (writePermissions, migrate, konnectivityAgent, kubeProxy, coreDNS, uploadKubeadmConfig, uploadKubeletConfig, bootstrapToken) was present. The consequence: when the TCP reconciler notifies soot controllers of a TCP change, PhaseClusterAdminRBAC is never triggered via the channel. It can only reconcile when a ClusterRoleBinding with a "kubeadm:" prefix already exists in the tenant cluster (its For() watch predicate), which means on a freshly created cluster — where no such CRB exists yet — the phase is never triggered and the kubeadm:cluster-admins ClusterRoleBinding is never created. Since kubeadm v1.29 (KEP-2305), admin.conf carries O=kubeadm:cluster-admins instead of O=system:masters. Any operator using the admin-kubeconfig secret in the window before PhaseClusterAdminRBAC has created the CRB gets HTTP 403, because the credential authenticates but lacks a binding. Adding kubeadmRbac.TriggerChannel to the triggers slice ensures PhaseClusterAdminRBAC is triggered on every TCP reconcile, closing the window for new clusters. Fixes: clastix#1167 Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
✅ Deploy Preview for kamaji-documentation canceled.
|
The previous fix added kubeadmRbac to the triggers slice but that path only fires when the soot manager is already in sootMap. On initial cluster creation the first reconcile goes through the startup path and returns RequeueAfter:time.Second — meaning the CRB is not created until the next TCP reconcile, a 3-10 second window that exposes admin.conf callers to 403. Two changes close the initial-creation window: 1. Make TriggerChannel buffered (cap 1) so a pre-startup send succeeds immediately without blocking on a live consumer. 2. Pre-seed TriggerChannel with one event immediately after starting the manager goroutine. When the soot manager cache warms up, source.Channel drains the buffer and enqueues a reconcile for kubeadmRbac, which calls EnsureAdminClusterRoleBinding and creates the CRB without waiting for an external TCP object event. The trigger in the sootItem triggers slice is retained for drift correction. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR fixes a tenant-cluster bootstrap race where PhaseClusterAdminRBAC (which creates the kubeadm:cluster-admins -> cluster-admin ClusterRoleBinding) wasn’t being triggered early enough, briefly leaving admin.conf callers without sufficient RBAC on newly created clusters.
Changes:
- Buffer
kubeadmRbac.TriggerChannelto allow an initial trigger to be sent before the controller’s channel source is actively consuming. - Pre-seed the
kubeadmRbactrigger immediately after starting the soot manager goroutine so the RBAC phase reconciles on startup. - Add
kubeadmRbac.TriggerChannelto the soot manager’s trigger fan-out slice to support drift correction via subsequent TCP reconciles.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
@prometherion can you take a look? Would appreciate. Thanks! |
| Phase: resources.PhaseClusterAdminRBAC, | ||
| }, | ||
| TriggerChannel: make(chan event.GenericEvent), | ||
| TriggerChannel: make(chan event.GenericEvent, 1), // buffered: pre-seeded below to run on startup |
There was a problem hiding this comment.
We don't need to change the channel's buffering: what's the rationale?
I get the point of the initial seeding, but I think this is not required.
| uploadKubeadmConfig.TriggerChannel, | ||
| uploadKubeletConfig.TriggerChannel, | ||
| bootstrapToken.TriggerChannel, | ||
| kubeadmRbac.TriggerChannel, |
There was a problem hiding this comment.
This is absolutely a bug, and this is required.
| // Pre-seed so PhaseClusterAdminRBAC runs on startup rather than waiting for a kubeadm: CRB watch event. | ||
| var initTCP kamajiv1alpha1.TenantControlPlane | ||
| initTCP.Name = tcp.Name | ||
| initTCP.Namespace = tcp.Namespace | ||
| kubeadmRbac.TriggerChannel <- event.GenericEvent{Object: &initTCP} |
There was a problem hiding this comment.
I don't get the point of running this by triggering it, except to add an exception considering the eventually consistent nature of Kubernetes.
When the Manager is configured, it will start the Watch against the kubeadm:* Cluster Role, which is created by the UploadConfiguration's Kubeadm function that will be triggered immediately upon soot manager setup.
Fixes #1167.
Root cause. Since kubeadm v1.29 (KEP-2305),
admin.confcarriesO=kubeadm:cluster-adminsinstead ofO=system:masters. The matchingkubeadm:cluster-admins → cluster-adminClusterRoleBinding is created byPhaseClusterAdminRBACin the soot manager. Two gaps prevent the CRB from being created quickly enough:kubeadmRbac.TriggerChannelwas absent from thesootItem.triggersslice — so when the TCP reconciler notifies soot controllers,PhaseClusterAdminRBACwas never poked. It could only reconcile when akubeadm:*ClusterRoleBinding watch event fired, which never happens on a brand-new cluster.Even after adding the trigger, it only fires on subsequent TCP reconciles (when the manager is already in
sootMap). On initial startup the reconciler returnsRequeueAfter:time.Secondand exits — the CRB isn't created until the next TCP reconcile, leaving a 3–10 second window whereadmin.confcallers receive 403.Fix. Two changes, together closing the initial-creation window:
kubeadmRbac.TriggerChannelbuffered (cap 1). An unbuffered channel requires a live consumer; a pre-startup send would block until the manager goroutine starts consuming. Buffered lets the send succeed immediately.TriggerChannelwith one event immediately after launching the manager goroutine. When the soot manager's cache warms up,source.Channeldrains the buffer and enqueues a reconcile forkubeadmRbac, which callsEnsureAdminClusterRoleBindingand mints the CRB without waiting for an external event.The trigger in
sootItem.triggersis retained for drift correction: if the CRB is deleted on a running cluster, the next TCP reconcile re-triggersPhaseClusterAdminRBACand restores it.