Skip to content

fix: add kubeadmRbac to soot triggers slice#1169

Open
HarshavardhanK wants to merge 3 commits into
clastix:masterfrom
HarshavardhanK:fix/phase-cluster-admin-rbac-trigger
Open

fix: add kubeadmRbac to soot triggers slice#1169
HarshavardhanK wants to merge 3 commits into
clastix:masterfrom
HarshavardhanK:fix/phase-cluster-admin-rbac-trigger

Conversation

@HarshavardhanK

@HarshavardhanK HarshavardhanK commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

Fixes #1167.

Root cause. Since kubeadm v1.29 (KEP-2305), admin.conf carries O=kubeadm:cluster-admins instead of O=system:masters. The matching kubeadm:cluster-admins → cluster-admin ClusterRoleBinding is created by PhaseClusterAdminRBAC in the soot manager. Two gaps prevent the CRB from being created quickly enough:

  1. kubeadmRbac.TriggerChannel was absent from the sootItem.triggers slice — so when the TCP reconciler notifies soot controllers, PhaseClusterAdminRBAC was never poked. It could only reconcile when a kubeadm:* ClusterRoleBinding watch event fired, which never happens on a brand-new cluster.

  2. Even after adding the trigger, it only fires on subsequent TCP reconciles (when the manager is already in sootMap). On initial startup the reconciler returns RequeueAfter:time.Second and exits — the CRB isn't created until the next TCP reconcile, leaving a 3–10 second window where admin.conf callers receive 403.

Fix. Two changes, together closing the initial-creation window:

  • Make kubeadmRbac.TriggerChannel buffered (cap 1). An unbuffered channel requires a live consumer; a pre-startup send would block until the manager goroutine starts consuming. Buffered lets the send succeed immediately.
  • Pre-seed TriggerChannel with one event immediately after launching the manager goroutine. When the soot manager's cache warms up, source.Channel drains the buffer and enqueues a reconcile for kubeadmRbac, which calls EnsureAdminClusterRoleBinding and mints the CRB without waiting for an external event.

The trigger in sootItem.triggers is retained for drift correction: if the CRB is deleted on a running cluster, the next TCP reconcile re-triggers PhaseClusterAdminRBAC and restores it.

PhaseClusterAdminRBAC was the only soot controller whose TriggerChannel
was not included in the sootItem triggers slice. Every other controller
(writePermissions, migrate, konnectivityAgent, kubeProxy, coreDNS,
uploadKubeadmConfig, uploadKubeletConfig, bootstrapToken) was present.

The consequence: when the TCP reconciler notifies soot controllers of a
TCP change, PhaseClusterAdminRBAC is never triggered via the channel.
It can only reconcile when a ClusterRoleBinding with a "kubeadm:" prefix
already exists in the tenant cluster (its For() watch predicate), which
means on a freshly created cluster — where no such CRB exists yet — the
phase is never triggered and the kubeadm:cluster-admins ClusterRoleBinding
is never created.

Since kubeadm v1.29 (KEP-2305), admin.conf carries O=kubeadm:cluster-admins
instead of O=system:masters. Any operator using the admin-kubeconfig secret
in the window before PhaseClusterAdminRBAC has created the CRB gets HTTP
403, because the credential authenticates but lacks a binding.

Adding kubeadmRbac.TriggerChannel to the triggers slice ensures
PhaseClusterAdminRBAC is triggered on every TCP reconcile, closing the
window for new clusters.

Fixes: clastix#1167

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
@netlify

netlify Bot commented Jun 4, 2026

Copy link
Copy Markdown

Deploy Preview for kamaji-documentation canceled.

Name Link
🔨 Latest commit 86872f1
🔍 Latest deploy log https://app.netlify.com/projects/kamaji-documentation/deploys/6a22124ba7b2bc0008878b73

@HarshavardhanK HarshavardhanK changed the title fix(soot): add kubeadmRbac to soot triggers slice Soot: add kubeadmRbac to soot triggers slice Jun 4, 2026
HarshavardhanK and others added 2 commits June 4, 2026 16:50
The previous fix added kubeadmRbac to the triggers slice but that path
only fires when the soot manager is already in sootMap. On initial cluster
creation the first reconcile goes through the startup path and returns
RequeueAfter:time.Second — meaning the CRB is not created until the next
TCP reconcile, a 3-10 second window that exposes admin.conf callers to 403.

Two changes close the initial-creation window:

1. Make TriggerChannel buffered (cap 1) so a pre-startup send succeeds
   immediately without blocking on a live consumer.

2. Pre-seed TriggerChannel with one event immediately after starting the
   manager goroutine. When the soot manager cache warms up, source.Channel
   drains the buffer and enqueues a reconcile for kubeadmRbac, which calls
   EnsureAdminClusterRoleBinding and creates the CRB without waiting for an
   external TCP object event.

The trigger in the sootItem triggers slice is retained for drift correction.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
@HarshavardhanK HarshavardhanK changed the title Soot: add kubeadmRbac to soot triggers slice fix: add kubeadmRbac to soot triggers slice Jun 5, 2026
@prometherion prometherion requested a review from Copilot June 5, 2026 06:35

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a tenant-cluster bootstrap race where PhaseClusterAdminRBAC (which creates the kubeadm:cluster-admins -> cluster-admin ClusterRoleBinding) wasn’t being triggered early enough, briefly leaving admin.conf callers without sufficient RBAC on newly created clusters.

Changes:

  • Buffer kubeadmRbac.TriggerChannel to allow an initial trigger to be sent before the controller’s channel source is actively consuming.
  • Pre-seed the kubeadmRbac trigger immediately after starting the soot manager goroutine so the RBAC phase reconciles on startup.
  • Add kubeadmRbac.TriggerChannel to the soot manager’s trigger fan-out slice to support drift correction via subsequent TCP reconciles.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@HarshavardhanK

Copy link
Copy Markdown
Contributor Author

@prometherion can you take a look? Would appreciate. Thanks!

Phase: resources.PhaseClusterAdminRBAC,
},
TriggerChannel: make(chan event.GenericEvent),
TriggerChannel: make(chan event.GenericEvent, 1), // buffered: pre-seeded below to run on startup

@prometherion prometherion Jun 11, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to change the channel's buffering: what's the rationale?

I get the point of the initial seeding, but I think this is not required.

uploadKubeadmConfig.TriggerChannel,
uploadKubeletConfig.TriggerChannel,
bootstrapToken.TriggerChannel,
kubeadmRbac.TriggerChannel,

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is absolutely a bug, and this is required.

Comment on lines +396 to +400
// Pre-seed so PhaseClusterAdminRBAC runs on startup rather than waiting for a kubeadm: CRB watch event.
var initTCP kamajiv1alpha1.TenantControlPlane
initTCP.Name = tcp.Name
initTCP.Namespace = tcp.Namespace
kubeadmRbac.TriggerChannel <- event.GenericEvent{Object: &initTCP}

@prometherion prometherion Jun 11, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get the point of running this by triggering it, except to add an exception considering the eventually consistent nature of Kubernetes.

When the Manager is configured, it will start the Watch against the kubeadm:* Cluster Role, which is created by the UploadConfiguration's Kubeadm function that will be triggered immediately upon soot manager setup.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

admin.conf unusable during cluster creation due to race between TCP reconciler and PhaseClusterAdminRBAC

3 participants