Feat/watch namespaces#1146
Conversation
testcontainers-go v0.42 changed Container.Exec to return a nil stdout reader when the exec call fails before producing any output (v0.41 always returned an empty, non-nil reader). Calling io.ReadAll on the nil reader panics with a nil-pointer dereference, which was breaking worker_kubeadm_join_test.go since the testcontainers bump. Make the three Exec sites return early when err != nil — that is the case in which stdout was nil and the panic occurred, and on a non-nil err the exitCode and stdout values are not meaningful anyway. In the two best-effort blocks (modprobe br_netfilter, swapoff -a) the returned error is logged and the spec moves on; in the join command block the error fails the It via Expect. Signed-off-by: Wilfried Roset <wilfriedroset@users.noreply.github.com>
Introduce an opt-in --watch-namespaces flag on both the controller and the kubeconfig-generator. When set, the manager's cache only watches namespaced informers in the listed namespaces via controller-runtime's cache.Options.DefaultNamespaces. Cluster-scoped resources (DataStore, ClusterRole, ...) are unaffected and stay cluster-wide. When the flag is empty (default), Kamaji keeps its current cluster-wide behaviour. The new internal/manager package hosts three small helpers: * NewCacheFunc returns a cache.NewCacheFunc with the resync period and the optional namespace scope applied. * MergeWatchedNamespaces canonicalises the user-supplied list (trims, drops blanks, deduplicates while preserving first-seen order) and appends caller-supplied extras. Returns nil when the user list is empty so the cache stays cluster-wide. * ValidateNamespaces fails fast on invalid identifiers (RFC 1123 label) so misconfiguration surfaces at startup rather than as opaque watch errors. Both binaries pass their install namespace through MergeWatchedNamespaces extras so it is always part of the watch set: the controller needs it for the migration Job watch (predicated on KamajiNamespace), and the kubeconfig-generator includes it for symmetry to keep leader-election working defensively. Unit tests cover all three helpers plus the applyOptions transform end to end, including the canonicalisation path through applyOptions. Signed-off-by: Wilfried Roset <wilfriedroset@users.noreply.github.com>
Thread the new --watch-namespaces flag through the Helm chart on both the controller and the kubeconfig-generator deployment templates, with a single values.yaml entry (watchNamespaces, defaulting to []) feeding both. When the list is empty the flag is omitted so existing installs keep their cluster-wide behaviour. The kubeconfig-generator template also gains a POD_NAMESPACE env var projected from metadata.namespace, mirroring the controller. The operator code path that auto-includes the install namespace in the watch set reads its own pod namespace from this env var, so without the projection --watch-namespaces would silently drop the install namespace for the kubeconfig-generator. charts/kamaji/README.md regenerated via helm-docs. Signed-off-by: Wilfried Roset <wilfriedroset@users.noreply.github.com>
Add an Ordered Ginkgo describe that mutates the running kamaji controller-manager Deployment to validate cache scoping end-to-end: * the first It creates a TenantControlPlane in a watched namespace and asserts the operator reconciles it; * the second It creates one in an unwatched namespace and asserts the operator never touches it (Consistently); * an originalArgs snapshot + DeferCleanup restores the Deployment args even when an assertion aborts the run. The describe is named with a zz_ prefix so Ginkgo executes it after every other spec in the suite, minimising the blast radius. Several test helpers were tuned for stability: * findManagerDeployment further filters by the manager container's args[0] entrypoint, since the kubeconfig-generator deployment shares the same selector labels (both use kamaji.selectorLabels). * waitForRollout strictly asserts len(pods.Items) == replicas with no DeletionTimestamp, so terminating pods from the previous generation cannot leak past the rollout (this also fixes a latent HaveLen(1) flake in PrintKamajiLogs in utils_test.go). * waitForWebhookReady polls the kamaji-webhook-service EndpointSlices for at least one Ready endpoint. * createTCPWithRetry retries the Create on transient webhook-unreachable errors. The container readiness probe targets healthz (8081), not the webhook port (9443), so EndpointSlice readiness alone does not guarantee the webhook server has bound yet, and kube-proxy may also lag by one informer tick. Signed-off-by: Wilfried Roset <wilfriedroset@users.noreply.github.com>
Document the new --watch-namespaces flag: what it scopes, what remains cluster-wide, the install-namespace auto-include, and a short Helm install example. Linked from the guides nav so it is discoverable from the rendered site. Signed-off-by: Wilfried Roset <wilfriedroset@users.noreply.github.com>
✅ Deploy Preview for kamaji-documentation ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
Thanks for being transparent. Although we appreciate contributions, I have some remarks when merging external contributors code, mostly because it creates direct implications in maintaining such a feature. I'm honestly neither convinced by the feature request since I see a potential security threat. |
I would happily rework the PR to meet your standard and liking. would you be able to elaborate regarding your concerns?
My usecase is that I'm one of the tenant of a shared k8s cluster. My platform team are focused on powering the platform itself. Deploying and maintaining kamaji for a single tenant is out of their scope. I want to deploy and maintain my instance of kamaji in my own namespace, hence scoping the namespace watched by the operator becomes handy.
I was under the impression that this opt-in feature was adopted by several operators. |
|
My concerns are regarding RBAC: although the changes introduce the ability to watch only for a specific subset of Namespaces, the resulting Kamaji instance still relies on a There are no traces of this support in the Helm templates: we could just ignore this and leverage only the ClusterRole, but that would mean an attacker could easily change the arguments of the Kamaji namespace-scoped instance and start disrupting others' Namespace installations, as well as exfiltrating Secrets, ConfigMaps, etc. This is just the thread modelling for the Tenant Control Plane: there's also the Datastore API, which contains sensitive data and relies on the Cluster-scoped level. In a multi-tenant environment, you would need to grant access to the Datastore of that tenant; otherwise, data could be potentially leaked with a sophisticated attack, hard to reproduce, but still possible. For such features, as maintainers, we prefer to have a proper discussion and a clear picture: as you can see, it's not just a matter of a simple feature, but it would require an important refactoring of the code-base and the installation of Kamaji itself, which would then be a burden on CLASTIX, which is maintaining the project and honours customers with (where possible) bug-free stable releases. Last but not least, Issues are used to track bugs: features or enhancement proposals are preferred to be discussed in the Discussion section, or privately in case of sensitive use-cases. |
Noted, and sorry for the wrong door! I had followed https://kamaji.clastix.io/guides/contribute/ and landed on issue #1145, but I see now that a Discussion would have been the better starting point. The PR was really meant as a "here is what it could look like" sketch rather than a request to merge as-is — happy to move the conversation wherever suits you best.
What about letting users bring their own RBAC via a chart value like
I want to make sure I'm reading the threat model the same way you are: if an attacker can already mutate the operator's args, it seems they could also swap the image, add a malicious init container that inherits the pod's RBAC, mount extra volumes, etc.
I should have called this out upfront: I deliberately stayed clear of the cluster-scoped vs namespace-scoped DataStore question in this PR. A TCP in namespace A still references a cluster-scoped DataStore, so cross-tenant leakage stays structurally possible even with a perfectly scoped cache and RBAC. For what it's worth, this is a commonly used pattern in the operator ecosystem: many projects ship a namespace-scoped CR alongside a cluster-scoped variant: cert-manager's Issuer / ClusterIssuer being an example. We can also see this pattern in a more recent operator with cnpg's ImangeCatalog / ClusterImageCatalog. Long term, what we'd love to be able to do is define our own DataStore resources directly inside our tenant namespaces, each binding to a Secret (which in our setup syncs PG credentials from our internal secret store). That would close the loop on multi-tenancy without requiring cluster-admin involvement for every new tenant. This is a topic for a separate discussion. The PR description was probably overselling things. "narrow the watch cache" is a more honest framing than "namespace-scoped Kamaji", since the latter hints at guarantees this change does not actually provide. Would you be open to:
|
We would end up maintaining two different deployment strategies: a cluster-scoped one (the default), and one focused on single/multiple Namespaces. The latter, although optional, could raise issues with the security assessment we go through, and it's not acceptable to us. I'm not saying it's not possible, it would require time and efforts we're currently allocating to other features prioritized.
That's why Kamaji is installed in its own Namespace: Tenants restricted to other Namespaces can't change it.
I was referring to the privilege escalation mechanism in Kubernetes: you can't grant RBAC permissions that the user doesn't hold. To install Kamaji at the Tenant level, the Operator must have cluster-scoped permissions, which doesn't fit the multi-tenancy design in Kubernetes.
That's an enhancement proposal; it can be put in our backlog. Kamaji has its own feature development map driven by customers: that's how we keep the project alive, getting financed directly from those benefiting from it.
Even addressing the changes, the better approach would be to have a local DataStore reference, but that would be highly impactful and would require a considerable amount of work. We're happy for contributions, but such a scope would be very massive and would require a very detailed review process. We would prefer addressing this on our own, being the ones maintaining the project for everybody, especially for our customers, who aren't looking for such a feature.
This is a no-go for the security implications mentioned before. |
|
I understand all your points. Is there some options I did not identify/discuss here to continue the work on this feature/PR? |
|
Such impactful changes are problematic since they burden the development activities, as happened with the NATS support, Gateway API, or other contributions that caused bug reports to our customers. The development of these features requires financial support, either by contributing a dedicated maintainer to the project on behalf of the requester or through a formal feature request, development, and prioritisation process designed to ensure project maintainability and alignment with the expectations of commercial adopters. |
This PR addresses #1145.
Full disclosure: I've used claude with opus 4.7 extensively to land the feature as I'm no golang expert. I've fully tested manually on a production cluster and self-review the changes