[proposal] Per-cluster etcd: retire the tenant etcd module and bind etcd to the Kubernetes app#25
[proposal] Per-cluster etcd: retire the tenant etcd module and bind etcd to the Kubernetes app#25myasnikovdaniil wants to merge 1 commit into
Conversation
Fold etcd into the kubernetes app so each tenant Kubernetes cluster provisions and owns its own etcd and Kamaji DataStore. End state is one etcd per one Kubernetes cluster: consumers can start a cluster with no admin pre-step, control-plane state is physically isolated per cluster, and no etcd runs unless a cluster needs it. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Myasnikov Daniil <myasnikovdaniil2001@gmail.com>
📝 WalkthroughWalkthroughA new design proposal document is added at ChangesPer-cluster etcd design proposal
Estimated code review effort🎯 1 (Trivial) | ⏱️ ~5 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request introduces a design proposal to retire the shared tenant etcd module and bind a dedicated etcd cluster directly to each Kubernetes app, improving self-service, performance isolation, and security. The review feedback highlights two important considerations: first, the risk of DNS label length violations (exceeding 63 characters) if the release name is too long, and second, a potential race condition during deletion where the etcd datastore might be reaped before the KamajiControlPlane finishes finalization.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| endpoints: | ||
| - {{ .Release.Name }}-etcd.{{ .Release.Namespace }}.svc:2379 | ||
| ``` |
There was a problem hiding this comment.
If .Release.Name is long, the service endpoint {{ .Release.Name }}-etcd might exceed the 63-character limit for Kubernetes Service names and DNS labels. To prevent deployment failures or DNS resolution issues, consider using a helper function (such as a truncated/hashed name helper) to format the etcd service name and endpoint, ensuring it strictly adheres to the 63-character limit.
|
|
||
| - **Two `Kubernetes` apps in one namespace** → each renders its own `<release>-etcd` and `<namespace>-<release>` DataStore; no name collision (the previous single shared `DataStore: <namespace>` could not represent two clusters distinctly). | ||
| - **DataStore name exceeds the cluster-scoped name length limit** → fall back to `<namespace>-<release>-<hash>`; the helper enforces the limit deterministically. | ||
| - **Cluster deletion** → the delete hook (`packages/apps/kubernetes/templates/delete.yaml`) must additionally delete the `EtcdCluster`, the cluster-scoped `DataStore`, and the per-cluster cert Secrets, *after* the existing `<release>-datastore-config` finalizer cleanup (delete.yaml already handles Kamaji's datastore-secret finalizer, issue #3062). Ordering: drain the control plane → strip the datastore-config finalizer → remove EtcdCluster + DataStore. |
There was a problem hiding this comment.
If the EtcdCluster or DataStore is deleted at the same time as the KamajiControlPlane (which is the default behavior when they are part of the same Helm release), a race condition can occur. If the etcd pods terminate before the KamajiControlPlane finishes its finalization, the Kamaji controller may fail to reconcile the control plane due to datastore connection timeouts, potentially blocking the deletion process or causing error storms in the operator logs. Consider implementing a mechanism (such as a finalizer on the DataStore or EtcdCluster that waits for the KamajiControlPlane to be gone, or explicit ordering in the deletion hook) to ensure the datastore is only reaped after the control plane has been completely deleted.
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
design-proposals/etcd-per-cluster/README.md (2)
5-5: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low valueFuture date in a draft proposal.
The date
2026-06-30is at the end of the current month. If this is intentional as a target date, no action needed; if it's a placeholder, consider using the draft creation date or removing it until finalized.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@design-proposals/etcd-per-cluster/README.md` at line 5, The README draft currently uses a future placeholder date, so update the date field in the proposal header to a non-future value if this is meant to be a draft, or remove it until finalized. Locate the date entry in the proposal metadata section and replace the hardcoded `Date` value with the draft creation date or omit it entirely if the target date is not confirmed.
35-35: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low valueConsider rephrasing "exactly" per style guidance.
Static analysis flagged this as overused. Alternatives: "precisely", "in effect", or restructure the sentence.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@design-proposals/etcd-per-cluster/README.md` at line 35, The sentence in the proposal body uses the overused word “exactly”; rephrase that line in the README text to avoid the flagged wording while keeping the same meaning. Update the prose near the per-cluster etcd description, using the existing surrounding sentence structure and terms like “per-cluster etcd” and “tenant-sprawl ceremony” to locate it, but replace “exactly” with a cleaner alternative such as “precisely” or “in effect.”Source: Linters/SAST tools
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@design-proposals/etcd-per-cluster/README.md`:
- Line 190: The Phase 4 removal instructions use the wrong key name for the
legacy etcd namespace propagation, which can mislead implementers. Update the
wording in the README section describing removal of the shared-etcd plumbing to
refer to the exact propagated Secret key and helper name, using _namespace.etcd
and cozy-lib.ns-etcd consistently with the existing references in the document.
- Line 173: Clarify the delete hook behavior in the cluster deletion section:
the current delete flow in the delete hook should only remove the per-cluster
EtcdCluster, DataStore, and cert Secrets for new per-cluster etcd clusters, and
must skip those resources for legacy shared-etcd clusters. Update the guidance
around packages/apps/kubernetes/templates/delete.yaml to explain how the hook
determines the mode (for example, by checking for the per-cluster DataStore or a
dedicated annotation) so the legacy shared EtcdCluster/DataStore owned by the
tenant module is not deleted.
---
Nitpick comments:
In `@design-proposals/etcd-per-cluster/README.md`:
- Line 5: The README draft currently uses a future placeholder date, so update
the date field in the proposal header to a non-future value if this is meant to
be a draft, or remove it until finalized. Locate the date entry in the proposal
metadata section and replace the hardcoded `Date` value with the draft creation
date or omit it entirely if the target date is not confirmed.
- Line 35: The sentence in the proposal body uses the overused word “exactly”;
rephrase that line in the README text to avoid the flagged wording while keeping
the same meaning. Update the prose near the per-cluster etcd description, using
the existing surrounding sentence structure and terms like “per-cluster etcd”
and “tenant-sprawl ceremony” to locate it, but replace “exactly” with a cleaner
alternative such as “precisely” or “in effect.”
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: d31dff8a-6a76-4ef4-9d32-ca60354f0c77
📒 Files selected for processing (1)
design-proposals/etcd-per-cluster/README.md
|
|
||
| - **Two `Kubernetes` apps in one namespace** → each renders its own `<release>-etcd` and `<namespace>-<release>` DataStore; no name collision (the previous single shared `DataStore: <namespace>` could not represent two clusters distinctly). | ||
| - **DataStore name exceeds the cluster-scoped name length limit** → fall back to `<namespace>-<release>-<hash>`; the helper enforces the limit deterministically. | ||
| - **Cluster deletion** → the delete hook (`packages/apps/kubernetes/templates/delete.yaml`) must additionally delete the `EtcdCluster`, the cluster-scoped `DataStore`, and the per-cluster cert Secrets, *after* the existing `<release>-datastore-config` finalizer cleanup (delete.yaml already handles Kamaji's datastore-secret finalizer, issue #3062). Ordering: drain the control plane → strip the datastore-config finalizer → remove EtcdCluster + DataStore. |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
Explicitly address delete hook behavior for legacy vs. new clusters.
The delete hook must distinguish between:
- Per-cluster etcd clusters: delete the
EtcdCluster,DataStore, and cert Secrets. - Legacy shared-etcd clusters: do not delete the shared
EtcdClusterorDataStore(owned by the tenant module/ancestor).
Without this distinction, deleting a legacy cluster could destroy the shared etcd and break all other clusters in the subtree. Add a sentence clarifying how the hook detects which mode applies (e.g., by the presence of the per-cluster DataStore or an annotation).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@design-proposals/etcd-per-cluster/README.md` at line 173, Clarify the delete
hook behavior in the cluster deletion section: the current delete flow in the
delete hook should only remove the per-cluster EtcdCluster, DataStore, and cert
Secrets for new per-cluster etcd clusters, and must skip those resources for
legacy shared-etcd clusters. Update the guidance around
packages/apps/kubernetes/templates/delete.yaml to explain how the hook
determines the mode (for example, by checking for the per-cluster DataStore or a
dedicated annotation) so the legacy shared EtcdCluster/DataStore owned by the
tenant module is not deleted.
| 1. **Phase 1 — self-provisioned etcd, opt-in.** Add the `etcd` values block and per-cluster etcd/DataStore rendering to the `kubernetes` app behind a default that self-provisions for new clusters while still honoring a legacy `_namespace.etcd` reference for existing ones. Extend `delete.yaml` to reap the per-cluster etcd/DataStore. Ship docs for the new sizing knobs. | ||
| 2. **Phase 2 — migration path.** Land and prove the shared→dedicated data migration (Kamaji datastore migration or snapshot/restore), as a numbered-migration hook where idempotent or a gated runbook otherwise. Provide a dashboard/CLI signal of which clusters are still on the shared etcd. | ||
| 3. **Phase 3 — deprecate the tenant module.** Mark `Tenant.spec.etcd` and the etcd tenant-module catalog entry deprecated; new tenants no longer offer it. Stop hardcoding `etcd: tenant-root`. | ||
| 4. **Phase 4 — remove plumbing.** Once telemetry shows no cluster on the shared etcd, delete `apps/tenant/templates/etcd.yaml`, the `etcd` tenant value, the `_namespace.etcd` propagation and `cozy-lib.ns-etcd` helper, and the `namespace.cozystack.io/etcd` label. Keep the etcd-operator and the etcd chart. |
There was a problem hiding this comment.
📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win
Clarify the legacy reference key name.
Line 190 says "Stop hardcoding etcd: tenant-root" but the actual value propagated is _namespace.etcd: tenant-root (line 26). Dropping the _namespace. prefix could confuse implementers about which Secret key and helper to remove. Align with the exact key name used elsewhere.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@design-proposals/etcd-per-cluster/README.md` at line 190, The Phase 4 removal
instructions use the wrong key name for the legacy etcd namespace propagation,
which can mislead implementers. Update the wording in the README section
describing removal of the shared-etcd plumbing to refer to the exact propagated
Secret key and helper name, using _namespace.etcd and cozy-lib.ns-etcd
consistently with the existing references in the document.
|
+1 on the core direction — folding etcd into the Writing as an operator of a CozyStack-based managed platform, where we run per-tenant Kubernetes clusters at scale. On the "per-cluster etcd multiplies overhead" concernIn practice, per-cluster etcd is a modest addition on top of what a
Per-cluster etcd adds ~18% to requests CPU and ~28% to CPU/memory limits on Small aside: On the proposed
|
Summary
Design proposal to retire etcd as a tenant module and fold it into the
kubernetesapp, so each tenant Kubernetes cluster provisions and owns its own etcd and KamajiDataStore.End state: one etcd per one Kubernetes cluster.
Why
Kubernetesapp cannot start a cluster until an admin setsspec.etcd: trueon an ancestorTenant— a parameter consumers have no RBAC to change. The cluster sits on theawaiting-etcdbeacon. After this change, creating theKubernetesapp yields a working cluster with no admin pre-step.Kubernetescluster in a tenant subtree shares one cluster-scoped KamajiDataStorebacked by a single etcd, isolated only by per-control-plane credentials/prefix — shared Raft group, disk,quota-backend-bytes, and CA. This is not production-compatible (performance contention, single blast radius). Per-cluster etcd physically isolates each cluster.The overhead (a per-cluster etcd) is accepted and made tunable via a new
etcdvalues block; isolation-conscious users already pay it today by carving a child tenant per cluster.Contents
design-proposals/etcd-per-cluster/README.md— fills the proposal template: Context (exact code wiring), Problem, Goals/Non-goals, Design (with before/after Mermaid diagram), User-facing changes, Upgrade & rollback (opt-in compat window + per-cluster migration), Security, Failure/edge cases, Testing, 4-phase Rollout, Open questions, Alternatives.Status: Draft — requesting feedback.
Summary by CodeRabbit