feat: full stale-Peer cleanup + fold anchor CIDRs into node Peer#11
Conversation
Walk every cluster in the merged Registry (built from all ClusterMesh specs), not just this mesh's spec.Clusters. This closes the second half of the stale-peer problem: peers a mesh pushed into a cluster that has since been removed from its spec stay reachable via any sibling ClusterMesh that still references the cluster, so the sweep can reach them. Adds a cross-CR integration case to test/integration/cleanup_test.go that pins this contract: a peer planted in 'remote' for a mesh whose spec contains only 'local' is removed after reconcile. Signed-off-by: IvanHunters <xorokhotnikov@gmail.com>
Signed-off-by: IvanHunters <xorokhotnikov@gmail.com>
Build returned an error from the first cluster entry whose kubeconfig
Secret could not be read. During tenant teardown this crashed the
operator at startup: the upstream KubernetesSwitchcloud Helm release
deletes the tenant's admin-kubeconfig Secret before the ClusterMesh CR
that references it is finalized, leaving a stale secretRef that
buildInitialRegistry then dereferences. The operator crashlooped, the
finalizer never ran, the ClusterMesh CR stuck with finalizers forever,
and downstream teardown stalled.
Build now logs a warning per problematic entry and skips it, returning
the rest of the registry as usable. cluster.Cluster construction
failures are treated the same way. The reconciler already does
best-effort sweeps via r.Registry.Client(name) with 'if !ok { continue }',
so a partial registry preserves forward progress: finalizers clean
peers in the clusters they can still reach, the ClusterMesh CR
releases, and the operator can come back up clean on the next reconcile.
A nil logger is accepted to keep the signature ergonomic; it routes to
slog.DiscardHandler so test callers and other call sites do not need
to thread a logger through to use Build.
cmd/main.go threads the existing slogger through buildInitialRegistry
so missing-secret warnings show up in the operator log at WARN level.
Three unit tests cover the new behaviour:
- TestBuild_SkipsEntryWithMissingSecret: reproduces the crashloop bug
with a mesh+ceph pair where only ceph's kubeconfig Secret exists.
- TestBuild_LocalEntryDoesNotNeedSecret: guards the Local: true
fast-path, which must keep succeeding regardless of Secret state.
- TestBuild_NilLoggerIsAccepted: protects against nil-deref for
call sites that have not yet wired a logger.
Signed-off-by: IvanHunters <xorokhotnikov@gmail.com>
handleDeletion only runs when the finalizer reconcile gets to fire. In
production we have already seen the finalizer skipped:
- the operator crashlooped (missing kubeconfig Secret blocked
buildInitialRegistry), so the finalizer reconcile never ran, and
the user manually removed the finalizer to unblock teardown.
- force-delete via the API or controller-runtime client.
In all three cases the cleanup half of teardown is bypassed, leaving
Peer objects labeled with a mesh name whose ClusterMesh CR no longer
exists. The per-CR cleanupStaleSourceClusters pass added in the
previous commits cannot touch those: it only looks at peers labeled
with its OWN mesh name. The ghosts persist until a human notices.
Add cleanupOrphanMeshPeers: every reconcile of any ClusterMesh now
lists every other ClusterMesh in the namespace, collects the set of
living mesh names, walks every cluster in the registry, and deletes
any Peer whose kilo-clustermesh.io/mesh label points at a name not in
that set. Failures (list, delete) are logged per-peer and the pass
continues. The sweep is namespace-scoped: it only considers ClusterMesh
objects in the reconciling CR's namespace, so it cannot accidentally
clobber peers managed from another namespace by another operator.
Wired into Reconcile() after the per-CR cleanupStaleSourceClusters
pass so that every live reconcile contributes to the global sweep —
the cluster reconverges as soon as any mesh's reconcile fires.
Two integration cases in test/integration/cleanup_test.go:
- DeletesGhostsAfterCRGone: plants a peer labeled with a ghost-mesh
name; live mesh reconcile removes it.
- LeavesPeersOfLivingMeshAlone: regression guard ensuring the sweep
keeps peers whose mesh label still corresponds to an existing CR.
Signed-off-by: IvanHunters <xorokhotnikov@gmail.com>
…Peer Kilo (and WireGuard generally) identifies peers exclusively by their public key. The receiving cluster's kilo daemon issues `wg setconf` to load Peer CRs into kernel state, and a second peer entry with the same pubkey REPLACES the first — AllowedIPs are not merged. The operator had been emitting two Peer objects per source-cluster node anchor: one "node" Peer with AllowedIPs=[podCIDR, wgIP/32] and one "anchor" Peer with AllowedIPs=[serviceCIDR, additionalCIDRs], both reusing the anchor node's pubkey. On the receiving cluster (e.g. nuvolos-ceph in our live test), kilo applied whichever Peer it processed last; the other entry's AllowedIPs were silently dropped from the WG kernel state. The result was a racy half-outage: sometimes pod-CIDR traffic worked but the serviceCIDR was unrouted; sometimes the inverse. The mesh1 path happened to land in the favourable order; mesh2 didn't, and its handshake completed (we observed it via tcpdump) yet inner packets were dropped on ingress because their source IP was outside the WG peer's AllowedIPs. Stop emitting the separate anchor Peer. Fold the cluster-wide CIDRs into the first valid node's Peer.AllowedIPs via a new extraAllowedIPs parameter on BuildPeer, with CollectAnchorCIDRs exposing the shape that was previously private. The buildDesiredPeers loop picks the first node as the anchor carrier; other nodes get nil extras. Old anchor Peer objects on remote clusters disappear naturally: their NAME (<mesh>--<cluster>--anchor) is no longer in the desired set, so ReconcilePeers' per-pair orphan sweep deletes them on the next reconcile. Knock-on changes: - builder_test.go updates every BuildPeer call site for the new 4-argument signature (existing tests pass nil extras). - TestBuildAnchorPeer_* renamed to TestBuildPeer_AnchorExtras_* and rewritten to exercise the new fold-in semantics. The NoEndpointSource case is now a hard error from BuildPeer instead of a nil from BuildAnchorPeer, which is strictly stricter: misconfiguration surfaces at reconcile time rather than silently losing cluster-wide CIDR routing. Signed-off-by: IvanHunters <xorokhotnikov@gmail.com>
There was a problem hiding this comment.
Code Review
This pull request refactors peer generation by folding cluster-wide CIDRs directly into the first valid node's Peer instead of emitting a separate anchor Peer, which prevents WireGuard public key deduplication from clobbering AllowedIPs. It also enhances the operator's resilience by skipping failed cluster entries during registry builds rather than aborting, and introduces self-healing passes to clean up stale and orphan peers. The review feedback highlights critical improvements, including adding a defensive check for empty node.Spec.PodCIDRs to prevent runtime panics, and wrapping remote cluster operations with timeout contexts to avoid blocking the reconciliation loop.
| allowedIPs := make([]string, 0, 2+len(extraAllowedIPs)) | ||
| allowedIPs = append(allowedIPs, node.Spec.PodCIDRs[0], netutil.HostRoute(hostIP)) | ||
| allowedIPs = append(allowedIPs, extraAllowedIPs...) |
There was a problem hiding this comment.
Accessing node.Spec.PodCIDRs[0] directly without checking if node.Spec.PodCIDRs is empty can cause a runtime panic if a node is registered but does not have its PodCIDRs allocated yet. We should add a defensive check to ensure node.Spec.PodCIDRs has at least one element.
if len(node.Spec.PodCIDRs) == 0 {
return nil, errors.Newf("node %q has no PodCIDRs allocated", node.Name)
}
allowedIPs := make([]string, 0, 2+len(extraAllowedIPs))
allowedIPs = append(allowedIPs, node.Spec.PodCIDRs[0], netutil.HostRoute(hostIP))
allowedIPs = append(allowedIPs, extraAllowedIPs...)| err := peer.DeleteStaleSourceClusters(ctx, tgtClient, mesh.Name, sources) | ||
| if err != nil { | ||
| log.Warn("cleaning stale source-cluster peers", | ||
| slog.String("target", tgtEntry.Name), | ||
| slog.String("target", name), | ||
| slog.String("error", err.Error()), | ||
| ) | ||
| } |
There was a problem hiding this comment.
Reconciling stale source-cluster peers synchronously across all clusters in the registry can block the entire reconciliation loop if any remote cluster is offline or slow. To prevent a single unhealthy cluster from degrading the operator, we should use a context with a short timeout for the cleanup call.
cleanupCtx, cancel := context.WithTimeout(ctx, 5*time.Second)
err := peer.DeleteStaleSourceClusters(cleanupCtx, tgtClient, mesh.Name, sources)
cancel()
if err != nil {
log.Warn("cleaning stale source-cluster peers",
slog.String("target", name),
slog.String("error", err.Error()),
)
}| for _, clusterName := range r.Registry.Clusters() { | ||
| tgtClient, ok := r.Registry.Client(clusterName) | ||
| if !ok { | ||
| continue | ||
| } | ||
|
|
||
| var peers kilov1alpha1.PeerList | ||
|
|
||
| err := tgtClient.List(ctx, &peers) | ||
| if err != nil { | ||
| log.Warn("listing peers for orphan sweep", | ||
| slog.String("target", clusterName), | ||
| slog.String("error", err.Error()), | ||
| ) | ||
|
|
||
| continue | ||
| } | ||
|
|
||
| for i := range peers.Items { | ||
| peerObj := &peers.Items[i] | ||
|
|
||
| meshLabel := peerObj.Labels[peer.LabelMesh] | ||
| if meshLabel == "" { | ||
| continue | ||
| } | ||
|
|
||
| if _, alive := living[meshLabel]; alive { | ||
| continue | ||
| } | ||
|
|
||
| deleteErr := tgtClient.Delete(ctx, peerObj) | ||
| if deleteErr != nil && !apierrors.IsNotFound(deleteErr) { | ||
| log.Warn("deleting orphan peer", | ||
| slog.String("target", clusterName), | ||
| slog.String("peer", peerObj.Name), | ||
| slog.String("error", deleteErr.Error()), | ||
| ) | ||
|
|
||
| continue | ||
| } | ||
|
|
||
| log.Info("deleted orphan peer whose ClusterMesh CR no longer exists", | ||
| slog.String("target", clusterName), | ||
| slog.String("peer", peerObj.Name), | ||
| slog.String("orphan-mesh", meshLabel), | ||
| ) | ||
| } | ||
| } |
There was a problem hiding this comment.
Listing and deleting orphan peers synchronously across all clusters in the registry can block the reconciliation loop if any remote cluster is offline or slow. We should wrap the operations for each cluster with a short timeout context. Make sure to call cancel() at the end of each iteration to avoid context leaks.
for _, clusterName := range r.Registry.Clusters() {
tgtClient, ok := r.Registry.Client(clusterName)
if !ok {
continue
}
clusterCtx, cancel := context.WithTimeout(ctx, 5*time.Second)
var peers kilov1alpha1.PeerList
err := tgtClient.List(clusterCtx, &peers)
if err != nil {
log.Warn("listing peers for orphan sweep",
slog.String("target", clusterName),
slog.String("error", err.Error()),
)
cancel()
continue
}
for i := range peers.Items {
peerObj := &peers.Items[i]
meshLabel := peerObj.Labels[peer.LabelMesh]
if meshLabel == "" {
continue
}
if _, alive := living[meshLabel]; alive {
continue
}
deleteErr := tgtClient.Delete(clusterCtx, peerObj)
if deleteErr != nil && !apierrors.IsNotFound(deleteErr) {
log.Warn("deleting orphan peer",
slog.String("target", clusterName),
slog.String("peer", peerObj.Name),
slog.String("error", deleteErr.Error()),
)
continue
}
log.Info("deleted orphan peer whose ClusterMesh CR no longer exists",
slog.String("target", clusterName),
slog.String("peer", peerObj.Name),
slog.String("orphan-mesh", meshLabel),
)
}
cancel()
}|
Warning Review limit reached
More reviews will be available in 13 minutes and 50 seconds. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughThis PR threads a slog.Logger into multicluster.Build, makes Build skip and warn on per-entry failures, changes peer construction to fold cluster-wide CIDRs into per-node extras, and adds controller sweeps to delete stale and orphan peers across registry clusters. Tests (unit and integration) cover these behaviors. ChangesClusterMesh peer lifecycle and registry robustness
Sequence DiagramsequenceDiagram
participant Reconcile
participant PerCluster
participant StaleCleanup as cleanupStaleSourceClusters
participant OrphanCleanup as cleanupOrphanMeshPeers
participant Registry
participant APIServer as Peer objects
Reconcile->>PerCluster: reconcile each mesh.Spec.Clusters
PerCluster->>APIServer: create/update peers
Reconcile->>StaleCleanup: after per-cluster done
StaleCleanup->>Registry: list all known clusters
StaleCleanup->>APIServer: delete peers in removed clusters or with invalid source labels
Reconcile->>OrphanCleanup: after stale cleanup
OrphanCleanup->>APIServer: list all Peer objects with kilo-clustermesh.io/mesh label
OrphanCleanup->>APIServer: for each peer, check if mesh label references live ClusterMesh CR
OrphanCleanup->>APIServer: delete peers labeled to non-existent ClusterMesh CRs
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
CI feedback on the first push: 1. funlen: cleanupOrphanMeshPeers was 65 lines (limit 60). Split into three: cleanupOrphanMeshPeers (orchestrator), collectLivingMeshes (CR-list helper), sweepOrphanPeersInCluster (per-cluster scan). Behaviour identical; just shorter functions. 2. TestCleanupStaleSourceClusters_IgnoresPeersFromOtherMeshes had been planting a peer labelled with a non-existent mesh and asserting the per-source sweep leaves it alone. That was true before this PR added the orphan-mesh sweep — which legitimately deletes peers whose mesh CR doesn't exist. The test now creates a sibling ClusterMesh in the same namespace so the foreign peer's mesh label references a LIVING CR. The property under test (per-source sweep doesn't touch peers owned by another mesh) is unchanged; orphan handling is covered by the dedicated TestCleanupOrphanMeshPeers_* cases. 3. TestCleanupOrphanMeshPeers_LeavesPeersOfLivingMeshAlone planted a peer in localClient without a t.Cleanup, so it survived the reconciling mesh's deletion and polluted subsequent integration tests whose reconciles legitimately swept it as an orphan. Added the missing Delete in t.Cleanup. Signed-off-by: IvanHunters <xorokhotnikov@gmail.com>
Signed-off-by: IvanHunters <xorokhotnikov@gmail.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@internal/controller/clustermesh_controller.go`:
- Around line 208-246: collectLivingMeshes is listing ClusterMesh only in the
reconciler's namespace which causes peers owned by same-name meshes in other
namespaces to be treated as orphans; change the ownership check to be
namespace-qualified or use a cluster-wide live-mesh lookup. Concretely: modify
collectLivingMeshes to include namespace in the returned identity (e.g.,
map["<namespace>/<meshName>"]struct{} or accept a cluster-wide list by removing
client.InNamespace), update all callers (cleanupOrphanMeshPeers and
sweepOrphanPeersInCluster) to build and check owner keys using
mesh.Namespace+mesh.Name, and ensure the peer.Labels creation/inspection
(peer.Labels or wherever labels are generated) uses the namespace-qualified mesh
identity so deletes only remove peers whose namespace+name owner is truly absent
cluster-wide.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 4b5af09e-1a1a-4edd-9704-c689981be8ca
📒 Files selected for processing (7)
cmd/main.gointernal/controller/clustermesh_controller.gointernal/multicluster/registry.gointernal/multicluster/registry_test.gointernal/peer/builder.gointernal/peer/builder_test.gotest/integration/cleanup_test.go
…r-wide CR list) Three concrete findings from gemini-code-assist and one critical from coderabbitai on PR cozystack#11. 1. peer.BuildPeer: guard against empty Spec.PodCIDRs Accessing PodCIDRs[0] would panic on a node that joined before the kube-controller-manager had allocated a pod CIDR. Surface this as a clean BuildPeer error so the reconciler's per-node validation path skips the node and continues, instead of crash-looping the operator. 2. cleanupStaleSourceClusters: per-target timeout Synchronous List/Delete calls across the merged registry could stall the whole reconcile loop if one remote cluster's apiserver is slow or unreachable. Wrap each target's sweep in a context.WithTimeout bounded by cleanupSweepTimeout (5s), so a single bad cluster degrades only its own per-tick progress. 3. cleanupOrphanMeshPeers: per-target timeout (same rationale). 4. cleanupOrphanMeshPeers: list ClusterMesh CLUSTER-WIDE Critical bug spotted by coderabbitai. collectLivingMeshes previously restricted the lookup to the reconciler's own namespace, but the operator's registry is built cluster-wide from every ClusterMesh it sees. A Peer labelled mesh=foo on a target cluster could legitimately belong to a foo CR in another namespace, and reconciling a CR in namespace A would treat that peer as orphaned and delete it. Drop the InNamespace filter and consider every ClusterMesh in the cluster as a living mesh. Namespace-qualified Peer ownership (a dedicated label) is a cleaner long-term fix and worth a follow-up, but the cluster-wide list is enough to stop the data-loss path described in the review. Signed-off-by: IvanHunters <xorokhotnikov@gmail.com>
There was a problem hiding this comment.
🧹 Nitpick comments (2)
internal/controller/clustermesh_controller.go (2)
550-570: 💤 Low valueConsider sorting nodes for deterministic anchor assignment.
The first node in the slice receives anchor CIDRs, but node ordering depends on API server list response which can vary. If the order changes between reconciles, different nodes become the "anchor", causing peer updates and brief routing churn.
Sorting nodes by name before iteration would make anchor assignment deterministic:
🔧 Proposed fix to sort nodes
+import "sort" func buildDesiredPeers(meshName string, entry *v1alpha1.ClusterEntry, nodes []*corev1.Node) ([]*kilov1alpha1.Peer, error) { peers := make([]*kilov1alpha1.Peer, 0, len(nodes)) + // Sort nodes by name for deterministic anchor assignment. The first + // node carries the cluster-wide CIDRs, so a stable order avoids + // peer churn when the API server returns nodes in a different order. + sort.Slice(nodes, func(i, j int) bool { + return nodes[i].Name < nodes[j].Name + }) + anchorExtras := peer.CollectAnchorCIDRs(entry)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/controller/clustermesh_controller.go` around lines 550 - 570, buildDesiredPeers currently assigns anchor CIDRs to the first element of the nodes slice, but node list order from the API server is non-deterministic; to avoid flapping anchors sort the nodes slice by node.Name before the loop. Update buildDesiredPeers to perform a stable sort (e.g., sort.Slice) on the incoming nodes slice by each corev1.Node.Name, then keep the rest of the logic (use peer.CollectAnchorCIDRs(entry) and call peer.BuildPeer(meshName, entry, node, extras)) unchanged so anchorExtras are consistently applied to the same node across reconciles.
185-198: 💤 Low valueLint: missing blank line before
cancel().The wsl linter flags a missing blank line. Given the tight coupling between the
sweepCtxcreation and its immediate use/cleanup, consider either adding a blank line or using a helper to satisfy the linter without cluttering the loop.🧹 Proposed fix to add blank line
sweepCtx, cancel := context.WithTimeout(ctx, cleanupSweepTimeout) err := peer.DeleteStaleSourceClusters(sweepCtx, tgtClient, mesh.Name, sources) + cancel()🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/controller/clustermesh_controller.go` around lines 185 - 198, The linter flags a missing blank line before the cancel() call; update the loop around the WithTimeout usage (symbols: sweepCtx, cancel, and peer.DeleteStaleSourceClusters) to either insert a blank line between the "err := peer.DeleteStaleSourceClusters(...)" statement and the "cancel()" call, or refactor into a small helper that uses defer cancel() immediately after context.WithTimeout to satisfy the linter without changing behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@internal/controller/clustermesh_controller.go`:
- Around line 550-570: buildDesiredPeers currently assigns anchor CIDRs to the
first element of the nodes slice, but node list order from the API server is
non-deterministic; to avoid flapping anchors sort the nodes slice by node.Name
before the loop. Update buildDesiredPeers to perform a stable sort (e.g.,
sort.Slice) on the incoming nodes slice by each corev1.Node.Name, then keep the
rest of the logic (use peer.CollectAnchorCIDRs(entry) and call
peer.BuildPeer(meshName, entry, node, extras)) unchanged so anchorExtras are
consistently applied to the same node across reconciles.
- Around line 185-198: The linter flags a missing blank line before the cancel()
call; update the loop around the WithTimeout usage (symbols: sweepCtx, cancel,
and peer.DeleteStaleSourceClusters) to either insert a blank line between the
"err := peer.DeleteStaleSourceClusters(...)" statement and the "cancel()" call,
or refactor into a small helper that uses defer cancel() immediately after
context.WithTimeout to satisfy the linter without changing behavior.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 52629c81-5449-405e-b0c4-18256975e427
📒 Files selected for processing (2)
internal/controller/clustermesh_controller.gointernal/peer/builder.go
Signed-off-by: IvanHunters <xorokhotnikov@gmail.com>
Signed-off-by: IvanHunters <xorokhotnikov@gmail.com>
Summary
Five related fixes that together let `kilo-clustermesh-operator` handle a real-world hub-and-spoke topology end-to-end, validated live on dev3 with three switchcloud tenants peering with a shared external Ceph hub.
1. multicluster.Build: tolerate missing kubeconfig secrets
`Build` used to bail on the first cluster entry whose kubeconfig Secret could not be read. That crashed the operator at startup during tenant teardown — the upstream Helm release deletes the admin-kubeconfig Secret before the `ClusterMesh` finalizer reconcile runs, so the leftover `KubeconfigSecretRef` made `buildInitialRegistry` fail. The operator then crashloop'd, the finalizer never ran, and `ClusterMesh` objects wedged with `finalizers:[kilo-clustermesh.io/cleanup]` forever.
`Build` now logs a warning per problematic entry and skips it, returning the rest of the registry as usable. The reconciler already does best-effort sweeps via `r.Registry.Client(name)` with `if !ok { continue }`, so a partial registry preserves forward progress: finalizers clean peers in clusters they can still reach, the `ClusterMesh` releases, and the operator can come back up clean.
2. cleanupStaleSourceClusters: sweep peers when a cluster leaves spec
`ReconcilePeers` only sweeps orphans inside one `(mesh, source-cluster)` pair. Once a cluster entry is removed from `spec.Clusters`, no per-pair sweep ever runs for it again and its `Peer` objects in surviving clusters persist — with stale endpoints, conflicting AllowedIPs, and unreachable public keys that Kilo still picks up.
New `DeleteStaleSourceClusters` lists every `Peer` labelled with this mesh and deletes those whose `source-cluster` label is no longer in spec. Wired into `Reconcile` after `reconcileAllClusters`. Cross-CR coverage: the sweep also walks clusters present only in OTHER `ClusterMesh` resources' specs (via the merged registry), and clusters fully removed from THIS mesh's spec get every peer wiped (not just stale-source ones).
3. cleanupOrphanMeshPeers: sweep peers of deleted ClusterMesh CRs
`handleDeletion` is the primary cleanup path, but the finalizer can be skipped — force-delete, manual finalizer removal (to unblock a crashloop'd operator), legacy CR predating the finalizer. Without this sweep, the cluster accumulates ghost peers that no per-CR reconcile would ever notice — none of the per-CR cleanup paths look at peers labelled for CRs other than their own.
Every live reconcile of any `ClusterMesh` now also collects the set of living mesh names in the namespace and deletes any `Peer` whose `mesh` label points outside that set. Failures log per peer and continue.
4. peer.BuildPeer: fold anchor CIDRs into the first node Peer
WireGuard identifies peers exclusively by their public key and keeps only one peer entry per pubkey. The operator had been emitting two `Peer` objects per source-cluster anchor — one node Peer with `allowedIPs=[podCIDR, wgIP/32]` and one separate "anchor" Peer with `allowedIPs=[serviceCIDR, additionalCIDRs]` — both reusing the anchor node's pubkey. On the receiving cluster, `kilo` applied whichever `Peer` it processed last; the other entry's AllowedIPs were silently dropped from kernel WG state. The result was a racy half-outage: sometimes the receiving cluster could route the pod-CIDR but lost the service-CIDR, sometimes the inverse. We observed this live — mesh1 happened to land in the favourable order; mesh2's handshake completed (visible in tcpdump) but inner packets were dropped at ingress because their source IP was outside the surviving WG peer's AllowedIPs.
Stop emitting the separate anchor `Peer`. `BuildPeer` gains an `extraAllowedIPs []string` parameter; `buildDesiredPeers` passes `CollectAnchorCIDRs(entry)` to the first valid node and `nil` to the rest. One WG peer entry per pubkey with the full union of AllowedIPs.
5. Tests
Unit:
Integration (`test/integration/cleanup_test.go`):
Live validation (dev3)
Test plan
Summary by CodeRabbit
New Features
Bug Fixes
Tests