fix(peer): host-netns reachability and NAT bootstrap reliability #25
Three production bugs prevented host-network workloads (the CephFS CSI nodeplugin, hostNetwork=true) from reaching the Ceph mons and stopped NAT'd tenant nodes from bootstrapping reliably.

Fix A: always advertise each node's own InternalIP as a /32 host route. A host-networked pod sends with src = the node's own InternalIP; that /32 must be in the peer's AllowedIPs or WireGuard crypto-routing on the far side drops the packet (rados ret=-110 mount timeout). The /32 now goes directly into the node's own Peer, independent of allowedNetworks, so it cannot trip the cross-mesh CIDR-overlap detector. AllowedIPs are deduped.

Fix B: persistentKeepalive on both sides of a NAT mesh. The ceph-side peer for a tenant node is built from the tenant's SELF entry (keepalive 0), so the Ceph->tenant direction had no keepalive and the NAT mapping expired, flapping the tunnel. A mesh-wide keepalive (max across all mesh entries) is now threaded into BuildPeer as a floor, so every peer in a NAT mesh keeps its mapping refreshed in both directions.

Fix C: bootstrap NAT nodes with no resolvable endpoint as roaming peers. A freshly-created NAT'd node has no endpoint until traffic flows, but skipping it deadlocked bootstrap (no peer -> no traffic -> discovered endpoint never populates). A missing endpoint now yields a Peer with Endpoint=nil (WireGuard roaming) instead of skipping the node; a malformed endpoint annotation still skips. ValidateNode no longer treats a missing endpoint as a skip reason.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
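Fix A in the commit message above boils down to "turn each InternalIP into a single-host prefix, append it to AllowedIPs, then dedupe". A minimal sketch of that idea follows. The helper names `hostRoutes` and `dedupe` are hypothetical stand-ins (the PR's own helpers are `nodeOwnInternalIPHostRoutes` and `dedupeStrings`, whose bodies are not shown here), and the addresses are made-up examples.

```go
package main

import (
	"fmt"
	"net/netip"
)

// hostRoutes turns each parseable IP into a single-host prefix
// (/32 for IPv4, /128 for IPv6). Unparseable input is ignored.
// Hypothetical helper, not the PR's implementation.
func hostRoutes(internalIPs []string) []string {
	var out []string
	for _, s := range internalIPs {
		addr, err := netip.ParseAddr(s)
		if err != nil {
			continue
		}
		out = append(out, netip.PrefixFrom(addr, addr.BitLen()).String())
	}
	return out
}

// dedupe removes repeated entries while preserving first-seen order.
func dedupe(in []string) []string {
	seen := make(map[string]struct{}, len(in))
	out := make([]string, 0, len(in))
	for _, s := range in {
		if _, ok := seen[s]; ok {
			continue
		}
		seen[s] = struct{}{}
		out = append(out, s)
	}
	return out
}

func main() {
	// Example values only: a pod CIDR plus a host route that is already present.
	allowed := []string{"10.244.1.0/24", "192.168.10.5/32"}
	allowed = append(allowed, hostRoutes([]string{"192.168.10.5"})...)
	fmt.Println(dedupe(allowed))
}
```

Deduplication matters here because the same /32 can arrive both from the unconditional host route and from an explicitly configured entry.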
Code Review
This pull request improves cluster mesh peering by introducing a mesh-wide PersistentKeepalive floor to prevent tunnel flapping, allowing nodes without resolvable endpoints to be peered as roaming peers to avoid bootstrap deadlocks, and unconditionally advertising node InternalIPs as /32 host routes. The reviewer feedback recommends removing the now-unused ReasonNoEndpoint constant and its IsTransient case to clean up dead code. Additionally, the reviewer warns of potential silent IP conflicts from unconditionally advertising InternalIPs in environments with overlapping node subnets, suggesting either documenting this risk or implementing an overlap check.
 func validateEndpoint(node *corev1.Node, entry *v1alpha1.ClusterEntry) (bool, NodeSkipReason, string) {
-    _, found, err := kilonode.ResolveEndpoint(node, entry.WireguardPort)
+    _, _, err := kilonode.ResolveEndpoint(node, entry.WireguardPort)
     if err != nil {
         return true, ReasonEndpointInvalid, fmt.Sprintf(
             "node %q has an invalid endpoint annotation: %v", node.Name, err,
         )
     }

-    if !found {
-        return true, ReasonNoEndpoint, fmt.Sprintf(
-            "node %q has no resolvable endpoint (no clustermesh-endpoint, force-endpoint, or ExternalIP)",
-            node.Name,
-        )
-    }
-
     return false, "", ""
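The hunk above removes the "not found" skip, which leaves three outcomes for endpoint resolution: malformed (skip), missing (roaming peer), and present (fixed endpoint). The sketch below illustrates that three-way split under stated assumptions: `peer`, `resolve`, and `buildPeer` are hypothetical stand-ins, it does not call `kilonode.ResolveEndpoint`, and the annotation values are invented.

```go
package main

import (
	"fmt"
	"net/netip"
)

// peer is a hypothetical stand-in for the real Peer type.
type peer struct {
	Name     string
	Endpoint *netip.AddrPort // nil means a roaming peer
}

// resolve is a hypothetical stand-in for endpoint resolution: an empty
// annotation means "not found"; anything else must parse as ip:port.
func resolve(annotation string) (netip.AddrPort, bool, error) {
	if annotation == "" {
		return netip.AddrPort{}, false, nil
	}
	ap, err := netip.ParseAddrPort(annotation)
	if err != nil {
		return netip.AddrPort{}, false, err
	}
	return ap, true, nil
}

// buildPeer returns ok=false only for a malformed endpoint.
// A missing endpoint still yields a peer, with Endpoint left nil.
func buildPeer(name, annotation string) (peer, bool) {
	ap, found, err := resolve(annotation)
	if err != nil {
		return peer{}, false
	}
	p := peer{Name: name}
	if found {
		p.Endpoint = &ap
	}
	return p, true
}

func main() {
	for _, ann := range []string{"203.0.113.7:51820", "", "not-an-endpoint"} {
		p, ok := buildPeer("node-a", ann)
		fmt.Printf("annotation=%q built=%v roaming=%v\n", ann, ok, ok && p.Endpoint == nil)
	}
}
```

Keeping the error path separate from the not-found path is what lets a typo in an annotation still surface as a skip while a fresh NAT'd node gets a peer.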
allowedIPs = append(allowedIPs, nodeOwnInternalIPHostRoutes(node)...)
allowedIPs = append(allowedIPs, extraAllowedIPs...)
allowedIPs = dedupeStrings(allowedIPs)
Unconditionally advertising the node's InternalIPs as /32 host routes without gating them on AllowedNetworks can lead to silent IP conflicts in WireGuard if different tenant clusters have overlapping node subnets (which is very common in multi-tenant environments). Since these InternalIPs are not part of AllowedNetworks, the mesh-wide overlap detector (validateMeshNetworks) will not catch them, and WireGuard will silently clobber the AllowedIPs of one of the conflicting peers. Consider adding a warning in the documentation or implementing an overlap check that also covers these unconditionally advertised node InternalIPs.
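One possible shape for the overlap check suggested in this comment is a pairwise comparison of every peer's final AllowedIPs, host routes included. The following is only a sketch under assumptions: `conflict` and `findOverlaps` are hypothetical names, the peer names and addresses are invented, and nothing here reflects how `validateMeshNetworks` is actually implemented.

```go
package main

import (
	"fmt"
	"net/netip"
	"sort"
)

// conflict records two peers whose AllowedIPs overlap.
type conflict struct {
	PeerA, PeerB string
	A, B         netip.Prefix
}

// findOverlaps compares every pair of peers and reports overlapping prefixes.
// peers maps a hypothetical peer name to its AllowedIPs in CIDR form.
func findOverlaps(peers map[string][]string) ([]conflict, error) {
	names := make([]string, 0, len(peers))
	parsed := make(map[string][]netip.Prefix, len(peers))
	for name, cidrs := range peers {
		names = append(names, name)
		for _, c := range cidrs {
			p, err := netip.ParsePrefix(c)
			if err != nil {
				return nil, fmt.Errorf("peer %q: %w", name, err)
			}
			parsed[name] = append(parsed[name], p.Masked())
		}
	}
	sort.Strings(names) // deterministic report order

	var out []conflict
	for i := 0; i < len(names); i++ {
		for j := i + 1; j < len(names); j++ {
			for _, a := range parsed[names[i]] {
				for _, b := range parsed[names[j]] {
					if a.Overlaps(b) {
						out = append(out, conflict{names[i], names[j], a, b})
					}
				}
			}
		}
	}
	return out, nil
}

func main() {
	// Two tenants whose nodes happen to share the same InternalIP.
	peers := map[string][]string{
		"tenant-a/node-1": {"10.244.1.0/24", "192.168.0.10/32"},
		"tenant-b/node-1": {"10.245.1.0/24", "192.168.0.10/32"},
	}
	conflicts, err := findOverlaps(peers)
	if err != nil {
		panic(err)
	}
	for _, c := range conflicts {
		fmt.Printf("%s (%s) overlaps %s (%s)\n", c.PeerA, c.A, c.PeerB, c.B)
	}
}
```

The pairwise loop is quadratic in the number of prefixes, which is likely acceptable for a warning path but would need a prefix tree for very large meshes.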
Three fixes so host-network workloads (the CephFS CSI nodeplugin, hostNetwork=true) in NAT'd tenant clusters can reach the Ceph mons reliably.
A. Advertise each node's own InternalIP as a /32 host route unconditionally. Previously gated on AllowedNetworks; a host-networked pod sends with src = the node's own InternalIP, so that /32 must be in the node's own Peer AllowedIPs or WireGuard crypto-routing drops it (rados -110). It can't go via allowedNetworks (would trip CIDR-overlap detection when a node subnet is a subset of another mesh's podCIDR), so it's added directly to the peer.
B. Mesh-wide PersistentKeepalive floor. The return-side peer is built from the tenant's self entry (PK=0), so the Ceph→tenant direction had no keepalive and the NAT mapping expired (tunnel flaps). Now every peer in a mesh gets max(entry.PK, max PK across the mesh), so declaring keepalive on any cluster keeps both directions alive.
C. Roaming peers for endpoint-less nodes. A freshly-created NAT'd node has no clustermesh-/force-endpoint and no ExternalIP, so it was skipped → no peer → no traffic → discovered-endpoint never populates (deadlock). Such a node is no longer skipped; it gets a peer with a nil endpoint (roaming), the tenant initiates outbound to Ceph's public endpoint, and the far side learns the address from the first handshake. Malformed endpoints still error.
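The keepalive floor described in B is the rule max(entry.PK, max PK across the mesh). A minimal sketch of that arithmetic is shown here; the `entry` type and both function names are hypothetical, and the 25-second value is just an example, not a setting taken from the PR.

```go
package main

import "fmt"

// entry is a hypothetical stand-in for a mesh cluster entry.
type entry struct {
	Name                string
	PersistentKeepalive int // seconds; 0 means disabled
}

// meshKeepaliveFloor returns the largest keepalive declared by any entry.
func meshKeepaliveFloor(entries []entry) int {
	floor := 0
	for _, e := range entries {
		if e.PersistentKeepalive > floor {
			floor = e.PersistentKeepalive
		}
	}
	return floor
}

// effectiveKeepalive applies the mesh-wide floor to one entry's own value.
func effectiveKeepalive(e entry, floor int) int {
	if e.PersistentKeepalive > floor {
		return e.PersistentKeepalive
	}
	return floor
}

func main() {
	// Only the Ceph side declares a keepalive; the tenant's self entry has 0.
	mesh := []entry{{"ceph", 25}, {"tenant", 0}}
	floor := meshKeepaliveFloor(mesh)
	for _, e := range mesh {
		fmt.Printf("%s: %ds\n", e.Name, effectiveKeepalive(e, floor))
	}
}
```

Because the floor is a maximum, a mesh in which no entry declares a keepalive still ends up with 0, so non-NAT meshes are unaffected.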
Tests added/updated for all three; build/vet/test/golangci-lint clean.