Skip to content

fix(peer): host-netns reachability and NAT bootstrap reliability#25

Merged
Andrei Kvapil (kvaps) merged 1 commit into
mainfrom
fix/host-netns-and-bootstrap
Jun 20, 2026
Merged

fix(peer): host-netns reachability and NAT bootstrap reliability#25
Andrei Kvapil (kvaps) merged 1 commit into
mainfrom
fix/host-netns-and-bootstrap

Conversation

@kvaps

@kvaps Andrei Kvapil (kvaps) commented Jun 20, 2026

Copy link
Copy Markdown
Member

Three fixes so host-network workloads (the CephFS CSI nodeplugin, hostNetwork=true) in NAT'd tenant clusters can reach the Ceph mons reliably.

A. Advertise each node's own InternalIP as a /32 host route unconditionally. Previously gated on AllowedNetworks; a host-networked pod sends with src = the node's own InternalIP, so that /32 must be in the node's own Peer AllowedIPs or WireGuard crypto-routing drops it (rados -110). It can't go via allowedNetworks (would trip CIDR-overlap detection when a node subnet is a subset of another mesh's podCIDR), so it's added directly to the peer.

B. Mesh-wide PersistentKeepalive floor. The return-side peer is built from the tenant's self entry (PK=0), so the Ceph→tenant direction had no keepalive and the NAT mapping expired (tunnel flaps). Now every peer in a mesh gets max(entry.PK, max PK across the mesh), so declaring keepalive on any cluster keeps both directions alive.

C. Roaming peers for endpoint-less nodes. A freshly-created NAT'd node has no clustermesh-/force-endpoint and no ExternalIP, so it was skipped → no peer → no traffic → discovered-endpoint never populates (deadlock). Such a node is no longer skipped; it gets a peer with a nil endpoint (roaming), the tenant initiates outbound to Ceph's public endpoint, and the far side learns the address from the first handshake. Malformed endpoints still error.

Tests added/updated for all three; build/vet/test/golangci-lint clean.

Summary by CodeRabbit

  • New Features

    • Mesh now enforces a minimum persistent keepalive floor across all peers to ensure stable connectivity.
    • Nodes without fixed WireGuard endpoints can now participate as "roaming peers" in the mesh.
  • Bug Fixes

    • Improved node endpoint validation and resolution handling.
    • Fixed duplicate IP address deduplication in peer configurations.

Three production bugs prevented host-network workloads (the CephFS CSI
nodeplugin, hostNetwork=true) from reaching the Ceph mons and stopped
NAT'd tenant nodes from bootstrapping reliably.

Fix A — always advertise each node's own InternalIP as a /32 host route.
A host-networked pod sends with src = the node's own InternalIP; that /32
must be in the peer's AllowedIPs or WireGuard crypto-routing on the far
side drops the packet (rados ret=-110 mount timeout). The /32 now goes
directly into the node's own Peer, independent of allowedNetworks, so it
cannot trip the cross-mesh CIDR-overlap detector. AllowedIPs are deduped.

Fix B — persistentKeepalive on both sides of a NAT mesh. The ceph-side
peer for a tenant node is built from the tenant's SELF entry (keepalive
0), so the Ceph->tenant direction had no keepalive and the NAT mapping
expired, flapping the tunnel. A mesh-wide keepalive (max across all mesh
entries) is now threaded into BuildPeer as a floor, so every peer in a
NAT mesh keeps its mapping refreshed in both directions.

Fix C — bootstrap NAT nodes with no resolvable endpoint as roaming peers.
A freshly-created NAT'd node has no endpoint until traffic flows, but
skipping it deadlocked bootstrap (no peer -> no traffic -> discovered
endpoint never populates). A missing endpoint now yields a Peer with
Endpoint=nil (WireGuard roaming) instead of skipping the node; a
malformed endpoint annotation still skips. ValidateNode no longer treats
a missing endpoint as a skip reason.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
@coderabbitai

coderabbitai Bot commented Jun 20, 2026

Copy link
Copy Markdown

Review Change Stack

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f19858fe-ca23-4931-999e-be726a09df3c

📥 Commits

Reviewing files that changed from the base of the PR and between 95c698e and 13b013a.

📒 Files selected for processing (6)
  • internal/controller/clustermesh_controller.go
  • internal/controller/keepalive_test.go
  • internal/peer/builder.go
  • internal/peer/builder_test.go
  • internal/validation/node.go
  • internal/validation/node_test.go

📝 Walkthrough

Walkthrough

Adds a mesh-wide PersistentKeepalive floor computed as the maximum across all ClusterEntry values, threads it through BuildPeer via buildDesiredPeers and a new meshPersistentKeepalive helper. Simultaneously changes endpoint-less nodes from errors to valid roaming peers (Endpoint=nil) in both validateEndpoint and resolvePeerEndpoint, and makes NodeInternalIP host routes always advertised (unconditionally) with /32 deduplication.

Changes

Mesh-wide keepalive floor and roaming peer support

Layer / File(s) Summary
Roaming peer: endpoint-less nodes no longer errored
internal/validation/node.go, internal/validation/node_test.go, internal/peer/builder.go
validateEndpoint removes the ReasonNoEndpoint skip path; resolvePeerEndpoint returns (nil, nil) for missing endpoints instead of an error. Tests updated to assert wantSkipped: false for endpoint-less nodes.
BuildPeer core: meshKeepalive, unconditional InternalIP routes, dedup
internal/peer/builder.go
BuildPeer gains meshKeepalive int; PersistentKeepalive is set to max(entry, mesh); AllowedIPs unconditionally includes nodeOwnInternalIPHostRoutes then extraAllowedIPs, deduplicated via a new dedupeStrings helper.
Controller: meshPersistentKeepalive helper and threading
internal/controller/clustermesh_controller.go
New meshPersistentKeepalive iterates all ClusterEntry items and returns the max keepalive; pushPeersToTargets passes it into buildDesiredPeers, which forwards it to peer.BuildPeer; comment on transient-skip requeue updated.
BuildPeer tests: signature updates and new behavioral assertions
internal/peer/builder_test.go
All existing BuildPeer callsites updated with trailing meshKeepalive arg; new tests cover roaming peer, anchor roaming, InternalIP /32 deduplication, mesh keepalive floor, and entry keepalive override.
Controller keepalive tests
internal/controller/keepalive_test.go
New test file validates meshPersistentKeepalive max logic (including all-zero case) and that the Ceph-side peer inherits the mesh keepalive floor when the entry's own keepalive is 0.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • cozystack/kilo-clustermesh-operator#11: Modifies BuildPeer signature and callsites in clustermesh_controller.go around AllowedIPs and anchor extras handling — directly adjacent to this PR's changes.
  • cozystack/kilo-clustermesh-operator#20: Introduced ClusterEntry.PersistentKeepalive and its initial wiring into BuildPeer, which this PR extends with the mesh-wide floor computation.

Suggested reviewers

  • Arsolitt

Poem

🐇 Hop, hop — no endpoint? No fright!
A roaming peer wanders the night.
The keepalive floor lifts every peer,
with max-of-all values crystal clear.
InternalIP /32s dedupe with glee —
the mesh stays alive, thanks to me! 🌐

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/host-netns-and-bootstrap

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@kvaps Andrei Kvapil (kvaps) merged commit 1469e48 into main Jun 20, 2026
7 of 8 checks passed
@kvaps Andrei Kvapil (kvaps) deleted the fix/host-netns-and-bootstrap branch June 20, 2026 12:30

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request improves cluster mesh peering by introducing a mesh-wide PersistentKeepalive floor to prevent tunnel flapping, allowing nodes without resolvable endpoints to be peered as roaming peers to avoid bootstrap deadlocks, and unconditionally advertising node InternalIPs as /32 host routes. The reviewer feedback recommends removing the now-unused ReasonNoEndpoint constant and its IsTransient case to clean up dead code. Additionally, the reviewer warns of potential silent IP conflicts from unconditionally advertising InternalIPs in environments with overlapping node subnets, suggesting either documenting this risk or implementing an overlap check.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines 200 to 208
func validateEndpoint(node *corev1.Node, entry *v1alpha1.ClusterEntry) (bool, NodeSkipReason, string) {
_, found, err := kilonode.ResolveEndpoint(node, entry.WireguardPort)
_, _, err := kilonode.ResolveEndpoint(node, entry.WireguardPort)
if err != nil {
return true, ReasonEndpointInvalid, fmt.Sprintf(
"node %q has an invalid endpoint annotation: %v", node.Name, err,
)
}

if !found {
return true, ReasonNoEndpoint, fmt.Sprintf(
"node %q has no resolvable endpoint (no clustermesh-endpoint, force-endpoint, or ExternalIP)",
node.Name,
)
}

return false, "", ""

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Since ReasonNoEndpoint is no longer returned by validateEndpoint (as roaming peers with no endpoint are now allowed), the constant ReasonNoEndpoint and its corresponding case in IsTransient have become dead code. Consider removing them to keep the codebase clean.

Comment thread internal/peer/builder.go
Comment on lines +103 to +105
allowedIPs = append(allowedIPs, nodeOwnInternalIPHostRoutes(node)...)
allowedIPs = append(allowedIPs, extraAllowedIPs...)
allowedIPs = dedupeStrings(allowedIPs)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Unconditionally advertising the node's InternalIPs as /32 host routes without gating them on AllowedNetworks can lead to silent IP conflicts in WireGuard if different tenant clusters have overlapping node subnets (which is very common in multi-tenant environments). Since these InternalIPs are not part of AllowedNetworks, the mesh-wide overlap detector (validateMeshNetworks) will not catch them, and WireGuard will silently clobber the AllowedIPs of one of the conflicting peers. Consider adding a warning in the documentation or implementing an overlap check that also covers these unconditionally advertised node InternalIPs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant