
feat(clustermesh): add persistentKeepalive support for NAT traversal #151

Closed

IvanHunters wants to merge 2 commits into main from feat/clustermesh-persistent-keepalive


@IvanHunters (Contributor)

Summary

Adds an optional persistentKeepalive field to the remoteClusters and localCluster entries in the ClusterMesh template.

When set to a non-zero value (e.g. 25), the value is forwarded to every Kilo WireGuard Peer resource the operator creates for nodes in that cluster. This keeps the stateful NAT mapping alive and enables bidirectional WireGuard connectivity for clusters behind NAT (e.g. OpenStack tenants without floating IPs).
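A minimal Python sketch of the forwarding described above, assuming the Kilo Peer CRD (`kilo.squat.ai/v1alpha1`, kind `Peer`) with `spec.allowedIPs`, `spec.publicKey` and `spec.persistentKeepalive`. The helper names, the `<cluster>-<node>` naming scheme and the node dict shape are hypothetical and are not taken from kilo-clustermesh-operator.

```python
from typing import Any


def build_peer(cluster_name: str, node_name: str, public_key: str,
               allowed_ips: list[str], persistent_keepalive: int = 0) -> dict[str, Any]:
    """Return a Kilo Peer manifest for one remote node (hypothetical helper)."""
    spec: dict[str, Any] = {
        "allowedIPs": list(allowed_ips),
        "publicKey": public_key,
    }
    # In WireGuard a keepalive of 0 means "disabled", so the field is only
    # emitted when a positive interval was configured.
    if persistent_keepalive > 0:
        spec["persistentKeepalive"] = persistent_keepalive
    return {
        "apiVersion": "kilo.squat.ai/v1alpha1",
        "kind": "Peer",
        # ASSUMPTION: <cluster>-<node> naming is illustrative only.
        "metadata": {"name": f"{cluster_name}-{node_name}"},
        "spec": spec,
    }


def peers_for_cluster(entry: dict[str, Any],
                      nodes: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Forward the cluster-level persistentKeepalive to every node's Peer."""
    keepalive = entry.get("persistentKeepalive", 0)
    return [
        build_peer(entry["name"], n["name"], n["publicKey"], n["allowedIPs"], keepalive)
        for n in nodes
    ]
```

The key design point is that the keepalive is a per-cluster setting fanned out to every per-node Peer, so all nodes behind the same NAT get the same interval.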

Usage

addons:
  kilo:
    clusterMesh:
      remoteClusters:
        - name: nuvolos-workload
          podCIDRs: [...]
          persistentKeepalive: 25   # seconds; 0 = disabled (default)
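The cozyrds.yaml schema in this PR gives each remoteClusters entry defaults (kubeconfigSecretKey: super-admin.conf, additionalCIDRs: [], wireguardPort: 51820, persistentKeepalive: 0) and a list of required keys. A hypothetical Python sketch of how those defaults resolve for values like the usage above, so an entry without persistentKeepalive ends up with keepalive disabled; the function name is illustrative, not part of the chart.

```python
import copy
from typing import Any

# Defaults and required keys copied from the remoteClusters item schema.
REMOTE_CLUSTER_DEFAULTS: dict[str, Any] = {
    "kubeconfigSecretKey": "super-admin.conf",
    "additionalCIDRs": [],
    "wireguardPort": 51820,
    "persistentKeepalive": 0,
}
REQUIRED = ("name", "kubeconfigSecretName", "podCIDRs", "serviceCIDR", "wireguardCIDR")


def resolve_remote_clusters(values: dict[str, Any]) -> list[dict[str, Any]]:
    """Apply schema defaults to addons.kilo.clusterMesh.remoteClusters."""
    mesh = values.get("addons", {}).get("kilo", {}).get("clusterMesh", {})
    resolved = []
    for entry in mesh.get("remoteClusters", []):
        missing = [k for k in REQUIRED if k not in entry]
        if missing:
            raise ValueError(f"{entry.get('name', '<unnamed>')}: missing {missing}")
        # deepcopy so the shared [] default is never aliased between entries.
        merged = copy.deepcopy(REMOTE_CLUSTER_DEFAULTS)
        merged.update(entry)
        resolved.append(merged)
    return resolved
```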

Dependencies

Requires kilo-clustermesh-operator >= 0.6.0 (cozystack/kilo-clustermesh-operator#20) which:

  • Adds persistentKeepalive to ClusterEntry and propagates it to Peer resources
  • Adds discovered-endpoint enrichment so the operator auto-detects the real NAT egress IP from kilo.squat.ai/discovered-endpoints annotations
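A rough Python sketch of discovered-endpoint lookup. It assumes, without confirmation from the operator source, that the kilo.squat.ai/discovered-endpoints annotation holds a JSON object keyed by WireGuard public key, with values shaped like Go's marshalled net.UDPAddr ({"IP": ..., "Port": ...}). The function name and calling convention are hypothetical.

```python
import json
from typing import Optional

ANNOTATION = "kilo.squat.ai/discovered-endpoints"


def discovered_endpoint(node_annotations: list[dict[str, str]],
                        public_key: str) -> Optional[str]:
    """Return "ip:port" for public_key as seen by any local node, else None.

    ASSUMPTION: annotation value = JSON object {pubkey: {"IP": ..., "Port": ...}}.
    """
    for annotations in node_annotations:
        raw = annotations.get(ANNOTATION)
        if not raw:
            continue
        try:
            endpoints = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed annotations instead of failing the reconcile
        ep = endpoints.get(public_key) if isinstance(endpoints, dict) else None
        if ep and ep.get("IP") and ep.get("Port"):
            ip, port = ep["IP"], ep["Port"]
            # IPv6 literals need brackets in host:port form.
            return f"[{ip}]:{port}" if ":" in ip else f"{ip}:{port}"
    return None
```

The idea is that the address WireGuard actually received packets from is the NAT's public egress address, which is what the remote side must dial; a None result would mean falling back to whatever endpoint is configured.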

…ote clusters

Add optional persistentKeepalive field to remoteClusters and localCluster
entries in the ClusterMesh template. When set to a non-zero value (e.g. 25),
the operator forwards it to every WireGuard Peer CRD it creates for nodes in
that cluster, keeping the NAT mapping alive between reconciles.

Required for clusters behind NAT (e.g. OpenStack tenants without floating IPs):
the kilo-clustermesh-operator now uses kilo.squat.ai/discovered-endpoints to
auto-detect the real egress IP, but the bidirectional session must be kept
alive via PersistentKeepalive or the NAT state expires and connectivity drops.

Depends on kilo-clustermesh-operator >= 0.6.0 (PR #20) which adds the
persistentKeepalive field to ClusterEntry and the discovered-endpoint
enrichment.

Signed-off-by: IvanHunters <xorokhotnikov@gmail.com>
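Why the bidirectional session drops without keepalive can be illustrated with a toy model of a stateful NAT: an idle UDP mapping expires after some timeout, and only outbound packets from the inside peer refresh it. The 30-second timeout used below is an assumed example (real NATs vary); wg(8) suggests 25 seconds as an interval that works with a wide variety of firewalls. The model also simplifies PersistentKeepalive to one packet every interval on an otherwise idle tunnel.

```python
from dataclasses import dataclass


@dataclass
class NatMapping:
    """Toy model of one UDP mapping on a stateful NAT (illustrative only)."""
    idle_timeout: float          # seconds the NAT keeps an idle UDP mapping
    last_outbound: float = 0.0   # time of the last packet sent from inside

    def outbound(self, now: float) -> None:
        self.last_outbound = now  # any outbound packet refreshes the mapping

    def accepts_inbound(self, now: float) -> bool:
        return now - self.last_outbound < self.idle_timeout


def inbound_survives(keepalive: int, idle_timeout: float, inbound_at: float) -> bool:
    """True if an unsolicited inbound packet at inbound_at still gets through.

    The inside peer sends one packet at t=0, then (if keepalive > 0) one
    keepalive every keepalive seconds, mimicking PersistentKeepalive.
    """
    nat = NatMapping(idle_timeout)
    nat.outbound(0.0)
    if keepalive > 0:
        t = float(keepalive)
        while t <= inbound_at:
            nat.outbound(t)
            t += keepalive
    return nat.accepts_inbound(inbound_at)
```

With a 30 s timeout, `inbound_survives(25, 30, 120)` is True, while `inbound_survives(0, 30, 120)` and `inbound_survives(60, 30, 100)` are False: the keepalive only helps when its interval is shorter than the NAT's idle timeout.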

@gemini-code-assist (bot) left a comment


Code Review

This pull request adds support for configuring the WireGuard persistentKeepalive interval in the Kilo ClusterMesh configuration. The changes update the clustermesh.yaml template and the OpenAPI schema in cozyrds.yaml. The review feedback correctly identifies that the localCluster field (and its persistentKeepalive property) is missing from the OpenAPI schema in cozyrds.yaml despite being used in the templates, and suggests adding it to ensure proper validation and documentation.


plural: kubernetesswitchclouds
openAPISchema: |-
{"title":"Chart Values","type":"object","properties":{"openstack":{"description":"OpenStack connection settings.","type":"object","default":{},"properties":{"authURL":{"description":"OpenStack identity endpoint.","type":"string","default":""},"regionName":{"description":"OpenStack region name.","type":"string","default":""},"applicationCredentialID":{"description":"Application credential ID.","type":"string","default":""},"applicationCredentialSecret":{"description":"Application credential secret.","type":"string","default":""},"cloudName":{"description":"Cloud name used as key in clouds.yaml.","type":"string","default":"openstack"},"existingSecret":{"description":"Name of existing Secret with clouds.yaml (overrides inline credentials).","type":"string","default":""},"floatingIPNetwork":{"description":"External network for Floating IPs (leave empty to disable FIPs).","type":"string","default":""},"loadBalancer":{"description":"Centralised LoadBalancer feature. When enabled, the management-cluster loadbalancer-controller provisions Octavia LBs for tenant Services of type LoadBalancer using the OpenStack credentials configured above; tenant cluster users never see those credentials directly.","type":"object","default":{},"properties":{"enabled":{"description":"Enable centralised LoadBalancer provisioning for this tenant. Off by default; turn on only after the controller has been deployed to the management cluster.","type":"boolean","default":false},"providerDriver":{"description":"Octavia provider driver. In Switch Cloud zhw only `ovn` is functional (the `amphora` driver has no flavors). Leave empty to use the project default.","type":"string","default":"ovn"},"vipNetworkID":{"description":"REQUIRED when enabled=true. Neutron network ID where the LB VIP port is allocated. Must be a tenant-owned network reachable from the worker nodes (typically the same network as spec.openstack.network.id). 
In Switch Cloud zhw direct-on-public VIP is not BGP-announced for IPv4, so VIP must live on a tenant network and external access is wired via floatingNetworkID below.","type":"string","default":""},"floatingNetworkID":{"description":"Optional external network ID. When set, the controller allocates a floating IP from this network after the LB becomes ACTIVE and binds it to the LB VIP port; the FIP address is reported back in Service.status.loadBalancer.ingress. Leave empty for internal-only LBs (no external reachability).","type":"string","default":""},"floatingSubnetID":{"description":"Optional Neutron subnet ID within floatingNetworkID. Pins FIP allocation to a specific subnet when the external network has multiple subnets (e.g. dual-stack). Leave empty to let Neutron pick.","type":"string","default":""},"workerSecurityGroupID":{"description":"Optional Neutron security group ID attached to the worker-node ports. When set, the controller manages per-Service ingress rules in this SG (one rule per Service port \u00d7 allowedCIDRs). Required for external traffic to reach the NodePort behind the LB: Octavia's pool members carry the worker port's SG, so without an explicit allow rule the default SG (which only permits intra-SG traffic) drops inbound packets coming from the LB VIP. Leave empty if you prefer to manage SG rules manually outside the controller; operator-managed rules are then your responsibility.","type":"string","default":""},"allowedCIDRs":{"description":"CIDR allowlist for inbound traffic to the LB NodePort. Each Service port gets ingress rules in workerSecurityGroupID for every CIDR listed here. Defaults to 0.0.0.0/0 (publicly reachable); narrow this when only specific networks should reach the Service. Ignored when workerSecurityGroupID is unset.","type":"array","default":["0.0.0.0/0"],"items":{"type":"string"}}}},"network":{"description":"OpenStack network settings. 
When `id` is set the chart operates in legacy mode and CAPO consumes that pre-existing network. When `id` is empty CAPO auto-manages a per-cluster network/subnet/router on apply and tears them down on cluster delete, giving full L2/L3 isolation between clusters in the same project.","type":"object","default":{},"properties":{"id":{"description":"Pre-existing OpenStack network ID for worker nodes. Legacy mode. Leave empty to enable auto-managed mode below. IMMUTABLE once set: clearing this field would switch a live cluster from legacy to auto-managed and trigger CAPO to provision a brand-new network, orphaning the old one and rerolling all workers (hard outage). Delete and recreate the cluster CR if you really need to switch modes.","type":"string","default":"","x-kubernetes-validations":[{"rule":"self == oldSelf || oldSelf == ''","message":"spec.openstack.network.id is immutable once set: clearing it would switch the cluster from legacy to auto-managed mode and trigger a network re-provision (data loss). Delete and recreate the cluster CR instead."}]},"subnetCIDR":{"description":"IPv4 CIDR for the auto-managed worker subnet. Used only when `id` is empty. Default `10.244.0.0/24` is chosen to NOT overlap the in-cluster default podCIDR `10.243.0.0/16`; pick a non-overlapping range if you mesh multiple clusters via Kilo.","type":"string","default":"10.244.0.0/24"},"externalNetworkID":{"description":"Neutron network ID used as the router's external gateway in auto-managed mode. Leave empty to let CAPO auto-discover the single project-visible external network (Switch Cloud zhw: the `public` network).","type":"string","default":""},"dnsNameservers":{"description":"DNS servers wired into the auto-managed subnet's DHCP. Used only when `id` is empty. Empty list = OpenStack default.","type":"array","default":[],"items":{"type":"string"}}}}}},"version":{"description":"Kubernetes version string (e.g. 
v1.32.6).","type":"string","default":"v1.32.6"},"talosVersion":{"description":"Talos OS version embedded in the node image.","type":"string","default":"v1.10.0"},"controlPlane":{"description":"Kamaji control-plane configuration.","type":"object","default":{},"properties":{"replicas":{"description":"Number of control-plane replicas.","type":"integer","default":2}}},"host":{"description":"External hostname for the Kubernetes API. Defaults to <name>.<namespace>.svc:6443 when empty.","type":"string","default":""},"ingressClassName":{"description":"IngressClass for the Kamaji-managed apiserver Ingress. When empty (default), falls back to `_namespace.ingress` (= the parent tenant namespace name) so per-tenant Ingress controllers like `tenant-root` keep working unchanged. Set to e.g. `nginx` to route the apiserver through a host-level ingress controller instead (saves provisioning a dedicated tenant-root ingress-nginx for clusters that already expose one at the platform level; the controller must have --enable-ssl-passthrough since the Kamaji apiserver terminates TLS itself).","type":"string","default":""},"nodeGroups":{"description":"Worker node group definitions.","type":"object","default":{"md0":{"minReplicas":0,"maxReplicas":10,"flavorName":"c004r008","imageName":"talos-openstack-amd64","roles":[],"resources":{"cpu":4,"memory":"8Gi"},"kubelet":{"evictionHardMemory":"7%","evictionSoftMemory":"10%"},"securityGroups":["default"]}},"additionalProperties":{"type":"object","required":["flavorName","imageName","maxReplicas","minReplicas"],"properties":{"minReplicas":{"description":"Minimum autoscaler replicas.","type":"integer","default":0},"maxReplicas":{"description":"Maximum autoscaler replicas.","type":"integer","default":10},"flavorName":{"description":"OpenStack Nova flavor name.","type":"string","default":"c004r008"},"imageName":{"description":"Glance image name for Talos nodes.","type":"string","default":"talos-openstack-amd64"},"roles":{"description":"Node roles 
propagated as labels.","type":"array","items":{"type":"string"}},"resources":{"description":"CPU/memory for kubelet reservations and autoscaler capacity hints.","type":"object","properties":{"cpu":{"anyOf":[{"type":"integer"},{"type":"string"}],"x-kubernetes-int-or-string":true},"memory":{"anyOf":[{"type":"integer"},{"type":"string"}],"x-kubernetes-int-or-string":true}}},"kubelet":{"description":"Kubelet eviction threshold overrides.","type":"object","properties":{"evictionHardMemory":{"type":"string","default":"7%"},"evictionSoftMemory":{"type":"string","default":"10%"}}},"securityGroups":{"description":"OpenStack security groups.","type":"array","items":{"type":"string"}}}}},"addons":{"description":"Cluster addons configuration.","type":"object","default":{},"properties":{"certManager":{"type":"object","default":{},"properties":{"enabled":{"type":"boolean","default":true},"valuesOverride":{"type":"object","default":{},"x-kubernetes-preserve-unknown-fields":true}}},"cilium":{"type":"object","default":{},"properties":{"valuesOverride":{"type":"object","default":{},"x-kubernetes-preserve-unknown-fields":true}}},"ingressNginx":{"type":"object","default":{},"properties":{"enabled":{"type":"boolean","default":false,"description":"Enable Ingress-NGINX controller in the tenant cluster."},"exposeMethod":{"type":"string","enum":["LoadBalancer","Proxied"],"default":"LoadBalancer","description":"How the controller is reachable from outside. Proxied is reserved for a future platform-side router and currently fails render in switchcloud."},"hosts":{"type":"array","default":[],"items":{"type":"string"},"description":"Domains served by this tenant ingress. 
Used for cert-manager SANs and to document the public routing target."},"valuesOverride":{"type":"object","default":{},"x-kubernetes-preserve-unknown-fields":true}}},"metricsServer":{"type":"object","default":{},"properties":{"enabled":{"type":"boolean","default":true},"valuesOverride":{"type":"object","default":{},"x-kubernetes-preserve-unknown-fields":true}}},"fluxcd":{"type":"object","default":{},"properties":{"enabled":{"type":"boolean","default":false},"valuesOverride":{"type":"object","default":{},"x-kubernetes-preserve-unknown-fields":true}}},"kilo":{"type":"object","default":{},"properties":{"enabled":{"type":"boolean","default":false,"description":"Install Kilo WireGuard mesh in the tenant cluster. Required for cross-location routing or kilo-clustermesh-operator participation."},"valuesOverride":{"type":"object","default":{},"x-kubernetes-preserve-unknown-fields":true},"clusterMesh":{"type":"object","default":{},"description":"When remoteClusters is non-empty, emits a ClusterMesh CR for kilo-clustermesh-operator. The local cluster entry is auto-derived; for each remote tenant supply its name and the Kamaji admin-kubeconfig Secret name.","properties":{"localWireguardCIDR":{"type":"string","default":"","description":"WireGuard CIDR for the local cluster; must match the kilo.transitCIDR you set under addons.kilo.valuesOverride."},"wireguardPort":{"type":"integer","default":51820,"description":"UDP port for WireGuard peers."},"remoteClusters":{"type":"array","default":[],"description":"Remote clusters to mesh with. 
Empty list disables ClusterMesh emission.","items":{"type":"object","required":["name","kubeconfigSecretName","podCIDRs","serviceCIDR","wireguardCIDR"],"properties":{"name":{"type":"string"},"kubeconfigSecretName":{"type":"string","description":"Name of the Secret in tenant-root holding the remote cluster admin kubeconfig (Kamaji default: <release>-admin-kubeconfig)."},"kubeconfigSecretKey":{"type":"string","default":"super-admin.conf"},"podCIDRs":{"type":"array","items":{"type":"string"}},"additionalCIDRs":{"type":"array","default":[],"description":"Extra CIDRs folded into the AllowedIPs of the remote cluster leader peer. Use this to cover pod subnets of non-leader nodes that lack a WireGuard IP (e.g. location-granularity clusters with multiple nodes).","items":{"type":"string"}},"serviceCIDR":{"type":"string"},"wireguardCIDR":{"type":"string"},"wireguardPort":{"type":"integer","default":51820}}}}}}}}}},"network":{"type":"object","default":{},"description":"Cluster network CIDRs. Immutable after cluster creation.","properties":{"podCIDR":{"type":"string","default":"10.243.0.0/16","description":"CIDR allocated for pods. Maps onto CAPI Cluster.spec.clusterNetwork.pods.cidrBlocks."},"serviceCIDR":{"type":"string","default":"10.95.0.0/16","description":"CIDR allocated for Services. Maps onto CAPI Cluster.spec.clusterNetwork.services.cidrBlocks."},"dnsClusterIP":{"type":"string","default":"10.95.0.10","description":"ClusterIP for the CoreDNS Service. Must fall inside serviceCIDR."}}}}}
{"title":"Chart Values","type":"object","properties":{"openstack":{"description":"OpenStack connection settings.","type":"object","default":{},"properties":{"authURL":{"description":"OpenStack identity endpoint.","type":"string","default":""},"regionName":{"description":"OpenStack region name.","type":"string","default":""},"applicationCredentialID":{"description":"Application credential ID.","type":"string","default":""},"applicationCredentialSecret":{"description":"Application credential secret.","type":"string","default":""},"cloudName":{"description":"Cloud name used as key in clouds.yaml.","type":"string","default":"openstack"},"existingSecret":{"description":"Name of existing Secret with clouds.yaml (overrides inline credentials).","type":"string","default":""},"floatingIPNetwork":{"description":"External network for Floating IPs (leave empty to disable FIPs).","type":"string","default":""},"loadBalancer":{"description":"Centralised LoadBalancer feature. When enabled, the management-cluster loadbalancer-controller provisions Octavia LBs for tenant Services of type LoadBalancer using the OpenStack credentials configured above; tenant cluster users never see those credentials directly.","type":"object","default":{},"properties":{"enabled":{"description":"Enable centralised LoadBalancer provisioning for this tenant. Off by default; turn on only after the controller has been deployed to the management cluster.","type":"boolean","default":false},"providerDriver":{"description":"Octavia provider driver. In Switch Cloud zhw only `ovn` is functional (the `amphora` driver has no flavors). Leave empty to use the project default.","type":"string","default":"ovn"},"vipNetworkID":{"description":"REQUIRED when enabled=true. Neutron network ID where the LB VIP port is allocated. Must be a tenant-owned network reachable from the worker nodes (typically the same network as spec.openstack.network.id). 
In Switch Cloud zhw direct-on-public VIP is not BGP-announced for IPv4, so VIP must live on a tenant network and external access is wired via floatingNetworkID below.","type":"string","default":""},"floatingNetworkID":{"description":"Optional external network ID. When set, the controller allocates a floating IP from this network after the LB becomes ACTIVE and binds it to the LB VIP port; the FIP address is reported back in Service.status.loadBalancer.ingress. Leave empty for internal-only LBs (no external reachability).","type":"string","default":""},"floatingSubnetID":{"description":"Optional Neutron subnet ID within floatingNetworkID. Pins FIP allocation to a specific subnet when the external network has multiple subnets (e.g. dual-stack). Leave empty to let Neutron pick.","type":"string","default":""},"workerSecurityGroupID":{"description":"Optional Neutron security group ID attached to the worker-node ports. When set, the controller manages per-Service ingress rules in this SG (one rule per Service port \u00d7 allowedCIDRs). Required for external traffic to reach the NodePort behind the LB: Octavia's pool members carry the worker port's SG, so without an explicit allow rule the default SG (which only permits intra-SG traffic) drops inbound packets coming from the LB VIP. Leave empty if you prefer to manage SG rules manually outside the controller; operator-managed rules are then your responsibility.","type":"string","default":""},"allowedCIDRs":{"description":"CIDR allowlist for inbound traffic to the LB NodePort. Each Service port gets ingress rules in workerSecurityGroupID for every CIDR listed here. Defaults to 0.0.0.0/0 (publicly reachable); narrow this when only specific networks should reach the Service. Ignored when workerSecurityGroupID is unset.","type":"array","default":["0.0.0.0/0"],"items":{"type":"string"}}}},"network":{"description":"OpenStack network settings. 
When `id` is set the chart operates in legacy mode and CAPO consumes that pre-existing network. When `id` is empty CAPO auto-manages a per-cluster network/subnet/router on apply and tears them down on cluster delete, giving full L2/L3 isolation between clusters in the same project.","type":"object","default":{},"properties":{"id":{"description":"Pre-existing OpenStack network ID for worker nodes. Legacy mode. Leave empty to enable auto-managed mode below. IMMUTABLE once set: clearing this field would switch a live cluster from legacy to auto-managed and trigger CAPO to provision a brand-new network, orphaning the old one and rerolling all workers (hard outage). Delete and recreate the cluster CR if you really need to switch modes.","type":"string","default":"","x-kubernetes-validations":[{"rule":"self == oldSelf || oldSelf == ''","message":"spec.openstack.network.id is immutable once set: clearing it would switch the cluster from legacy to auto-managed mode and trigger a network re-provision (data loss). Delete and recreate the cluster CR instead."}]},"subnetCIDR":{"description":"IPv4 CIDR for the auto-managed worker subnet. Used only when `id` is empty. Default `10.244.0.0/24` is chosen to NOT overlap the in-cluster default podCIDR `10.243.0.0/16`; pick a non-overlapping range if you mesh multiple clusters via Kilo.","type":"string","default":"10.244.0.0/24"},"externalNetworkID":{"description":"Neutron network ID used as the router's external gateway in auto-managed mode. Leave empty to let CAPO auto-discover the single project-visible external network (Switch Cloud zhw: the `public` network).","type":"string","default":""},"dnsNameservers":{"description":"DNS servers wired into the auto-managed subnet's DHCP. Used only when `id` is empty. Empty list = OpenStack default.","type":"array","default":[],"items":{"type":"string"}}}}}},"version":{"description":"Kubernetes version string (e.g. 
v1.32.6).","type":"string","default":"v1.32.6"},"talosVersion":{"description":"Talos OS version embedded in the node image.","type":"string","default":"v1.10.0"},"controlPlane":{"description":"Kamaji control-plane configuration.","type":"object","default":{},"properties":{"replicas":{"description":"Number of control-plane replicas.","type":"integer","default":2}}},"host":{"description":"External hostname for the Kubernetes API. Defaults to <name>.<namespace>.svc:6443 when empty.","type":"string","default":""},"ingressClassName":{"description":"IngressClass for the Kamaji-managed apiserver Ingress. When empty (default), falls back to `_namespace.ingress` (= the parent tenant namespace name) so per-tenant Ingress controllers like `tenant-root` keep working unchanged. Set to e.g. `nginx` to route the apiserver through a host-level ingress controller instead (saves provisioning a dedicated tenant-root ingress-nginx for clusters that already expose one at the platform level; the controller must have --enable-ssl-passthrough since the Kamaji apiserver terminates TLS itself).","type":"string","default":""},"nodeGroups":{"description":"Worker node group definitions.","type":"object","default":{"md0":{"minReplicas":0,"maxReplicas":10,"flavorName":"c004r008","imageName":"talos-openstack-amd64","roles":[],"resources":{"cpu":4,"memory":"8Gi"},"kubelet":{"evictionHardMemory":"7%","evictionSoftMemory":"10%"},"securityGroups":["default"]}},"additionalProperties":{"type":"object","required":["flavorName","imageName","maxReplicas","minReplicas"],"properties":{"minReplicas":{"description":"Minimum autoscaler replicas.","type":"integer","default":0},"maxReplicas":{"description":"Maximum autoscaler replicas.","type":"integer","default":10},"flavorName":{"description":"OpenStack Nova flavor name.","type":"string","default":"c004r008"},"imageName":{"description":"Glance image name for Talos nodes.","type":"string","default":"talos-openstack-amd64"},"roles":{"description":"Node roles 
propagated as labels.","type":"array","items":{"type":"string"}},"resources":{"description":"CPU/memory for kubelet reservations and autoscaler capacity hints.","type":"object","properties":{"cpu":{"anyOf":[{"type":"integer"},{"type":"string"}],"x-kubernetes-int-or-string":true},"memory":{"anyOf":[{"type":"integer"},{"type":"string"}],"x-kubernetes-int-or-string":true}}},"kubelet":{"description":"Kubelet eviction threshold overrides.","type":"object","properties":{"evictionHardMemory":{"type":"string","default":"7%"},"evictionSoftMemory":{"type":"string","default":"10%"}}},"securityGroups":{"description":"OpenStack security groups.","type":"array","items":{"type":"string"}}}}},"addons":{"description":"Cluster addons configuration.","type":"object","default":{},"properties":{"certManager":{"type":"object","default":{},"properties":{"enabled":{"type":"boolean","default":true},"valuesOverride":{"type":"object","default":{},"x-kubernetes-preserve-unknown-fields":true}}},"cilium":{"type":"object","default":{},"properties":{"valuesOverride":{"type":"object","default":{},"x-kubernetes-preserve-unknown-fields":true}}},"ingressNginx":{"type":"object","default":{},"properties":{"enabled":{"type":"boolean","default":false,"description":"Enable Ingress-NGINX controller in the tenant cluster."},"exposeMethod":{"type":"string","enum":["LoadBalancer","Proxied"],"default":"LoadBalancer","description":"How the controller is reachable from outside. Proxied is reserved for a future platform-side router and currently fails render in switchcloud."},"hosts":{"type":"array","default":[],"items":{"type":"string"},"description":"Domains served by this tenant ingress. 
Used for cert-manager SANs and to document the public routing target."},"valuesOverride":{"type":"object","default":{},"x-kubernetes-preserve-unknown-fields":true}}},"metricsServer":{"type":"object","default":{},"properties":{"enabled":{"type":"boolean","default":true},"valuesOverride":{"type":"object","default":{},"x-kubernetes-preserve-unknown-fields":true}}},"fluxcd":{"type":"object","default":{},"properties":{"enabled":{"type":"boolean","default":false},"valuesOverride":{"type":"object","default":{},"x-kubernetes-preserve-unknown-fields":true}}},"kilo":{"type":"object","default":{},"properties":{"enabled":{"type":"boolean","default":false,"description":"Install Kilo WireGuard mesh in the tenant cluster. Required for cross-location routing or kilo-clustermesh-operator participation."},"valuesOverride":{"type":"object","default":{},"x-kubernetes-preserve-unknown-fields":true},"clusterMesh":{"type":"object","default":{},"description":"When remoteClusters is non-empty, emits a ClusterMesh CR for kilo-clustermesh-operator. The local cluster entry is auto-derived; for each remote tenant supply its name and the Kamaji admin-kubeconfig Secret name.","properties":{"localWireguardCIDR":{"type":"string","default":"","description":"WireGuard CIDR for the local cluster; must match the kilo.transitCIDR you set under addons.kilo.valuesOverride."},"wireguardPort":{"type":"integer","default":51820,"description":"UDP port for WireGuard peers."},"remoteClusters":{"type":"array","default":[],"description":"Remote clusters to mesh with. 
Empty list disables ClusterMesh emission.","items":{"type":"object","required":["name","kubeconfigSecretName","podCIDRs","serviceCIDR","wireguardCIDR"],"properties":{"name":{"type":"string"},"kubeconfigSecretName":{"type":"string","description":"Name of the Secret in tenant-root holding the remote cluster admin kubeconfig (Kamaji default: <release>-admin-kubeconfig)."},"kubeconfigSecretKey":{"type":"string","default":"super-admin.conf"},"podCIDRs":{"type":"array","items":{"type":"string"}},"additionalCIDRs":{"type":"array","default":[],"description":"Extra CIDRs folded into the AllowedIPs of the remote cluster leader peer. Use this to cover pod subnets of non-leader nodes that lack a WireGuard IP (e.g. location-granularity clusters with multiple nodes).","items":{"type":"string"}},"serviceCIDR":{"type":"string"},"wireguardCIDR":{"type":"string"},"wireguardPort":{"type":"integer","default":51820},"persistentKeepalive":{"type":"integer","default":0,"description":"WireGuard PersistentKeepalive interval in seconds. Set to a non-zero value (e.g. 25) for clusters behind NAT to keep the stateful NAT mapping alive and enable bidirectional traffic.","minimum":0,"maximum":65535}}}}}}}}}},"network":{"type":"object","default":{},"description":"Cluster network CIDRs. Immutable after cluster creation.","properties":{"podCIDR":{"type":"string","default":"10.243.0.0/16","description":"CIDR allocated for pods. Maps onto CAPI Cluster.spec.clusterNetwork.pods.cidrBlocks."},"serviceCIDR":{"type":"string","default":"10.95.0.0/16","description":"CIDR allocated for Services. Maps onto CAPI Cluster.spec.clusterNetwork.services.cidrBlocks."},"dnsClusterIP":{"type":"string","default":"10.95.0.10","description":"ClusterIP for the CoreDNS Service. Must fall inside serviceCIDR."}}}}}


medium

The localCluster field is used in the clustermesh.yaml template and documented in values.yaml, but it is completely missing from the OpenAPI schema in cozyrds.yaml. Defining it in the schema ensures proper validation, documentation, and auto-completion for users configuring the KubernetesSwitchcloud custom resource. Additionally, the new persistentKeepalive field should be documented under both remoteClusters and localCluster properties.

      {"title":"Chart Values","type":"object","properties":{"openstack":{"description":"OpenStack connection settings.","type":"object","default":{},"properties":{"authURL":{"description":"OpenStack identity endpoint.","type":"string","default":""},"regionName":{"description":"OpenStack region name.","type":"string","default":""},"applicationCredentialID":{"description":"Application credential ID.","type":"string","default":""},"applicationCredentialSecret":{"description":"Application credential secret.","type":"string","default":""},"cloudName":{"description":"Cloud name used as key in clouds.yaml.","type":"string","default":"openstack"},"existingSecret":{"description":"Name of existing Secret with clouds.yaml (overrides inline credentials).","type":"string","default":""},"floatingIPNetwork":{"description":"External network for Floating IPs (leave empty to disable FIPs).","type":"string","default":""},"loadBalancer":{"description":"Centralised LoadBalancer feature. When enabled, the management-cluster loadbalancer-controller provisions Octavia LBs for tenant Services of type LoadBalancer using the OpenStack credentials configured above; tenant cluster users never see those credentials directly.","type":"object","default":{},"properties":{"enabled":{"description":"Enable centralised LoadBalancer provisioning for this tenant. Off by default; turn on only after the controller has been deployed to the management cluster.","type":"boolean","default":false},"providerDriver":{"description":"Octavia provider driver. In Switch Cloud zhw only `ovn` is functional (the `amphora` driver has no flavors). Leave empty to use the project default.","type":"string","default":"ovn"},"vipNetworkID":{"description":"REQUIRED when enabled=true. Neutron network ID where the LB VIP port is allocated. Must be a tenant-owned network reachable from the worker nodes (typically the same network as spec.openstack.network.id). 
In Switch Cloud zhw direct-on-public VIP is not BGP-announced for IPv4, so VIP must live on a tenant network and external access is wired via floatingNetworkID below.","type":"string","default":""},"floatingNetworkID":{"description":"Optional external network ID. When set, the controller allocates a floating IP from this network after the LB becomes ACTIVE and binds it to the LB VIP port; the FIP address is reported back in Service.status.loadBalancer.ingress. Leave empty for internal-only LBs (no external reachability).","type":"string","default":""},"floatingSubnetID":{"description":"Optional Neutron subnet ID within floatingNetworkID. Pins FIP allocation to a specific subnet when the external network has multiple subnets (e.g. dual-stack). Leave empty to let Neutron pick.","type":"string","default":""},"workerSecurityGroupID":{"description":"Optional Neutron security group ID attached to the worker-node ports. When set, the controller manages per-Service ingress rules in this SG (one rule per Service port \u00d7 allowedCIDRs). Required for external traffic to reach the NodePort behind the LB: Octavia's pool members carry the worker port's SG, so without an explicit allow rule the default SG (which only permits intra-SG traffic) drops inbound packets coming from the LB VIP. Leave empty if you prefer to manage SG rules manually outside the controller; operator-managed rules are then your responsibility.","type":"string","default":""},"allowedCIDRs":{"description":"CIDR allowlist for inbound traffic to the LB NodePort. Each Service port gets ingress rules in workerSecurityGroupID for every CIDR listed here. Defaults to 0.0.0.0/0 (publicly reachable); narrow this when only specific networks should reach the Service. Ignored when workerSecurityGroupID is unset.","type":"array","default":["0.0.0.0/0"],"items":{"type":"string"}}}},"network":{"description":"OpenStack network settings. 
When `id` is set the chart operates in legacy mode and CAPO consumes that pre-existing network. When `id` is empty CAPO auto-manages a per-cluster network/subnet/router on apply and tears them down on cluster delete, giving full L2/L3 isolation between clusters in the same project.","type":"object","default":{},"properties":{"id":{"description":"Pre-existing OpenStack network ID for worker nodes. Legacy mode. Leave empty to enable auto-managed mode below. IMMUTABLE once set: clearing this field would switch a live cluster from legacy to auto-managed and trigger CAPO to provision a brand-new network, orphaning the old one and rerolling all workers (hard outage). Delete and recreate the cluster CR if you really need to switch modes.","type":"string","default":"","x-kubernetes-validations":[{"rule":"self == oldSelf || oldSelf == ''","message":"spec.openstack.network.id is immutable once set: clearing it would switch the cluster from legacy to auto-managed mode and trigger a network re-provision (data loss). Delete and recreate the cluster CR instead."}]},"subnetCIDR":{"description":"IPv4 CIDR for the auto-managed worker subnet. Used only when `id` is empty. Default `10.244.0.0/24` is chosen to NOT overlap the in-cluster default podCIDR `10.243.0.0/16`; pick a non-overlapping range if you mesh multiple clusters via Kilo.","type":"string","default":"10.244.0.0/24"},"externalNetworkID":{"description":"Neutron network ID used as the router's external gateway in auto-managed mode. Leave empty to let CAPO auto-discover the single project-visible external network (Switch Cloud zhw: the `public` network).","type":"string","default":""},"dnsNameservers":{"description":"DNS servers wired into the auto-managed subnet's DHCP. Used only when `id` is empty. Empty list = OpenStack default.","type":"array","default":[],"items":{"type":"string"}}}}}},"version":{"description":"Kubernetes version string (e.g. 
v1.32.6).","type":"string","default":"v1.32.6"},"talosVersion":{"description":"Talos OS version embedded in the node image.","type":"string","default":"v1.10.0"},"controlPlane":{"description":"Kamaji control-plane configuration.","type":"object","default":{},
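Following the review comment, a hypothetical Python check that applies the persistentKeepalive constraints from the remoteClusters schema (integer, minimum 0, maximum 65535) to both remoteClusters entries and localCluster. Because localCluster has no schema yet, treating it as an object with an optional persistentKeepalive key is an assumption.

```python
from typing import Any

KEEPALIVE_MIN, KEEPALIVE_MAX = 0, 65535  # bounds from the cozyrds.yaml schema


def keepalive_errors(cluster_mesh: dict[str, Any]) -> list[str]:
    """Collect persistentKeepalive validation errors in clusterMesh values.

    ASSUMPTION: localCluster is an object that may carry persistentKeepalive.
    """
    entries: list[tuple[str, Any]] = [
        (f"remoteClusters[{i}]", e)
        for i, e in enumerate(cluster_mesh.get("remoteClusters", []))
    ]
    if "localCluster" in cluster_mesh:
        entries.append(("localCluster", cluster_mesh["localCluster"]))

    errors = []
    for path, entry in entries:
        if not isinstance(entry, dict) or "persistentKeepalive" not in entry:
            continue  # optional field; the default 0 disables keepalive
        value = entry["persistentKeepalive"]
        # bool is an int subclass in Python, but JSON Schema "integer" rejects it.
        if isinstance(value, bool) or not isinstance(value, int):
            errors.append(f"{path}.persistentKeepalive: must be an integer")
        elif not KEEPALIVE_MIN <= value <= KEEPALIVE_MAX:
            errors.append(f"{path}.persistentKeepalive: {value} outside "
                          f"[{KEEPALIVE_MIN}, {KEEPALIVE_MAX}]")
    return errors
```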

IvanHunters marked this pull request as ready for review on June 19, 2026 at 09:49.
kvaps (Contributor) commented on Jun 19, 2026:

Superseded by #154. The persistentKeepalive template support landed in #152 (clustermesh.yaml migrated to flat allowedNetworks); the CR schema (cozyrds.yaml flat allowedNetworks + persistentKeepalive) and operator 0.6.1 (which stops forcing InternalIP endpoints) are in #154.

kvaps closed this on Jun 19, 2026.
2 participants