Skip to content

USHIFT-6800: Add c2cc reboot tests#6943

Open
vimauro wants to merge 6 commits into
openshift:mainfrom
vimauro:reboot-tests
Open

USHIFT-6800: Add c2cc reboot tests#6943
vimauro wants to merge 6 commits into
openshift:mainfrom
vimauro:reboot-tests

Conversation

@vimauro

@vimauro vimauro commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

Summary by CodeRabbit

  • Tests
    • Added a C2CC reboot test suite covering single-cluster, dual, and three-cluster simultaneous reboot scenarios.
    • Improved reboot recovery by snapshotting per-cluster boot IDs, rebooting all targets simultaneously, and reconnecting each cluster to confirm boot ID changes plus Greenboot healthcheck completion.
    • Strengthened post-reboot verification across workloads, C2CC connectivity (including source IP preservation), networking/routing and nftables/OVN rules, RemoteCluster health and probe timestamps, and cross-cluster DNS validation.
    • Generalized health waiting to cover all configured remote clusters (not just fixed examples).

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jun 25, 2026
@openshift-ci-robot

openshift-ci-robot commented Jun 25, 2026

Copy link
Copy Markdown

@vimauro: This pull request references USHIFT-6800 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.

Details

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@vimauro

vimauro commented Jun 25, 2026

Copy link
Copy Markdown
Contributor Author

/label tide/merge-method-squash

@coderabbitai

coderabbitai Bot commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 9b23ec7e-0f66-49a8-aaee-727780f7352b

📥 Commits

Reviewing files that changed from the base of the PR and between cfece6a and 15cc38a.

📒 Files selected for processing (1)
  • test/resources/c2cc.resource
🚧 Files skipped from review as they are similar to previous changes (1)
  • test/resources/c2cc.resource

Walkthrough

Adds Robot Framework coverage for C2CC reboot scenarios, including helper keywords for cluster reconnect and reboot validation, plus a suite that runs single and simultaneous reboot cases with connectivity, infrastructure, health, and DNS checks.

Changes

C2CC reboot scenarios

Layer / File(s) Summary
Reboot helper keywords
test/resources/c2cc.resource
Updates healthy-cluster iteration and adds reboot, reconnect, and reboot-state validation keywords for cluster aliases.
Suite setup and reboot cases
test/suites/c2cc/reboot.robot
Adds suite metadata, setup and teardown flow, readiness gating, healthy-cluster preconditions, and three reboot test cases.
Post-reboot verification checks
test/suites/c2cc/reboot.robot
Adds retrying verification for connectivity, infrastructure state, health probes, and DNS-based service access across clusters.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

  • jogeo
  • vanhalenar

Possibly related PRs

🚥 Pre-merge checks | ✅ 15
✅ Passed checks (15 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: adding C2CC reboot tests.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed The new suite’s test case titles are static strings, and no It/Describe/Context/When titles were added in the changed files.
Test Structure And Quality ✅ Passed No Ginkgo-style quality issues found; the added Robot and unit tests use suite setup/teardown, bounded waits, and clean test helpers consistently.
Microshift Test Compatibility ✅ Passed The PR adds Robot Framework suite/resource keywords, not Ginkgo e2e tests, so this MicroShift Ginkgo compatibility check doesn’t apply.
Single Node Openshift (Sno) Test Compatibility ✅ Passed PASS: The new reboot tests are Robot suites and only target three separate clusters; they don't assert multi-node/HA behavior inside a cluster or use SNO-unsafe topology assumptions.
Topology-Aware Scheduling Compatibility ✅ Passed Only Robot test/resource files changed; no deployment manifests, controllers, or scheduling fields were added.
Ote Binary Stdout Contract ✅ Passed PASS — only Robot Framework test/resource files changed; no main/init/TestMain/suite-setup code or stdout writes were added.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed No runtime IPv4-only assumptions or public-internet dependencies found; the reboot suite reuses cluster-internal DNS and IPv6-aware helpers.
No-Weak-Crypto ✅ Passed The diff only adds reboot/test orchestration; no MD5/SHA1/DES/RC4/3DES/Blowfish/ECB, custom crypto, or secret/token comparisons appear.
Container-Privileges ✅ Passed Only Robot Framework files changed; inspected c2cc.resource and reboot.robot and found no container/K8s privilege settings or manifests.
No-Sensitive-Data-In-Logs ✅ Passed No new logs expose secrets/PII/internal hosts; reboot helpers use aliases/boot IDs only and add no sensitive logging.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands.

@openshift-ci openshift-ci Bot added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Jun 25, 2026
@openshift-ci openshift-ci Bot requested review from jerpeter1 and pmtk June 25, 2026 13:56
@openshift-ci

openshift-ci Bot commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: vimauro
Once this PR has been reviewed and has the lgtm label, please assign kasturinarra for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@vimauro

vimauro commented Jun 25, 2026

Copy link
Copy Markdown
Contributor Author

/jira refresh

@openshift-ci-robot

openshift-ci-robot commented Jun 25, 2026

Copy link
Copy Markdown

@vimauro: This pull request references USHIFT-6800 which is a valid jira issue.

Details

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/resources/c2cc.resource`:
- Around line 456-458: The re-registration flow in the remote cluster setup is
mutating `${C2CC_REMOTE_ALIASES}` before `Register Remote Cluster` succeeds,
which can leave teardown state inconsistent if that keyword fails. Update the
logic around `Remove Values From List` and `Register Remote Cluster` so the
alias is only removed after a successful re-registration, or use `TRY/FINALLY`
to restore/reconcile `${C2CC_REMOTE_ALIASES}` on failure. Keep the
teardown-tracked alias list in sync in the re-registration path used by `Wait
Until Keyword Succeeds`.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 52322569-b28b-466c-bc7a-ba41b45857c4

📥 Commits

Reviewing files that changed from the base of the PR and between 8d0593e and 5296d02.

📒 Files selected for processing (2)
  • test/resources/c2cc.resource
  • test/suites/c2cc/reboot.robot

Comment thread test/resources/c2cc.resource Outdated
Comment on lines +456 to +458
${kubeconfig}= Get From Dictionary ${C2CC_KUBECONFIGS} ${alias}
Remove Values From List ${C2CC_REMOTE_ALIASES} ${alias}
Register Remote Cluster ${alias} ${host} ${port} ${kubeconfig}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win

Re-registration is not failure-safe for teardown.

Remove Values From List drops the alias from ${C2CC_REMOTE_ALIASES} before Register Remote Cluster re-adds it. If Register Remote Cluster errors (host still down), the alias is gone from the tracked list. Within Wait Until Keyword Succeeds retries this self-heals, but if all retries exhaust, Teardown All Remote Clusters will never switch to / close that connection, leaking it and leaving teardown state inconsistent.

Consider only mutating the tracking list after a successful re-registration, or guarding with TRY/FINALLY so the list is reconciled even on the failure path.

Based on learnings: teardown state (the alias/interface list consumed by teardown keywords) must be populated reliably even when the mutating keyword errors before completing.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/resources/c2cc.resource` around lines 456 - 458, The re-registration
flow in the remote cluster setup is mutating `${C2CC_REMOTE_ALIASES}` before
`Register Remote Cluster` succeeds, which can leave teardown state inconsistent
if that keyword fails. Update the logic around `Remove Values From List` and
`Register Remote Cluster` so the alias is only removed after a successful
re-registration, or use `TRY/FINALLY` to restore/reconcile
`${C2CC_REMOTE_ALIASES}` on failure. Keep the teardown-tracked alias list in
sync in the re-registration path used by `Wait Until Keyword Succeeds`.

Source: Learnings

Comment thread test/suites/c2cc/reboot.robot Outdated
Comment thread test/suites/c2cc/reboot.robot
Comment thread test/resources/c2cc.resource
Comment thread test/suites/c2cc/reboot.robot Outdated
Comment thread test/suites/c2cc/reboot.robot Outdated
Comment thread test/suites/c2cc/reboot.robot Outdated
Comment thread test/suites/c2cc/reboot.robot Outdated
@agullon

agullon commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

/retest

1 similar comment
@vimauro

vimauro commented Jun 26, 2026

Copy link
Copy Markdown
Contributor Author

/retest

@agullon

agullon commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

@vimauro unfortunatelly the tests you added were not exected.
The reason is we are only triggering disruptive.robot in https://github.com/openshift/microshift/blob/main/test/scenarios-bootc/c2cc/el102-src%40c2cc-disruptive.sh#L23 and https://github.com/openshift/microshift/blob/main/test/scenarios-bootc/c2cc/el98-src%40c2cc-disruptive.sh#L23

The fix you can add is to remove the disruptive.robot from both lines I shared with you.

@openshift-ci

openshift-ci Bot commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

@vimauro: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants