USHIFT-6800: Add c2cc reboot tests #6943

Open
vimauro wants to merge 3 commits into openshift:main from vimauro:reboot-tests

vimauro commented Jun 25, 2026

Contributor

Summary by CodeRabbit

  • Tests
    • Added reboot scenario coverage for clusters, including single and simultaneous reboot flows.
    • Expanded validation to confirm connectivity, workload stability, health checks, and DNS behavior after reboot.
    • Improved reboot handling steps so clusters can be reconnected and verified consistently after downtime.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jun 25, 2026

openshift-ci-robot commented Jun 25, 2026

@vimauro: This pull request references USHIFT-6800 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

vimauro commented Jun 25, 2026

Contributor Author

/label tide/merge-method-squash

coderabbitai Bot commented Jun 25, 2026

Contributor

Walkthrough

Adds Robot Framework coverage for C2CC reboot scenarios, including helper keywords to reconnect clusters and verify reboot completion, plus a suite that runs single and simultaneous reboot cases with connectivity, infrastructure, health, and DNS checks.

Changes

C2CC reboot scenarios

| Layer / File(s) | Summary |
| --- | --- |
| Reboot helper keywords (`test/resources/c2cc.resource`) | Defines cluster reconnect, simultaneous reboot, and reboot-completion checks using SSH, boot IDs, and greenboot status. |
| Suite setup and reboot cases (`test/suites/c2cc/reboot.robot`) | Declares the reboot suite metadata, setup and teardown flow, readiness gate, healthy-cluster precondition, and three reboot test cases. |
| Connectivity and infrastructure verification (`test/suites/c2cc/reboot.robot`) | Implements the retrying full-stack verification flow, ordered connectivity checks, infrastructure checks, and per-peer route and rule validation. |
| Health and DNS checks (`test/suites/c2cc/reboot.robot`) | Adds remote cluster health probe assertions and CoreDNS/DNS verification for cross-cluster service access. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

  • eslutsky

🚥 Pre-merge checks | ✅ 15

✅ Passed checks (15 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and concisely describes the main change: adding c2cc reboot tests. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Stable And Deterministic Test Names | ✅ Passed | Touched files add only static Robot test case names; no Ginkgo It/Describe/Context/When titles or dynamic interpolation appear. |
| Test Structure And Quality | ✅ Passed | PASS: This PR adds Robot Framework suites/keywords, not Ginkgo; the reboot suite uses Suite Setup/Teardown and explicit 10m/300s timeouts, matching repo patterns. |
| Microshift Test Compatibility | ✅ Passed | Robot-only c2cc additions; no new Go/Ginkgo tests or Ginkgo constructs were added in the changed area. |
| Single Node Openshift (Sno) Test Compatibility | ✅ Passed | No new Ginkgo e2e tests were added; the PR adds Robot Framework c2cc reboot suites, so the SNO-specific check is not applicable. |
| Topology-Aware Scheduling Compatibility | ✅ Passed | Only Robot test/resource files changed; no deployment, controller, or manifest scheduling logic was added. |
| Ote Binary Stdout Contract | ✅ Passed | PR only adds Robot Framework resource/suite keywords and tests; no main/TestMain/RunSpecs setup or stdout-writing binary code was changed. |
| Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | No public internet use or IPv4-only logic found; the new helper brackets IPv6 URLs and the suite uses cluster-internal DNS and CIDRs. |
| No-Weak-Crypto | ✅ Passed | The added Robot keywords and reboot suite contain no MD5/SHA1/DES/RC4/3DES/Blowfish/ECB, no custom crypto, and no secret/token comparisons. |
| Container-Privileges | ✅ Passed | Only Robot/resource files changed; no container/K8s manifests or privileged settings were added. |
| No-Sensitive-Data-In-Logs | ✅ Passed | PASS: The new reboot keywords and suite add no secret-bearing logs; touched code only reuses generic command-output logging. |

Comment @coderabbitai help to get the list of available commands.

@openshift-ci openshift-ci Bot added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Jun 25, 2026
@openshift-ci openshift-ci Bot requested review from jerpeter1 and pmtk June 25, 2026 13:56

openshift-ci Bot commented Jun 25, 2026

Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: vimauro
Once this PR has been reviewed and has the lgtm label, please assign kasturinarra for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

vimauro commented Jun 25, 2026

Contributor Author

/jira refresh

openshift-ci-robot commented Jun 25, 2026

@vimauro: This pull request references USHIFT-6800 which is a valid jira issue.

Details

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

coderabbitai Bot left a comment

Contributor

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/resources/c2cc.resource`:
- Around line 456-458: The re-registration flow in the remote cluster setup is
mutating `${C2CC_REMOTE_ALIASES}` before `Register Remote Cluster` succeeds,
which can leave teardown state inconsistent if that keyword fails. Update the
logic around `Remove Values From List` and `Register Remote Cluster` so the
alias is only removed after a successful re-registration, or use `TRY/FINALLY`
to restore/reconcile `${C2CC_REMOTE_ALIASES}` on failure. Keep the
teardown-tracked alias list in sync in the re-registration path used by `Wait
Until Keyword Succeeds`.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 52322569-b28b-466c-bc7a-ba41b45857c4

📥 Commits

Reviewing files that changed from the base of the PR and between 8d0593e and 5296d02.

📒 Files selected for processing (2)
  • test/resources/c2cc.resource
  • test/suites/c2cc/reboot.robot

Comment on lines +456 to +458
    ${kubeconfig}=    Get From Dictionary    ${C2CC_KUBECONFIGS}    ${alias}
    Remove Values From List    ${C2CC_REMOTE_ALIASES}    ${alias}
    Register Remote Cluster    ${alias}    ${host}    ${port}    ${kubeconfig}

Contributor

🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win

Re-registration is not failure-safe for teardown.

Remove Values From List drops the alias from ${C2CC_REMOTE_ALIASES} before Register Remote Cluster re-adds it. If Register Remote Cluster errors (host still down), the alias is gone from the tracked list. Within Wait Until Keyword Succeeds retries this self-heals, but if all retries exhaust, Teardown All Remote Clusters will never switch to / close that connection, leaking it and leaving teardown state inconsistent.

Consider only mutating the tracking list after a successful re-registration, or guarding with TRY/FINALLY so the list is reconciled even on the failure path.

Based on learnings: teardown state (the alias/interface list consumed by teardown keywords) must be populated reliably even when the mutating keyword errors before completing.
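
A minimal sketch of the second option, assuming Robot Framework 5.0 or later for `TRY/EXCEPT` and that `Register Remote Cluster` re-adds the alias on success, as described above. The keyword name `Re-Register Remote Cluster Safely` is hypothetical and not part of the PR; `${C2CC_KUBECONFIGS}`, `${C2CC_REMOTE_ALIASES}`, and `Register Remote Cluster` are the names quoted from the code under review and are expected to be provided by `c2cc.resource`.

```robotframework
*** Settings ***
Library    Collections


*** Keywords ***
Re-Register Remote Cluster Safely
    [Documentation]    Hypothetical wrapper: puts the alias back into the
    ...    teardown-tracked list if re-registration fails.
    [Arguments]    ${alias}    ${host}    ${port}
    ${kubeconfig}=    Get From Dictionary    ${C2CC_KUBECONFIGS}    ${alias}
    Remove Values From List    ${C2CC_REMOTE_ALIASES}    ${alias}
    TRY
        Register Remote Cluster    ${alias}    ${host}    ${port}    ${kubeconfig}
    EXCEPT    AS    ${error}
        # Registration failed: restore the alias so teardown still sees it.
        IF    $alias not in $C2CC_REMOTE_ALIASES
            Append To List    ${C2CC_REMOTE_ALIASES}    ${alias}
        END
        # Re-raise so Wait Until Keyword Succeeds can retry.
        Fail    ${error}
    END
```

Whether `Teardown All Remote Clusters` tolerates an alias whose connection never reopened is not shown in this thread, so that path would need checking before adopting this approach.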

Source: Learnings

Suite Setup         Setup
Suite Teardown      Teardown

Test Tags           c2cc

Contributor

I think we should add the disruptive tag here so that this suite is included in the Disruptive runs, like the ones Patryk created in the other PR (example).

@pmtk wdyt?

Member

If they can fit into a single scenario, then definitely. If not, it should be a separate one.

Except I'd replace c2cc with disruptive. If c2cc is present, it would run within regular scenarios.
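
For illustration, the suite settings might look like this after that change. This is a sketch built from the three settings lines quoted above; it assumes the disruptive scenarios select suites by the `disruptive` tag, which this thread implies but does not show.

```robotframework
*** Settings ***
Suite Setup         Setup
Suite Teardown      Teardown

# Replaces the c2cc tag so regular scenarios no longer pick up this suite.
Test Tags           disruptive
```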

Wait For Test Pods
Wait For Service Endpoints

Verify Full C2CC Stack

Contributor

@vimauro @pmtk I feel we are rewriting (almost) the same Verify keywords to check that the clusters are OK in every PR, don't you?

This PR is OK as it is. I'd prefer to merge it now and later, in a follow-up PR, refactor to group these Verify checks better. What do you think?

openshift-ci Bot commented Jun 25, 2026

Contributor

@vimauro: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/verify | 5296d02 | link | true | `/test verify` |

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
