fix(running-in-ci): trim CI-poll loops to fit Bash tool's 10-min cap#695
Conversation
The foreground CI-poll recipe and the rerun-jobs poll both ran `for i in $(seq 1 15); do sleep 60; ...; done` — a minimum of 15 minutes if CI didn't finish early. The Claude Code Bash tool's max configurable timeout is 600000 ms (10 min), past which the harness auto-backgrounds the call and blocks foreground `sleep` waits. Once that happens the gated follow-up (dismiss approval on failure, post failure analysis) can't fire in-session — background-completion notifications are not reliably delivered to a CI session, per the same skill's own policy. Trim both loops to `seq 1 9` (≥9 min minimum) so the recipe fits inside the harness cap, and add a note that callers must invoke Bash with `timeout: 600000` (the default 2-min timeout would kill the loop early). Closes #694 Co-Authored-By: Claude <noreply@anthropic.com>
tend-agent
left a comment
There was a problem hiding this comment.
Self-review of the trim. Math checks out (9 × 60s + at most one 30s grace ≈ 570s, comfortably under the 600s Bash cap) and the inline 15-min/15 minutes references are all updated. One gap: the new timeout: 600000 caller note was only added next to the CI Monitoring loop, not the structurally-identical gh run rerun --failed rollup loop below — see inline. A reader who jumps straight to the rerun recipe still hits the 2-min default Bash timeout.
Self-review caught that the timeout: 600000 caller note was only added next to the CI Monitoring loop, not the structurally-identical gh run rerun --failed rollup loop below. A reader following just the rerun recipe would still hit the default 2-min Bash timeout. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
Two more occurrences from a different repo, tonight on
A third PR in the same Dependabot batch, PRQL/prql#6011 (insta-cmd 0.6.0 → 0.7.0), got a clean empty-body APPROVED review from the same session shape — its CI finished in ~6 min, well under the 10-min cap, so the loop returned and the verdict shipped. Two notes on prioritization:
The Recording these two as additional structural-confirmation evidence in the |
#721 (base workflow-regen worktree on an open PR, not branch-ref existence) merged to main after this release branch was cut; it's a Fixed-scope change to the bundled nightly skill that adopters run, so it belongs in the 0.1.7 notes alongside #695. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Bumps generator version to 0.1.7, syncs the lockfile, and adds the `## 0.1.7` CHANGELOG section (published verbatim as the GitHub Release notes on tag push). Changes since 0.1.6: pin actions/checkout to v7 with review opting into the fork-PR checkout guard (#725); bump claude-code to 2.1.185 (#719); running-in-ci surfaces blocking scope rules instead of routing around them (#717) and caps CI-poll loops to the Bash 10-min limit (#695); nightly workflow-regen bases its worktree on an open PR rather than branch-ref existence (#721); de-duplicate composite-action step bodies under shared/steps/ with harness-named action paths (#712); correct the codex effort list (#710); review-reviewers and worker-deploy doc fixes (#707, #711).
Problem
The bundled
running-in-ciskill's foreground CI-poll recipe (and the structurally-identicalgh run rerun --failedrollup loop) iteratedfor i in $(seq 1 15); do sleep 60; ...; done— a ≥15-minute wall clock if CI didn't finish early. The Claude Code Bash tool's max configurable timeout is 600000 ms (10 min); past that the harness auto-backgrounds the call. The skill's own policy says background-completion notifications are not reliably delivered to a CI session, so the gated follow-up (dismiss approval on failure, post failure analysis) can't fire — and the bot'ssleepworkarounds get blocked by the harness's anti-polling guards, ending the session before the dismissal runs.#694 documents six occurrences in a single 24h window on
max-sixty/worktrunk(six of ~20 token-bearingtend-reviewruns — ~30%). All six PRs merged cleanly so the loss-of-dismissal path was not exercised, but the failure mode is structural and would fail open on a future red-CI run.This is the same root-cause class as #674 (triage's full-suite gate exceeding the 10-min cap) but in a different code path — the gated CI dismissal path on review.
Solution
Trim both loops from
seq 1 15toseq 1 9. With 60s sleeps per iteration plus the at-most-once 30s grace re-check, the worst case sits comfortably inside the 600000 ms Bash cap so the loop runs to completion and the gated follow-up fires.Also added a one-line note that callers must invoke Bash with
timeout: 600000— the default 2-min Bash timeout would kill even the trimmed loop early.Updated the inline references that cited the old 15-minute figure (
"Polling for it deadlocks until the 15-min cap breaks it"→"loop cap","CI still running after 15 minutes"→"9 minutes","up to ~15 minutes"→"up to ~9 minutes").Testing
Skill text fix — no automated test. Verified by reading the recipe end-to-end: 9 iterations × 60s + one 30s grace + small
gh/jqoverhead per iteration stays under 600s. The reporter's evidence (six session log traces showing the auto-background sequence) is the reproduction.Closes #694 — automated triage