Add replace-label safe-outputs type#40423
Conversation
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
…call signatures Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
|
@copilot update specification with w3c spec writer. then apply specification to new code. |
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
… handling Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Done. Here's what was produced and applied: Spec (
Spec applied to code:
|
|
✅ Test Quality Sentinel completed test quality analysis. |
|
🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅ |
|
✅ PR Code Quality Reviewer completed the code quality review. |
|
✅ Design Decision Gate 🏗️ completed the design decision gate check. |
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
🏗️ Design Decision Gate — ADR RequiredThis PR makes significant changes to core business logic (241 new lines across business-logic directories) but does not have a linked Architecture Decision Record (ADR). 📄 Draft ADR committed:
📋 What to do next
Once an ADR is linked in the PR body, this gate will re-run and verify the implementation matches the decision. ❓ Why ADRs Matter
ADRs create a searchable, permanent record of why the codebase looks the way it does. Future contributors (and your future self) will thank you. 📋 Michael Nygard ADR Format ReferenceAn ADR must contain these four sections to be considered complete:
All ADRs are stored in 🔒 This PR cannot merge until an ADR is linked in the PR body.
|
There was a problem hiding this comment.
Skills-Based Review 🧠
Applied /tdd and /grill-with-docs — leaving as COMMENT; no blocking bugs found but three test gaps are worth closing before this ships.
📋 Key Themes & Highlights
Key Themes
- Dead code:
RESOLVE_LABEL_QUERY— defined, exported, and documented but never called. The implementation uses per-label REST calls instead of this bulk GraphQL fetch. Either remove it or use it. - Test coverage gaps: The spec defines 44 normative test IDs (T-RL-001–T-RL-072); the test file has 13 test cases. Three specific gaps stand out: (1) RL-046 partial mutation failure, (2)
createLabelfailure path, and (3) no test exercises the GraphQL mutation path at all (sinceRESOLVE_LABEL_QUERYis unused). - Fragile 404 detection:
resolveOrCreateLabelchecksmsg.includes("404")rather thanerr.status === 404— the latter is more reliable. - Hash entropy:
deterministicLabelColorconsumes only 18 of 32 hash bits, increasing collision probability among label names. Minor visual issue.
Positive Highlights
- ✅ Excellent wiring consistency — all 12 Go infrastructure files follow the exact pattern of
add-labels/remove-labels, making the Go side very low-risk. - ✅ The single-request GraphQL mutation correctly achieves the atomicity goal stated in the PR description.
- ✅ Graceful handling of missing
label_to_remove(proceeds with add-only) is correctly implemented and tested. - ✅ Staged-mode preview is implemented and tested.
- ✅ Per-repo label node-ID cache (
repoCaches) correctly avoids redundant REST calls across multiple messages in one run. - ✅ The W3C-style spec is thorough and provides a solid compliance baseline for future testing.
🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · 99.8 AIC · ⌖ 8.99 AIC · ⊞ 6.9K
| * | ||
| * @type {string} | ||
| */ | ||
| const RESOLVE_LABEL_QUERY = /* GraphQL */ ` |
There was a problem hiding this comment.
[/grill-with-docs] RESOLVE_LABEL_QUERY is defined, exported, and documented — but never called. The implementation resolves labels via rest.issues.getLabel (one REST call per label) instead of this bulk GraphQL query.
💡 Options
Either:
- Remove the constant and its export — it is dead code that misleads readers into thinking GraphQL label resolution is the implementation path.
- Use it — replace the per-label REST call in
resolveOrCreateLabelwith a singleRESOLVE_LABEL_QUERYcall fetching all repo labels up-front, then look up by name (more efficient for repos adding many labels in one run).
The current export in module.exports implies a public API contract, but no test exercises the GraphQL label-resolution path at all.
There was a problem hiding this comment.
Fixed: RESOLVE_LABEL_QUERY constant and its module.exports entry have been removed. The module now exports only { main, REPLACE_LABEL_MUTATION }.
| return label.node_id; | ||
| } catch (err) { | ||
| const msg = getErrorMessage(err); | ||
| if (!msg.includes("404") && !msg.toLowerCase().includes("not found")) { |
There was a problem hiding this comment.
[/tdd] 404 detection relies on matching error message strings rather than the HTTP status code, making it fragile if the underlying client changes its error format.
💡 Suggested fix
Instead of (or in addition to) string-matching, check err.status or err.status === 404 which Octokit/REST sets reliably:
if (err.status !== 404) {
throw err;
}This is more robust than msg.includes("404") which could false-positive on a label named "404" or a repo description containing that string.
There was a problem hiding this comment.
Fixed: string-matching replaced with err?.status !== 404 for reliable structured status detection.
| // RL-046: detect partial mutation success — remove succeeded but add failed | ||
| const errAsAny = /** @type {any} */ err; | ||
| const partialData = errAsAny?.data; | ||
| if (partialData?.removeLabels && !partialData?.addLabels) { |
There was a problem hiding this comment.
[/tdd] The RL-046 partial mutation failure branch (remove succeeded, add failed) is explicitly called out in the spec and has dedicated error messaging — but it has no corresponding test case. This is the most important missing test given the spec's normative requirement for it.
💡 Suggested test
it("should log partial failure when remove succeeds but add fails (RL-046)", async () => {
const gqlError = new Error("add mutation failed");
gqlError.data = {
removeLabels: { clientMutationId: null }, // remove succeeded
// addLabels is absent — simulates partial failure
};
mockGithub.graphql = async () => { throw gqlError; };
const handler = await main({});
const result = await handler({ label_to_remove: "in-progress", label_to_add: "done" }, {});
expect(result.success).toBe(false);
expect(mockCore.errors.some(e => e.includes("in-progress") && e.includes("done"))).toBe(true);
});Without this test, a future refactor of the error-handling path could silently break the spec compliance obligation.
There was a problem hiding this comment.
Added the RL-046 partial-failure test ("should log partial failure and return error when remove succeeds but add fails (RL-046)"). Also added a comment explaining that withRetry's enhanceError wraps the original error, so the detection checks both err.data (direct graphql error) and err.originalError.data (wrapped).
| expect(result.labelAdded).toBe("done"); | ||
| }); | ||
|
|
||
| it("should create label_to_add if it does not exist in the repo", async () => { |
There was a problem hiding this comment.
[/tdd] The label auto-create test covers the happy path (404 → createLabel succeeds) but not the failure path (404 → createLabel also throws). Missing this leaves the error propagation from resolveOrCreateLabel untested.
💡 Suggested test
it("should return error when label_to_add cannot be created in the repo", async () => {
mockGithub.rest.issues.getLabel = async () => {
const err = new Error("Not Found");
err.status = 404;
throw err;
};
mockGithub.rest.issues.createLabel = async () => {
throw new Error("Resource not accessible by integration");
};
const handler = await main({});
const result = await handler({ label_to_remove: "in-progress", label_to_add: "needs-review" }, {});
expect(result.success).toBe(false);
expect(result.error).toContain("Failed to resolve/create label");
});There was a problem hiding this comment.
Auto-creation has been removed entirely: the handler now fails with a hard error when label_to_add is not found (404), consistent with add_labels. The test was updated to "should return error when label_to_add does not exist in the repo", and a separate "should return error when getLabel fails with a non-404 error" test was also added.
| } | ||
| const r = 128 + (hash & 0x3f); | ||
| const g = 128 + ((hash >> 6) & 0x3f); | ||
| const b = 128 + ((hash >> 12) & 0x3f); |
There was a problem hiding this comment.
[/grill-with-docs] The hash function uses only 18 bits of the 32-bit integer (bits 0–5 for R, 6–11 for G, 12–17 for B), meaning any two label names whose hashes are identical in the low 18 bits will produce the same color regardless of how different they are in the high bits.
💡 Better entropy distribution
A simple improvement is to mix the high bits back in via XOR before extracting channels:
function deterministicLabelColor(name) {
let hash = 0;
for (let i = 0; i < name.length; i++) {
hash = (hash * 31 + name.charCodeAt(i)) >>> 0;
}
// fold high bits into low bits for better distribution
hash ^= (hash >>> 18);
const r = 128 + (hash & 0x3f);
const g = 128 + ((hash >> 6) & 0x3f);
const b = 128 + ((hash >> 12) & 0x3f);
return ((r << 16) | (g << 8) | b).toString(16).padStart(6, "0");
}This is a minor visual quality issue, not a correctness bug — but worth noting given the spec dedicates a subsection to label color determinism.
There was a problem hiding this comment.
deterministicLabelColor has been removed entirely — label auto-creation was removed from the handler, so there is no longer a need for a color algorithm. The function and its test suite are gone.
There was a problem hiding this comment.
Pull request overview
This PR introduces a new replace-label safe-output type that lets agents transition label state by removing one label and adding another via a single GraphQL request, aligning label-based state machines with safer (reduced-race) semantics.
Changes:
- Adds
replace-labelconfiguration/parsing and wires it into safe-outputs enablement, validation, handler registry, tool computation, and prompt tooling. - Implements the JavaScript handler (
replace_label.cjs) plus Vitest coverage for core behaviors (add-only when source label missing, auto-create target label, staged mode). - Extends schema and docs, including a detailed spec document and references in safe-outputs documentation.
Show a summary per file
| File | Description |
|---|---|
| specs/replace-label-spec.md | Adds a formal spec for replace-label, including processing stages and conformance requirements. |
| pkg/workflow/unified_prompt_step.go | Exposes the replace_label tool in the unified prompt tool list with max-budget integration. |
| pkg/workflow/tool_description_enhancer.go | Adds constraint-enhancement text for replace_label tool descriptions. |
| pkg/workflow/safe_outputs_validation.go | Includes replace-label in target validation wiring. |
| pkg/workflow/safe_outputs_validation_config.go | Adds message schema validation config for replace_label. |
| pkg/workflow/safe_outputs_tools_repo_params.go | Adds repo-parameter inference support for replace_label. |
| pkg/workflow/safe_outputs_tools_computation.go | Enables replace_label in the computed tool set when configured. |
| pkg/workflow/safe_outputs_state.go | Includes ReplaceLabel in safe-outputs enabled checks. |
| pkg/workflow/safe_outputs_max_validation.go | Validates replace_label max field formatting/constraints. |
| pkg/workflow/safe_outputs_handler_registry.go | Emits handler config for replace_label and ensures null/empty config enables the handler. |
| pkg/workflow/safe_outputs_config.go | Parses replace-label from workflow frontmatter into SafeOutputsConfig. |
| pkg/workflow/safe_output_validation_config_test.go | Updates validation-config JSON test expectations to include replace_label. |
| pkg/workflow/safe_output_handlers.go | Registers replace-label as a safe-output handler descriptor with permissions gating. |
| pkg/workflow/replace_label.go | Adds ReplaceLabelConfig and frontmatter parsing helper. |
| pkg/workflow/compiler_types.go | Extends SafeOutputsConfig with ReplaceLabel. |
| pkg/parser/schemas/main_workflow_schema.json | Adds the replace-label safe-outputs schema entry and related schema text updates. |
| actions/setup/js/replace_label.cjs | Implements the replace_label handler (REST label lookup/creation + combined GraphQL mutation). |
| actions/setup/js/replace_label.test.cjs | Adds unit tests for replace_label handler behavior. |
| .github/aw/syntax-agentic.md | Adds replace-label to the safe-outputs type reference list. |
| .github/aw/safe-outputs-management.md | Documents replace-label alongside existing label safe-outputs. |
| .changeset/replace-label.md | Adds a minor-version changeset describing the new safe-output type. |
Copilot's findings
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Files reviewed: 22/22 changed files
- Comments generated: 5
| githubClient.graphql(REPLACE_LABEL_MUTATION, { | ||
| labelableId, | ||
| addLabelIds: [addLabelNodeId], | ||
| removeLabelIds: removeLabelNodeId ? [removeLabelNodeId] : [], | ||
| }), | ||
| RATE_LIMIT_RETRY_CONFIG, |
There was a problem hiding this comment.
Fixed with @include(if: $doRemove). The mutation now includes a $doRemove: Boolean! variable; when label_to_remove is absent from the item, $doRemove is false and removeLabelsFromLabelable is omitted from the request entirely, avoiding any empty-labelIds issue.
| } catch (err) { | ||
| const msg = getErrorMessage(err); | ||
| if (!msg.includes("404") && !msg.toLowerCase().includes("not found")) { | ||
| throw err; | ||
| } | ||
| } |
There was a problem hiding this comment.
Fixed: 404 detection now uses err?.status !== 404 (structured status check) instead of string-matching.
| // Validate label_to_remove against allowed-remove and blocked patterns | ||
| const removeValidation = validateLabels([labelToRemove], configAllowedRemove, 1, blockedPatterns); | ||
| if (!removeValidation.valid) { | ||
| core.warning(`label_to_remove validation failed: ${removeValidation.error}`); | ||
| return { success: false, error: removeValidation.error ?? "Invalid label_to_remove" }; | ||
| } | ||
|
|
||
| // Validate label_to_add against allowed-add and blocked patterns | ||
| const addValidation = validateLabels([labelToAdd], configAllowedAdd, 1, blockedPatterns); | ||
| if (!addValidation.valid) { | ||
| core.warning(`label_to_add validation failed: ${addValidation.error}`); | ||
| return { success: false, error: addValidation.error ?? "Invalid label_to_add" }; | ||
| } |
There was a problem hiding this comment.
Fixed: replaced both validateLabels() calls with a new validateSingleLabel() function that (a) checks blocked patterns first, then the allowlist, (b) rejects with a specific named error rather than silently filtering, and (c) never truncates the label name.
| **RL-001**: Glob pattern matching for `allowed-add`, `allowed-remove`, and `blocked` MUST follow the semantics of the `gobwas/glob` library (as used elsewhere in gh-aw), where `*` matches any sequence of characters within a label name and `[...]` denotes a character class. | ||
|
|
||
| **RL-002**: A label name MUST match at least one pattern in an allowlist (`allowed-add` or `allowed-remove`) when the list is non-empty. An empty or absent allowlist permits any label name. | ||
|
|
||
| **RL-003**: A label name MUST NOT match any pattern in the `blocked` list. Blocklist evaluation MUST occur after allowlist evaluation. A label that passes the allowlist but matches a blocked pattern MUST be rejected. | ||
|
|
There was a problem hiding this comment.
Fixed in spec: RL-003 now reads "Blocklist evaluation MUST occur before allowlist evaluation (it is a security boundary). A label matching a blocked pattern MUST be rejected immediately, regardless of any allowlist entry." — matching the actual order in safe_output_validator.cjs and in the new validateSingleLabel().
| **RL-028**: Staged mode execution MUST return `{ success: true, staged: true }` from the handler, and MUST NOT decrement any rate-limit budget or increment the operation count. | ||
|
|
There was a problem hiding this comment.
Fixed in spec: RL-028 now documents the actual behavior — staged messages DO count toward max because createCountGatedHandler increments processedCount before delegating to the handler.
|
@copilot revert changes to smoke agentic workflows |
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Reverted in the latest commit — all five smoke workflows are back to using |
|
@copilot review all comments and address unresolved review feedback.
|
|
@copilot please summarize any remaining blockers and the next step to get this merged.
|
|
@copilot merge main and recompile |
…lace-label-safe-outputs Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
|
``
|
…safe-outputs' into copilot/implement-replace-label-safe-outputs Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Done. Merged main and recompiled — all 249 workflows compiled successfully (0 errors). |
|
@copilot please refresh this branch and summarize the remaining blockers after the recompile.
|
|
@copilot merge main, recompile, fix https://github.com/github/gh-aw/actions/runs/27922860220/job/82619565998?pr=40423 |
…TypeScript errors Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Done. Merged main (already up to date) and fixed the two TypeScript errors in
Typecheck now passes with 0 errors. |
|
@copilot review all comments and address unresolved review feedback. Please address the outstanding review feedback (dead
|
|
@copilot review all comments and address unresolved review feedback.
|
|
@copilot please refresh the branch state and share a short blocker summary once review feedback is handled.
|
|
@copilot review all comments and resolve the open review threads before refreshing the branch.
|
|
``
|
Adds a new
replace-labelsafe-outputs type that transitions label state in a single GraphQL request — removing one label and adding another, reducing the race window present when usingremove-labels+add-labelsin sequence.New type:
replace-labelThe agent calls
replace_label(label_to_remove, label_to_add). Theallowed-add/allowed-remove/blocked/required-labels/required-title-prefixguards work identically toadd-labelsandremove-labels. The optionalallowed-transitionslist further restricts mutations to an explicit state machine — only the listed(from, to)pairs are permitted when it is configured.Implementation
pkg/workflow/replace_label.go—ReplaceLabelConfigstruct (BaseSafeOutputConfig+SafeOutputTargetConfig+SafeOutputFilterConfig+AllowedAdd/AllowedRemove/Blocked+AllowedTransitions);LabelTransitionstruct (From/To)replace_label.cjs) — resolveslabel_to_addnode ID via REST (fails with a hard error if the label does not exist in the repository, consistent withadd-labels); uses@include(if: $doRemove)to conditionally omit the remove field whenlabel_to_removeis not on the item; validates each label with explicit allowlist/blocklist checks (no silent filtering or name truncation); validates(label_to_remove, label_to_add)pair againstallowed-transitionsafter blocked/allowlist guards; detects partial mutation failure (remove succeeded, add failed) and logs a specific error per RL-046replace-labelentry inmain_workflow_schema.jsonincluding the newallowed-transitionsarray property withfrom/torequired fields[]map[string]string), validation config, tools computation, repo params, max validation, prompt step budget, tool description enhancer (13 files, following established pattern)Specification
specs/replace-label-spec.md— W3C-style Candidate Recommendation specification (v1.0.0) covering the 8-stage processing pipeline, config/message schemas, GraphQL mutation interface and sequential-execution semantics, error-handling categories (soft-skip vs hard-error), security considerations, and compliance test IDs mapped to normative requirement codes.github/aw/safe-outputs-management.md— addedreplace-labeldocumentation entry alongsideadd-labels/remove-labels.github/aw/syntax-agentic.md— addedreplace-labelto the safe-outputs type reference listpr-sous-chef: requested branch update from run 27914085128.
pr-sous-chef run https://github.com/github/gh-aw/actions/runs/27919051457