feat(eval): added isolated/apply/merge options to agent() helper#3205
feat(eval): added isolated/apply/merge options to agent() helper#3205roboomp wants to merge 18 commits into
Conversation
The workflowz eval path bypasses the task tool's isolation wrapper and calls runSubprocess() directly, so parallel agent() fan-outs that edit overlapping files all land in the parent worktree. Extends the eval agent bridge schema with isolated/apply/merge, forwards them through the Python and JS preludes, and adds a shared task/isolation-runner.ts so the lifecycle (prepare context → run in worktree → capture patch/branch → merge → cleanup) is implemented once for both TaskTool and the bridge. Default mirrors task.isolation.mode: isolated by default when settings allow it, off when mode === 'none'. isolated=False explicitly disables; isolated=True with mode === 'none' errors out to match the task tool. apply=false keeps captured changes inside the worktree and surfaces the patch path / branch name in details. merge=false forces patch mode even when task.isolation.merge === 'branch'. Fixes #3196
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6a22499dcd
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
|
|
||
| return { | ||
| text: result.output, | ||
| text: result.output + mergeSummary, |
There was a problem hiding this comment.
Keep structured agent() output parseable under isolation
When a workflow calls agent(..., schema=...) while isolation is enabled, mergeIsolatedChanges() appends human-readable text like “No changes to apply.” to the subagent's JSON output here, and both JS/Python preludes parse the returned text as JSON when a schema was supplied. That makes successful structured isolated runs throw in the eval runtime instead of returning the validated object; keep merge/apply status in details or otherwise avoid appending it to schema-backed output.
Useful? React with 👍 / 👎.
| // invariant). A failed apply (`changesApplied === false`) keeps the dir so | ||
| // the caller can recover from `result.patchPath` manually. | ||
| const shouldCleanupTempArtifacts = | ||
| tempArtifactsDir && (!isIsolated || changesApplied === true || changesApplied === null); |
There was a problem hiding this comment.
Preserve temp patch artifacts when apply=false
For sessions without a session file, getArtifacts() creates a temp artifact directory; with apply=false, changesApplied intentionally stays null, so this cleanup condition deletes that temp directory before returning. In patch mode the returned details.patchPath then points at a removed file, defeating the documented/manual-apply path for agent(..., apply=False) unless the session happened to have persistent artifacts.
Useful? React with 👍 / 👎.
Kept isolation merge/apply summaries out of agent() text when a schema is supplied so Python and JS eval helpers can still parse the JSON payload. The summary now lands in details.isolationSummary for callers that need the human-readable apply state. Added a regression test that exercises an isolated schema-backed eval agent with a merge summary. Fixes #3196
|
Addressed the schema/isolation regression and pushed
|
…pply=false The cleanup gate was treating changesApplied===null (apply=false) the same as a clean apply, deleting the temp artifacts dir before returning details.patchPath. Sessions without a session file (which fall back to a per-call tmp dir) ended up with a patchPath pointing at a removed file, defeating the documented manual-apply path. Tightened the cleanup condition to remove the temp dir only on a confirmed clean apply (changesApplied===true); apply=false and failed applies both keep the artifact for the caller. Added regression tests for the apply=false preserve case and the apply-succeeds cleanup case. Fixes #3196
|
Addressed the temp-artifact cleanup regression and pushed
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9b33e0e644
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
|
|
||
| return { | ||
| text: result.output, | ||
| text: structured ? result.output : result.output + mergeSummary, |
There was a problem hiding this comment.
Expose apply=false artifacts for structured calls
When schema and apply=False are combined, this returns only the structured JSON text while the patch/branch reference is placed in details; checked the shipped JS/Python preludes and agent() returns the parsed object, while returnHandle only includes id/agent/handle, so details.patchPath/branchName is not reachable from the workflow. In that scenario the isolation worktree is cleaned up and the caller has no artifact reference to manually apply the captured changes, which defeats apply=false for structured workflows.
Useful? React with 👍 / 👎.
… node When agent() ran with schema and apply=false, the bridge correctly returned the captured patch/branch in details, but the preludes only forwarded id/agent/handle/data on the returnHandle node. Structured workflows had no way to recover the artifact for a manual apply. Both runtimes now copy isolated, patchPath/branchName, changesApplied, and isolationSummary onto the returnHandle node (snake_case in Python, camelCase in JS), keeping null changesApplied so apply=false stays distinguishable from a successful apply. Updated the workflow notice and the Python agent() docstring to point callers at return_handle as the artifact escape hatch for isolated+apply=false runs. Added prelude tests locking the new node shape in both runtimes. Fixes #3196
|
Addressed the structured-call artifact-loss regression and pushed
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 54f4652573
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| const shouldCleanupTempArtifacts = tempArtifactsDir && (!isIsolated || changesApplied === true); | ||
| if (shouldCleanupTempArtifacts) { | ||
| await fs.rm(artifactsDir, { recursive: true, force: true }); |
There was a problem hiding this comment.
Keep temp artifacts for returnHandle outputs
In sessions where getSessionFile() is null, this deletes the temp artifacts directory immediately after a non-isolated run or a successful isolated apply. That directory is where runSubprocess wrote <id>.md/<id>.jsonl, and the JS/Python preludes strip returnHandle before calling the bridge, so the bridge cannot know whether the caller will use the returned agent://<id> handle. In SDK/in-memory eval workflows using agent(..., returnHandle=True) or { returnHandle: true }, the handle is returned but agent:// resolution then fails because the backing artifacts were removed.
Useful? React with 👍 / 👎.
| if (!result.branchName || result.exitCode !== 0 || result.aborted) { | ||
| return { | ||
| summary: "\n\nNo changes to apply.", | ||
| changesApplied: true, | ||
| hadAnyChanges: false, | ||
| mergedBranchForNestedPatches: false, | ||
| }; |
There was a problem hiding this comment.
Apply nested-only changes in branch isolation
When branch-mode isolation changes only a nested git repo, commitToBranch() returns no branchName because there is no root patch, but it can still return nestedPatches. This early return treats that result as “No changes to apply” and leaves mergedBranchForNestedPatches false, so the callers' branch-mode gate never applies the nested patches; after the isolation worktree is cleaned up, those nested-repo edits are lost for task.isolation.merge="branch" runs that touch only nested repositories.
Useful? React with 👍 / 👎.
|
@korri123 could you check this? see if it makes sense & achieves what you wanted |
Eval preludes now forward returnHandle to the bridge so no-session eval runs can preserve the temp artifacts backing returned agent:// handles. The bridge keeps those temporary artifact directories whenever returnHandle is requested, including non-isolated runs and successful isolated applies. Branch-mode isolation now treats nested-only changes as merge-eligible even when no root branch was produced, letting callers apply nested patches instead of dropping them when the root repo had no diff. Added regression coverage for returnHandle artifact preservation and nested-only branch isolation. Fixes #3196
|
Addressed the two latest Codex P2 findings and pushed
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 636fbb4fcc
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
|
|
||
| return { | ||
| text: result.output, | ||
| text: structured ? result.output : result.output + mergeSummary, |
There was a problem hiding this comment.
Surface failed applies to schema callers
When an isolated schema-backed agent() run hits a patch/branch conflict (mergeIsolatedChanges() returns changesApplied === false), this returns only the raw JSON, so the JS/Python preludes parse and return the object unless returnHandle was requested; the failure warning lives only in details and is discarded by default. In that scenario the workflow observes a successful structured result while none of the edits were applied, so failed applies should throw or otherwise be surfaced to schema callers.
Useful? React with 👍 / 👎.
| } else { | ||
| mergeSummary = "\n\nIsolation: no changes captured."; |
There was a problem hiding this comment.
Preserve nested apply=false artifacts
Fresh evidence is the new apply=false branch skips mergeIsolatedChanges() and only considers branchName/patchPath: in branch mode, a nested-repo-only change has no root branch (commitToBranch() returns branchName: undefined with nestedPatches), so this path reports no changes captured and details expose no nested patch before the isolation worktree is removed. That makes agent(..., apply=False) lose nested repository edits instead of giving the caller an artifact to apply manually.
Useful? React with 👍 / 👎.
Branch-mode isolation can capture nested repository changes without creating a root branch. Eval agent() with apply=false previously treated that shape as no captured changes and returned no recoverable nested patch payload after the isolation worktree was removed. Expose captured nested patches in EvalAgentResult details and copy them onto JS/Python returnHandle nodes (nestedPatches / nested_patches). Document the return_handle escape hatch and add regression coverage for branch-mode nested-only apply=false runs. Fixes #3196
|
Addressed the nested apply=false artifact loss and pushed
|
A failed isolated apply (changesApplied === false) previously only set details.isolationSummary and returned the subagent text. Schema-backed agent() calls then parsed the JSON and returned the object, so workflows saw a successful structured result while none of the edits had landed. Throw a ToolError when mergeIsolatedChanges reports a failed apply, with the merge summary plus a recovery hint pointing at the preserved patch/branch/nested artifacts so the caller can apply manually. Added regression tests for the schema and non-schema apply-failure paths. Fixes #3196
|
Addressed the failed-apply-as-success regression and pushed
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c1b01f1cb9
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| if (result.nestedPatches?.length) { | ||
| recoveryParts.push( | ||
| `Captured nested repository patches (${result.nestedPatches.length}) preserved in details.nestedPatches.`, | ||
| ); |
There was a problem hiding this comment.
Persist nested patches before throwing apply failures
When an isolated run changes a nested repo and the root patch/branch apply fails, this path throws after runIsolatedSubprocess() has already cleaned the isolation worktree. The thrown ToolError does not return EvalAgentResult.details, so the promised details.nestedPatches recovery path is unreachable and the nested patch text is lost; write those patches to a recoverable artifact or include them in an error payload before throwing.
Useful? React with 👍 / 👎.
| const shouldCleanupTempArtifacts = | ||
| tempArtifactsDir && !parsed.returnHandle && (!isIsolated || changesApplied === true); | ||
| if (shouldCleanupTempArtifacts) { | ||
| await fs.rm(artifactsDir, { recursive: true, force: true }); |
There was a problem hiding this comment.
Register temp artifacts before returning handles
For SDK/in-memory eval sessions with no sessionFile, preserving this temp directory is not enough to make agent://<id> work. Fresh evidence in this revision is that the fix only skips rm, while agent:// resolution still searches artifactsDirsFromRegistry() via registered sessions' getArtifactsDir()/sessionFile; both remain null for these in-memory runs, so the returned handle cannot find the .md even though the directory still exists.
Useful? React with 👍 / 👎.
When an isolated apply fails the bridge throws a ToolError and never returns details, so the nested-patch payload that previously lived in details.nestedPatches was unrecoverable after the isolation worktree was torn down. The bridge now writes each captured nested patch to a file under the per-call artifacts dir (e.g. <agentId>.nested-<index>-<slug>.patch) before throwing and includes the resolved paths in the error message so the caller can apply them manually. Added a regression test verifying the persisted file exists with the original patch contents and that the path is surfaced in the thrown error. Fixes #3196
|
Addressed the lost-nested-patches regression and pushed
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a48f1ea813
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| } catch { | ||
| // Nested patch failures are non-fatal to the parent merge | ||
| mergeSummary += | ||
| "\n\n<system-notification>Some nested repository patches failed to apply.</system-notification>"; |
There was a problem hiding this comment.
Surface nested apply failures to schema callers
When an isolated schema-backed agent() changes a nested repo and applyNestedPatches() fails (for example, a nested patch conflict), this catch only appends a warning to mergeSummary. The returned text remains the raw structured JSON, and the JS/Python helpers return the parsed object unless returnHandle was requested, so the workflow observes success even though those nested edits were not applied; this nested-repo failure path should throw or otherwise make schema callers observe the failed apply.
Useful? React with 👍 / 👎.
|
@roboomp This seems to make eval |
|
@korri123 the current default was implemented per @can1357's explicit answer in this earlier comment to my open question 1 ("Should Flipping to strict opt-in is a one-line change in |
Per maintainer ruling on #3196, eval agent() now defaults to non-isolated regardless of task.isolation.mode, mirroring the task tool. isolated=true is the only way to turn it on; isolated=true while task.isolation.mode === "none" still throws the same clear error. Updated tests, workflow-notice.md, and Python agent() docstring to reflect the strict opt-in contract. Existing isolation tests now pass isolated:true explicitly; the inherit-from-settings assertion is replaced with a default-off + isolated=true opt-in regression. Fixes #3196
|
Per @can1357's ruling, flipped eval
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7438efc5b3
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| outputSchema: structured ? parsed.schema : undefined, | ||
| sessionFile, | ||
| persistArtifacts: Boolean(sessionFile), | ||
| const result = await withBridgeTimeoutPause(options.emitStatus, async () => { |
There was a problem hiding this comment.
Keep timeout paused through isolation apply
For agent(..., isolated=True), this wrapper emits timeout-resume as soon as runIsolatedSubprocess returns, but the isolation merge/apply work below (mergeIsolatedChanges and applyNestedPatches) still runs before control returns to the JS/Python runtime. When the eval idle watchdog is wired and a cherry-pick, large patch apply, or nested commit-message generation takes longer than the cell timeout, the cell can be aborted during successful post-processing; keep the pause around the whole bridge operation, not just the subagent subprocess.
Useful? React with 👍 / 👎.
Previously withBridgeTimeoutPause only wrapped the subagent subprocess; mergeIsolatedChanges, applyNestedPatches, nested commit-message generation, and artifact cleanup ran with the eval watchdog re-armed. A cherry-pick or large patch apply could trip the cell timeout and abort successful post-processing. Moved the entire bridge work (subprocess + merge + nested apply + cleanup + usage recording) inside one withBridgeTimeoutPause block. The pause helper still resumes via its finally on success and on throw, so existing failure paths are unchanged. Added a regression that captures the emitted op order and asserts merge fires after timeout-pause and before timeout-resume. Fixes #3196
|
Addressed the timeout-resume-mid-merge regression and pushed
|
|
@roboomp Please resolve the duplication and try to refactor it a bit so that there is less risk of implementation drift. The clearest duplication is the nested-patch apply block. In // Apply nested repo patches (separate from parent git)
if (isIsolated && repoRoot && (mergeMode === "branch" || changesApplied !== false)) {
const nestedPatches = result.nestedPatches ?? [];
const eligible =
nestedPatches.length > 0 &&
result.exitCode === 0 &&
!result.aborted &&
(mergeMode !== "branch" || mergedBranchForNestedPatches);
if (eligible) {
try {
await applyNestedPatches(repoRoot, nestedPatches, buildCommitMessageFn());
} catch {
// Nested patch failures are non-fatal to the parent merge
mergeSummary +=
"\n\n<system-notification>Some nested repository patches failed to apply.</system-notification>";
}
}
}And in // Apply nested repo patches (separate from parent git). The throw
// above already exited on a failed parent merge, so we know either
// the parent succeeded (patch mode) or branch mode is in play.
const nestedPatches = result.nestedPatches ?? [];
const eligible =
nestedPatches.length > 0 &&
result.exitCode === 0 &&
!result.aborted &&
(mergeMode !== "branch" || outcome.mergedBranchForNestedPatches);
if (eligible) {
try {
await applyNestedPatches(isolationContext.repoRoot, nestedPatches, buildCommitMessage());
} catch {
// Nested patch failures are non-fatal to the parent merge
mergeSummary +=
"\n\n<system-notification>Some nested repository patches failed to apply.</system-notification>";
}
}Same core contract is duplicated: const nestedPatches = result.nestedPatches ?? [];
const eligible =
nestedPatches.length > 0 &&
result.exitCode === 0 &&
!result.aborted &&
(mergeMode !== "branch" || /* branch/root merge succeeded */);
if (eligible) {
try {
await applyNestedPatches(...);
} catch {
mergeSummary +=
"\n\n<system-notification>Some nested repository patches failed to apply.</system-notification>";
}
}There is also smaller duplication in the AI commit-message callback construction. In const buildCommitMessageFn = () =>
commitStyle === "ai" && this.session.modelRegistry
? async (diff: string) => {
return generateCommitMessage(
diff,
this.session.modelRegistry!,
this.session.settings,
this.session.getSessionId?.() ?? undefined,
);
}
: undefined;In const buildCommitMessage = () =>
commitStyle === "ai" && options.session.modelRegistry
? async (diff: string) => {
return generateCommitMessage(
diff,
options.session.modelRegistry!,
options.session.settings,
options.session.getSessionId?.() ?? undefined,
);
}
: undefined;That one is less concerning, but still a sign that isolation support is split across caller code instead of fully centralized. Look for other places relevant only to this PR's changes and try to make it nice, clean and DRY. |
TaskTool and the eval agent() bridge each held a private copy of the nested-repo patch eligibility gate and the AI commit-message factory; isolation policy could drift between the two callers. Moved both into task/isolation-runner.ts: - applyEligibleNestedPatches(opts) — single nested-patch gate (skip on patch-mode parent failure, skip on branch-mode unmerged root, fail non-fatally with a system-notification suffix). - makeIsolationCommitMessage(session) — single factory that yields the AI commit-message callback when task.isolation.commits === "ai" and a model registry is wired, undefined otherwise. Both call sites now invoke the helpers; behavior is unchanged. Removed the now-dead generateCommitMessage/applyNestedPatches imports from each caller. Added unit tests for the new helper covering the skip-on-patch-failure, skip-on-unmerged-branch, success, and failure-suffix paths. Fixes #3196
|
Addressed the duplication and pushed
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 28137f46d9
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| mergeSummary += await applyEligibleNestedPatches({ | ||
| result, | ||
| repoRoot: isolationContext.repoRoot, | ||
| mergeMode, | ||
| changesApplied: outcome.changesApplied, | ||
| mergedBranchForNestedPatches: outcome.mergedBranchForNestedPatches, | ||
| commitMessage: buildCommitMessage(), | ||
| }); |
There was a problem hiding this comment.
Avoid committing dirty nested-repo state
When the new eval agent(isolated=True) path applies nested-repo patches here, a nested repo that was already dirty before the agent ran can have unrelated user edits committed along with the agent delta: applyNestedPatches() applies the captured patch, then checks git.status() and runs git.stage.files(nestedDir), which stages every existing change in that nested repo. In that scenario an isolated eval fan-out can silently commit pre-existing nested-repo work; preserve/restore the baseline dirty state or stage only the files/hunks introduced by result.nestedPatches.
Useful? React with 👍 / 👎.
applyNestedPatches() applied the captured patch then ran git.stage.files(nestedDir), which stages every working-tree change in the nested repo. A nested repo that was already dirty before the agent ran ended up with the user's unrelated work-in-progress committed alongside the agent delta. Stash any pre-existing dirty state (tracked + untracked) before applying the patch and pop it back in the finally block after the commit, so the agent commit contains only the captured patch and the user's in-flight work is restored on top of it. A failing stash pop logs a warning and leaves the stash entry intact for manual recovery; the broader nested-apply failure path is already non-fatal. Added a worktree integration test that confirms a pre-existing untracked file in the nested repo is not staged into the agent commit and is still present in the working tree afterwards. Fixes #3196
|
Addressed the dirty-nested-repo regression and pushed
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 978d2a76d0
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| let isolationContext: IsolationContext | null = null; | ||
| if (isIsolated) { | ||
| try { | ||
| isolationContext = await prepareIsolationContext(options.session.cwd); |
There was a problem hiding this comment.
Move timeout pause before isolation baseline capture
For agent(..., isolated=True), prepareIsolationContext() runs before withBridgeTimeoutPause() emits timeout-pause. captureBaseline() can walk nested repos and build diffs for untracked files, so in large or dirty repos this host-side bridge work can exceed the eval idle timeout while the Python/JS runtime is blocked waiting and the watchdog is still armed. Move the pause to cover isolation preparation as well, not just subprocess/merge.
Useful? React with 👍 / 👎.
| } catch (popErr) { | ||
| logger.warn("Pre-existing nested-repo dirty state could not be auto-restored", { | ||
| nestedDir, | ||
| error: popErr instanceof Error ? popErr.message : String(popErr), | ||
| }); |
There was a problem hiding this comment.
Surface nested stash-restore failures
Fresh evidence in this revision is that this new catch swallows popErr after only logging. When git stash pop conflicts after the nested patch commit (for example, pre-existing nested-repo WIP touched the same lines the agent changed), applyNestedPatches() still resolves and applyEligibleNestedPatches() reports success, while the user's WIP is left in a stash/conflicted state with no workflow-visible warning; propagate this failure or return a summary like the root merge path does.
Useful? React with 👍 / 👎.
| } finally { | ||
| if (stashed) { | ||
| try { | ||
| await git.stash.pop(nestedDir); |
There was a problem hiding this comment.
Restore nested stashes with index state
Checked git stash -h: stash pop has an explicit --index mode; this call omits it. When a nested repo has pre-existing staged WIP before an isolated subagent runs, the stash is popped after the agent commit as ordinary worktree changes, losing the user's index/staging state even though the content is restored; use git.stash.pop(nestedDir, { index: true }) as the root merge path already does.
Useful? React with 👍 / 👎.
|
lgtm, built and tested this and it seems to work as intended |
git stash pop without --index restores stashed staged changes as unstaged. When a nested repo had staged WIP before the isolated agent ran, the pop in applyNestedPatches() brought the content back but lost the user's index state.
Pass { index: true } so pop uses --index, matching the root merge path that already does the same thing.
Added a regression test that stages a pre-existing edit in the nested repo, runs applyNestedPatches, and asserts the file is still in the index (porcelain "M " with the trailing space) and the cached diff still shows the staged WIP.
Fixes #3196
|
Addressed the lost-index-state regression and pushed
|
…tash-restore failures Two Codex P2 findings landed against 978d2a7 that were not in the previously delivered review event: 1) prepareIsolationContext() (which runs captureBaseline → walks nested repos and untracked diffs) was running OUTSIDE withBridgeTimeoutPause; on dirty/large repos the baseline walk can exceed the eval idle timeout while the runtime is blocked. Moved the prep call into the pause closure so the watchdog is suspended for the whole bridge call from prep through cleanup. 2) applyNestedPatches() swallowed git stash pop failures with only a logger.warn, so a stash-pop conflict after a successful agent commit was invisible to the workflow. Changed the helper to return Promise<string[]> of warnings; applyEligibleNestedPatches now wraps them in a <system-notification> appended to the merge summary so the caller actually sees the partial-success case. Added regression tests: - bridge: prepare fires after timeout-pause and before timeout-resume. - runner: applyEligibleNestedPatches surfaces stash-restore warnings as a system-notification. - worktree (real git): a pre-existing dirty edit on the same file the agent patches causes stash pop to conflict; the helper returns a warning naming the nested repo and the stash entry is preserved for manual recovery. Fixes #3196
|
Thanks @korri123 for the LGTM. Before yielding, picked up two Codex P2 findings against
|
|
This was addressed in |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6ee79ec3b6
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| const commitResult = await commitToBranch( | ||
| isolationDir, | ||
| taskBaseline, | ||
| opts.agentId, | ||
| opts.description, | ||
| opts.buildCommitMessage?.(), | ||
| ); |
There was a problem hiding this comment.
Serialize branch capture before parallel branch-mode runs
When eval workflows fan out multiple agent(..., isolated=True) calls in branch merge mode, each run can reach commitToBranch() concurrently. That helper creates/deletes branches and worktrees in the same parent repo before the later mergeTaskBranches() lock is acquired, so otherwise successful subagents can fail on Git lock/worktree races during branch capture rather than during the serialized merge phase. Wrap the branch-capture call in the repo write lock as well.
Useful? React with 👍 / 👎.
| }); | ||
| mergeSummary = outcome.summary; | ||
| changesApplied = outcome.changesApplied; | ||
| if (outcome.changesApplied === false) { |
There was a problem hiding this comment.
Surface stash-restore conflicts to schema callers
In branch mode with a dirty parent worktree, mergeTaskBranches() can cherry-pick successfully but fail to pop the user's stash, returning a stashConflict warning while changesApplied remains true. This guard therefore does not throw, and schema-backed JS/Python agent() calls return only the parsed JSON while discarding details.isolationSummary, so the workflow observes success even though pre-existing edits were left in a stash/conflicted state. Treat that warning as workflow-visible for structured calls.
Useful? React with 👍 / 👎.
…r agent() helper Resolves conflict in test/task/worktree.test.ts by keeping both the getRepoRoot (main) and applyNestedPatches (PR) describe blocks. Extends the PR's Python/JS work to the remaining workflow runtimes: - eval/rb/prelude.rb, eval/jl/prelude.jl: agent() now accepts and forwards isolated/apply/merge (as booleans) plus returnHandle, and the return_handle node carries isolated/patch_path/branch_name/ nested_patches/changes_applied/isolation_summary. Post-merge fixups: - task/index.ts: drop dead commitStyle var (the dedup refactor reads task.isolation.commits inside makeIsolationCommitMessage). - CHANGELOG: move the misplaced Added entry under [Unreleased], correct the stale "defaults track task.isolation.mode" wording to the final strict opt-in behavior, and note all four runtimes. Fixes #3196
Repro
In a workspace with
task.isolation.mode !== "none", dispatching parallelagent()calls from aworkflowzeval cell — e.g.all run inside the parent worktree. The
tasktool'sisolated: truewrapper isn't reachable from the eval bridge, so overlapping edits across fan-out items race in the same working tree, which defeats the point of usingworkflowzfor safe parallel migrations/audits.Cause
packages/coding-agent/src/eval/agent-bridge.tsaccepts onlyprompt,agentType,model,label, andschema, then callstaskExecutor.runSubprocess(...)directly — it never touches the isolation lifecycle thatTaskTool.#runSpawnruns inpackages/coding-agent/src/task/index.ts(repo baseline capture →ensureIsolation(...)→runSubprocess({ worktree: handle.mergedDir, ... })→ patch/branch merge →cleanupIsolation(...)). The Python and JS preludes also never expose or forward anisolatedoption.Fix
packages/coding-agent/src/task/isolation-runner.ts(new): extractedprepareIsolationContext,runIsolatedSubprocess,mergeIsolatedChangesso the eval bridge andTaskTool.#runSpawnshare one lifecycle implementation.TaskToolis refactored to consume the helpers — behavior unchanged.packages/coding-agent/src/eval/agent-bridge.ts: addedisolated,apply,mergeto the arktype schema. Default trackstask.isolation.mode(isolated when not"none", off when"none");isolated=Truewhiletask.isolation.mode === "none"throws the same clear error as thetasktool;apply=falseskips the merge phase and surfaces the patch path / branch name indetails;merge=falseforces patch mode even whentask.isolation.merge === "branch".packages/coding-agent/src/eval/py/prelude.py,packages/coding-agent/src/eval/js/shared/prelude.txt: bothagent()signatures accept and forwardisolated,apply,merge. JS supports both options-object (agent(prompt, { isolated: true })) and positional forms.packages/coding-agent/src/prompts/system/workflow-notice.md: documentedisolated=True/False,apply=False,merge=False.packages/coding-agent/src/eval/__tests__/agent-bridge.test.ts: 5 new tests for disabled-isolation rejection, default-from-settings, explicitisolated=false,merge: false→ patch mode + worktree cwd plumbing,apply: falseskipping merge and surfacing patch/branch.Verification
bun test packages/coding-agent/src/eval/__tests__/agent-bridge.test.ts— 22/22 pass (5 new isolation tests included).bun test packages/coding-agent/test/task— 160/160 pass (regression coverage for the refactoredTaskToolisolation lifecycle:subagent-lsp.test.ts,task-batch.test.ts,render-call.test.ts,worktree.test.ts,executor-pass-through.test.ts).bun test packages/coding-agent/src/eval/__tests__/prelude-agent.test.ts packages/coding-agent/src/eval/py/__tests__/prelude.test.ts— 5/5 pass.bun test packages/coding-agent/test/modes/workflow.test.ts packages/coding-agent/test/agent-session-magic-keywords.test.ts— 10/10 pass.Fixes #3196