fix(compaction): trigger /goal threshold on billed context, not post-prune estimate#3175
Conversation
…ost-prune estimate Pruning frees bytes for the NEXT prompt — it does not change the size of the prompt the LLM just billed for. Subtracting the per-turn `#pruneStaleToolResults` / `#pruneToolOutputs` savings from the threshold input let a long-running `/goal` session sit above `compaction.thresholdTokens` indefinitely: the visible context (anchored to the same provider billing) showed >threshold, but `shouldCompact` no-op'd because the subtraction dropped the input below the trigger. The `compactionContextTokens` floor against the post-prune local estimate is still applied, so a payload-compression hook still can't deflate the trigger. Regression test seeds one large `useless` tool result whose suffix sits inside the 8k cache-warm window so `#pruneStaleToolResults` actually returns ≥20k savings, then asserts compaction fires when the final turn bills 91k tokens against the reporter's `thresholdTokens: 76384`. Fixes #3174
There was a problem hiding this comment.
💡 Codex Review
When compaction.strategy === "shake" and the prune passes above reclaim a large useless/superseded tail, this now passes the pre-prune billed total into #runAutoShake as triggerContextTokens. That fallback check subtracts only shake's own tokensFreed, so it can conclude the session is still over the recovery band and fall through to context-full LLM compaction even when the post-prune history is already under the threshold. Keep using billed tokens for the initial trigger, but pass a post-prune-adjusted value (or include the prune savings) for the shake post-maintenance check.
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
|
@roboomp The session still goes beyond the configured threshold: compaction:
enabled: true
strategy: context-full
thresholdTokens: 76384
thresholdPercent: -1
keepRecentTokens: 10000
reserveTokens: 16384
autoContinue: true
supersedeReads: true
dropUseless: true
idleEnabled: false
remoteEnabled: falseVisible context/token usage is still above 90k, but automatic compaction does not appear to run. So the post-prune token subtraction fixed in #3175 does not seem to be the only remaining issue, at least in my real I think we need more runtime diagnostics around the actual decision path. In particular, it would help to know whether the real session is reaching:
A useful debug log/event would be something like: Since |
…tions Active goal turns that stopped with text could hit the empty/unexpected-stop continuation guards before threshold maintenance. When those guards scheduled another goal turn, #checkCompaction never ran, so no auto_compaction_start was emitted even while visible context stayed above thresholdTokens. Run threshold maintenance once before those active-goal self-continuations and log the threshold decision inputs: billed context, stored estimate, resolved trigger tokens, post-maintenance tokens, strategy, threshold, promotion state, and shouldCompact. Also pass post-prune maintenance tokens into the shake recovery-band check so supersede/drop-useless savings are preserved when deciding whether shake still needs to fall back to context-full compaction. Fixes #3174
|
Updated #3175 with two follow-up fixes after your real-session repro and Codex's review. What changed:
Verification: The branch has been pushed again through the host gate. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d444b3c479
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| if (compactionResult.deferredHandoff || compactionResult.continuationScheduled) { | ||
| await emitAgentEndNotification(); | ||
| return; |
There was a problem hiding this comment.
Keep empty tool-use cleanup before compaction continuations
When an active goal hits the compaction threshold and the assistant returned an empty toolUse stop, this early return skips #handleEmptyAssistantStop; that handler is the only path that removes the orphaned tool-use assistant from active context/session history before retrying. With the default compaction auto-continue path, the next turn can be scheduled with a toolUse assistant that has no tool call, which the existing cleanup comment notes corrupts Anthropic message history. Run the empty-stop cleanup before returning, or avoid the early return for empty tool-use stops.
Useful? React with 👍 / 👎.
…n continuation Codex review on #3175: the active-goal compaction pre-empt I added in 8ab754f short-circuited #handleEmptyAssistantStop. That handler is the only path that strips an orphan toolUse assistant (stopReason "toolUse" with no toolCall block) from both active context and the session branch via #removeEmptyStopFromActiveContext. With the pre-empt ordering, an over-threshold goal turn that returned an empty toolUse left the orphan as the session leaf, and the compaction auto-continue prompt fed it back into the next Anthropic turn as a tool_use with no matching tool_result — the exact history-corruption pattern the existing cleanup comment defends against. Move #handleEmptyAssistantStop back ahead of the active-goal compaction probe. Empty stops still self-retry and never reach the threshold pre-empt; non-empty stops (the reporter's failure in #3174) still hit threshold maintenance before the unexpected-stop classifier. Regression test seeds a goal-mode empty toolUse stop billed at 91k against thresholdTokens 76384 and asserts the threshold compaction never starts and the orphan is no longer in the session branch. Fixes #3174
|
Addressed Codex's P2 in What changed:
Verification: |
Reporter on #3174 still sees no auto-compaction with thresholdTokens lowered to 32768 against a 70k+ visible context, and reports the existing `Auto-compaction threshold decision` log never appears. Either the goal turn never reaches `#checkCompaction`, or the log fires but is filtered out of their view (winston is at debug level so it writes to ~/.omp/logs/omp.<DATE>.log, not the TUI). Add an `agent_end maintenance routing` debug log at every branch of the `agent_end` handler — entered/no-message, skip-post-turn-maintenance, successful-yield (active goal vs not), empty-stop-handled, active-goal pre-empt (and whether it scheduled a continuation), unexpected-stop-handled, and bottom checkCompaction — together with stopReason, provider/model, content shape, goal state, and `successfulYield`. Combined with the existing `Auto-compaction threshold decision` log, the next no-start report identifies the exact early-return branch and the inputs that fed `shouldCompact`. Refs #3174
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e70c71077e
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| if (compactionResult.deferredHandoff || compactionResult.continuationScheduled) { | ||
| maintenanceRoute("active-goal-pre-empt-continuation-scheduled", { | ||
| deferredHandoff: compactionResult.deferredHandoff, | ||
| continuationScheduled: compactionResult.continuationScheduled, | ||
| }); | ||
| await emitAgentEndNotification(); | ||
| return; |
There was a problem hiding this comment.
Resolve retry before active-goal compaction returns
When an active /goal retry succeeds with a text stop that is already over the compaction threshold, this new pre-emption path can return before the later #resolveRetry() call. The retry promise created by #handleRetryableError then stays pending; for example with deferred handoff or compaction.autoContinue=false, the compaction task may finish without any later agent_end to clear it, leaving prompt()/waitForIdle() stuck with isRetrying still true. Resolve the retry before returning from this compaction-continuation branch.
Useful? React with 👍 / 👎.
Active-goal threshold compaction can pre-empt the normal post-turn tail and return once it schedules a deferred handoff or auto-continue. When that turn is the successful response from an auto-retry, returning there skips the later retry-gate cleanup and leaves isRetrying stuck. Resolve the completed retry gate before the compaction-continuation return, and cover the retry-success-over-threshold path so future changes cannot strand prompt()/waitForIdle() behind a stale retry state. Refs #3174
|
Active goal loops can stay inside one agent run while the model keeps emitting tool calls, so the normal agent_end threshold maintenance never runs. That lets context grow past the soft threshold until provider overflow or user abort. Run threshold maintenance from the per-turn onTurnEnd hook for active goals, splice the compacted agent state back into the live loop message array, and suppress queued continuations because the current run is already continuing. Cover the mid-run tool-call path and the non-goal control case. Refs #3174
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 29e69f5364
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| if (!shouldCompact(contextTokens, contextWindow, compactionSettings)) return; | ||
|
|
||
| const messagesBefore = activeMessages.length; | ||
| await this.#runAutoCompaction("threshold", false, false, false, { |
There was a problem hiding this comment.
Persist the just-finished turn before mid-run compaction
When an active /goal crosses the threshold between tool turns while a message_end extension hook is still running, this call can compact before the just-finished assistant/tool-result messages have been appended to sessionManager: Agent.#emit does not await #processAgentEvent, and persistence happens after #emitSessionEvent. #runAutoCompaction then prepares from sessionManager.getBranch(), so the summary can omit the tool result that activeMessages already contains; after compaction, the splice below replaces the live loop context with that stale compacted state, and the next model call can lose the tool output it was supposed to see. Ensure the current turn is persisted or compact from the active message array before running this mid-run path.
Useful? React with 👍 / 👎.
Repro
Goal mode session with the reporter's config:
Long-running
/goalsession grew past 90k tokens of visible context. Thestatus-line context (anchored to the last assistant's billed prompt)
showed >
compaction.thresholdTokens, but auto-compaction never fired.Manual
/compactstill worked. The reporter also verified that forcing#checkCompaction(...)on the successful-yield branch did not change theoutcome, so the early-return fix from #3147 was not the only path
involved.
Cause
AgentSession.#checkCompactionran the per-turn supersede +drop-useless prune passes, then subtracted their
tokensSavedfromcalculateContextTokens(assistantMessage.usage)before callingshouldCompact. Pruning frees bytes for the next prompt; it doesnot change the size of the prompt the LLM just billed for. When per-turn
pruning saved ≥
billedContext - thresholdTokenstokens (triviallyreachable for a goal loop that keeps feeding the agent large tool
outputs), the subtraction dropped the threshold input below
compaction.thresholdTokenseven while the user-visible context(anchored to the same provider billing) sat well above it. The trigger
silently no-op'd and the loop kept running.
Fix
packages/coding-agent/src/session/agent-session.ts:#checkCompactionnow feeds
shouldCompactthe actual last-turn billed context(
calculateContextTokens(assistantMessage.usage)), floored by thepost-prune
#estimateStoredContextTokens()so a payload-compressionhook (Headroom / inline snapcompact / obfuscator) still cannot deflate
the trigger. The supersede + age-prune passes still run for their
side-effects (mutating the next prompt's tool-result tail), their
tokensSavedis just no longer subtracted from the threshold input.Verification
packages/coding-agent/test/agent-session-auto-compaction-queue.test.tsseeds one large
useless: truetool result whose suffix sits inside the8k cache-warm window so
#pruneStaleToolResultsreturns ≥20k savings,then asserts compaction fires when the final turn bills 91k tokens
against the reporter's
thresholdTokens: 76384. The new test failsagainst the pre-fix code (no
compaction:start:thresholdsignal) andpasses with the patch.
Fixes #3174