
fix(compaction): trigger /goal threshold on billed context, not post-prune estimate #3175

Merged
can1357 merged 8 commits into main from farm/fc89a4e3/goal-auto-compaction-still-not-triggering
Jun 23, 2026

Conversation

@roboomp roboomp commented Jun 21, 2026

Collaborator

Repro

Goal mode session with the reporter's config:

compaction:
  enabled: true
  strategy: context-full
  thresholdTokens: 76384
  supersedeReads: true
  dropUseless: true

A long-running /goal session grew past 90k tokens of visible context. The
status-line context (anchored to the last assistant message's billed prompt)
showed more than compaction.thresholdTokens, but auto-compaction never fired.
Manual /compact still worked. The reporter also verified that forcing
#checkCompaction(...) on the successful-yield branch did not change the
outcome, so the early-return fix from #3147 was not the only path
involved.

Cause

AgentSession.#checkCompaction ran the per-turn supersede +
drop-useless prune passes, then subtracted their tokensSaved from
calculateContextTokens(assistantMessage.usage) before calling
shouldCompact. Pruning frees bytes for the next prompt; it does
not change the size of the prompt the LLM just billed for. When per-turn
pruning saved ≥ billedContext - thresholdTokens tokens (trivially
reachable for a goal loop that keeps feeding the agent large tool
outputs), the subtraction dropped the threshold input below
compaction.thresholdTokens even while the user-visible context
(anchored to the same provider billing) sat well above it. The trigger
silently no-op'd and the loop kept running.
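
The arithmetic behind this failure can be sketched with the numbers from the report. This is a minimal illustration, not the project's code: shouldCompactSketch is a hypothetical stand-in for shouldCompact, and the strict greater-than comparison is an assumption. The 20,000-token prune saving matches the lower bound used by the regression test.

```typescript
// Hypothetical stand-in for shouldCompact; only a fixed token threshold is modeled,
// and the strict ">" comparison is an assumption.
function shouldCompactSketch(contextTokens: number, thresholdTokens: number): boolean {
  return contextTokens > thresholdTokens;
}

const thresholdTokens = 76384; // reporter's compaction.thresholdTokens
const billedContext = 91000;   // context the provider billed for the last prompt
const pruneSaved = 20000;      // tokensSaved reported by the per-turn prune passes

// Pre-fix behavior: prune savings were subtracted from the billed context.
const preFixInput = billedContext - pruneSaved; // 71000, below the threshold

console.log(shouldCompactSketch(preFixInput, thresholdTokens));   // false
console.log(shouldCompactSketch(billedContext, thresholdTokens)); // true
```

Any saving of at least billedContext - thresholdTokens (14,616 tokens here) is enough to flip the pre-fix result to false.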

Fix

packages/coding-agent/src/session/agent-session.ts:#checkCompaction
now feeds shouldCompact the actual last-turn billed context
(calculateContextTokens(assistantMessage.usage)), floored by the
post-prune #estimateStoredContextTokens() so a payload-compression
hook (Headroom / inline snapcompact / obfuscator) still cannot deflate
the trigger. The supersede + age-prune passes still run for their
side effects (mutating the next prompt's tool-result tail); their
tokensSaved is simply no longer subtracted from the threshold input.
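
A minimal sketch of the flooring described above, assuming the two inputs are plain token counts. resolveTriggerTokens is a hypothetical name; the real logic lives inside #checkCompaction and may differ in detail.

```typescript
// Hypothetical helper: picks the token count that feeds the threshold check.
// billedContextTokens: context the provider billed for the last turn.
// storedEstimateTokens: local post-prune estimate of the stored context.
function resolveTriggerTokens(billedContextTokens: number, storedEstimateTokens: number): number {
  // The billed context is the primary signal; the local estimate only acts as a floor,
  // so a hook that shrinks the billed payload cannot push the input below what is stored.
  return Math.max(billedContextTokens, storedEstimateTokens);
}

// Reporter's case: pruning lowered the local estimate, but the billed value still wins.
console.log(resolveTriggerTokens(91000, 71000)); // 91000

// Illustrative compression-hook case: billing looks small, the stored estimate floors it.
console.log(resolveTriggerTokens(40000, 80000)); // 80000
```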

Verification

packages/coding-agent/test/agent-session-auto-compaction-queue.test.ts
seeds one large useless: true tool result whose suffix sits inside the
8k cache-warm window so #pruneStaleToolResults returns ≥20k savings,
then asserts compaction fires when the final turn bills 91k tokens
against the reporter's thresholdTokens: 76384. The new test fails
against the pre-fix code (no compaction:start:threshold signal) and
passes with the patch.

bun --cwd packages/coding-agent test test/agent-session-auto-compaction-queue.test.ts
# 5 pass, 0 fail
bun --cwd packages/coding-agent test \
  test/agent-session-auto-compaction-queue.test.ts \
  test/agent-session-compaction.test.ts \
  test/agent-session-eager-compaction.test.ts \
  test/agent-session-context-promotion.test.ts \
  test/agent-session-plan-reference-compaction.test.ts \
  test/issue-986-compaction-auth-fallback.test.ts \
  test/compaction-lifecycle.test.ts \
  test/compaction.test.ts \
  test/compaction-prefer-current-model.test.ts \
  test/compaction-serialization.test.ts \
  test/compaction-hooks.test.ts \
  test/compaction-thinking-model.test.ts \
  test/input-controller-compaction-image.test.ts \
  test/append-only-context-mode.test.ts \
  test/context-consolidation.test.ts
# 102 pass, 13 skip, 0 fail
bun --cwd packages/agent test \
  test/supersede-prune.test.ts \
  test/normalize-tools-prune.test.ts \
  test/compaction-telemetry.test.ts \
  test/compaction-thinking-level.test.ts \
  test/compaction-file-ops.test.ts
# 65 pass, 0 fail

Fixes #3174

…ost-prune estimate

Pruning frees bytes for the NEXT prompt — it does not change the size of
the prompt the LLM just billed for. Subtracting the per-turn
`#pruneStaleToolResults` / `#pruneToolOutputs` savings from the
threshold input let a long-running `/goal` session sit above
`compaction.thresholdTokens` indefinitely: the visible context
(anchored to the same provider billing) showed >threshold, but
`shouldCompact` no-op'd because the subtraction dropped the input below
the trigger. The `compactionContextTokens` floor against the post-prune
local estimate is still applied, so a payload-compression hook still
can't deflate the trigger.

Regression test seeds one large `useless` tool result whose suffix sits
inside the 8k cache-warm window so `#pruneStaleToolResults` actually
returns ≥20k savings, then asserts compaction fires when the final turn
bills 91k tokens against the reporter's `thresholdTokens: 76384`.

Fixes #3174

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

triggerContextTokens: contextTokens,

P2 Badge Preserve prune savings for shake fallback

When compaction.strategy === "shake" and the prune passes above reclaim a large useless/superseded tail, this now passes the pre-prune billed total into #runAutoShake as triggerContextTokens. That fallback check subtracts only shake's own tokensFreed, so it can conclude the session is still over the recovery band and fall through to context-full LLM compaction even when the post-prune history is already under the threshold. Keep using billed tokens for the initial trigger, but pass a post-prune-adjusted value (or include the prune savings) for the shake post-maintenance check.
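
The concern can be sketched as follows. Everything here is an assumption made for illustration: the function name, the treatment of the recovery band as a single upper limit, and the sample numbers. Only the idea that the check subtracts shake's own tokensFreed comes from the review.

```typescript
// Hypothetical model of the post-shake check: fall back to context-full compaction
// when the remaining context is still above the recovery limit.
function needsContextFullFallback(
  triggerContextTokens: number,
  shakeTokensFreed: number,
  recoveryLimitTokens: number,
): boolean {
  return triggerContextTokens - shakeTokensFreed > recoveryLimitTokens;
}

const billed = 91000;        // billed context for the last turn
const pruneSavings = 20000;  // reclaimed by the supersede / drop-useless passes
const shakeFreed = 5000;     // reclaimed by shake itself
const recoveryLimit = 76384; // assumed equal to thresholdTokens for this sketch

// Passing the billed total ignores the prune savings: 86000 > 76384.
console.log(needsContextFullFallback(billed, shakeFreed, recoveryLimit)); // true

// Passing a post-prune value keeps them: 66000 <= 76384.
console.log(needsContextFullFallback(billed - pruneSavings, shakeFreed, recoveryLimit)); // false
```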

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

luceat-lux-vestra commented Jun 21, 2026

@roboomp
I built and tested the farm/fc89a4e3/goal-auto-compaction-still-not-triggering branch, including commit b74f7bb, but the issue still reproduces in my real /goal session.

The session still goes beyond the configured threshold:

compaction:
  enabled: true
  strategy: context-full
  thresholdTokens: 76384
  thresholdPercent: -1
  keepRecentTokens: 10000
  reserveTokens: 16384
  autoContinue: true
  supersedeReads: true
  dropUseless: true
  idleEnabled: false
  remoteEnabled: false

Visible context/token usage is still above 90k, but automatic compaction does not appear to run.

So the post-prune token subtraction fixed in #3175 does not seem to be the only remaining issue, at least in my real /goal run.

I think we need more runtime diagnostics around the actual decision path. In particular, it would help to know whether the real session is reaching:

  1. the successful-yield branch,
  2. #checkCompaction(...),
  3. shouldCompact(...),
  4. #tryContextPromotion(...),
  5. #runAutoCompaction(...),
  6. prepareCompaction(...).

A useful debug log/event would be something like:

auto-compaction decision:
  phase=post-agent-end
  goalMode.enabled=...
  goal.status=...
  successfulYield=...
  stopReason=...
  sameModel=...
  skipPostTurnMaintenance=...
  contextWindow=...
  thresholdTokens=...
  assistantUsageContextTokens=...
  storedContextTokens=...
  resolvedContextTokens=...
  shouldCompact=...
  contextPromotion.enabled=...
  promoted=...
  runAutoCompactionEntered=...
  prepareCompactionAvailable=...
  skipReason=...

Since #runAutoCompaction() emits auto_compaction_start immediately after installing the controller, if no auto_compaction_start is visible then the real problem is likely before #runAutoCompaction() — either the /goal path is not calling #checkCompaction(), the threshold decision is still false, context promotion is taking over, or some post-turn maintenance guard is returning early.
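
One way such a decision record could be rendered is sketched below. formatDecision and DecisionValue are hypothetical names, and the field names are taken from the proposal above rather than from the codebase.

```typescript
type DecisionValue = string | number | boolean | undefined;

// Hypothetical formatter: renders a decision record in the key=value layout
// proposed above, skipping fields that were not populated.
function formatDecision(fields: Record<string, DecisionValue>): string {
  const lines = Object.keys(fields)
    .filter((key) => fields[key] !== undefined)
    .map((key) => `  ${key}=${String(fields[key])}`);
  return ["auto-compaction decision:", ...lines].join("\n");
}

const sampleDecision = formatDecision({
  phase: "post-agent-end",
  "goalMode.enabled": true,
  thresholdTokens: 76384,
  assistantUsageContextTokens: 91000,
  shouldCompact: true,
  skipReason: undefined, // omitted from the output
});
console.log(sampleDecision);
```

Emitting one record per decision, including the skip reason when compaction does not start, would let a single log line answer which of the six stages the session reached.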

roboomp added 2 commits June 21, 2026 08:35
…tions

Active goal turns that stopped with text could hit the empty/unexpected-stop
continuation guards before threshold maintenance. When those guards scheduled
another goal turn, #checkCompaction never ran, so no auto_compaction_start was
emitted even while visible context stayed above thresholdTokens.

Run threshold maintenance once before those active-goal self-continuations and
log the threshold decision inputs: billed context, stored estimate, resolved
trigger tokens, post-maintenance tokens, strategy, threshold, promotion state,
and shouldCompact.

Also pass post-prune maintenance tokens into the shake recovery-band check so
supersede/drop-useless savings are preserved when deciding whether shake still
needs to fall back to context-full compaction.

Fixes #3174

roboomp commented Jun 21, 2026

Collaborator Author

Updated #3175 with two follow-up fixes after your real-session repro and Codex's review.

What changed:

  • Active /goal text stops now run threshold maintenance before empty/unexpected-stop retry continuations can return from agent_end. I reproduced the old behavior with a test where features.unexpectedStopDetection classified the stop as unexpected: pre-fix there was no compaction:start:threshold; post-fix compaction starts first.
  • The threshold path now logs Auto-compaction threshold decision with strategy, stop reason, context window, configured threshold, billed context tokens, stored estimate, resolved trigger tokens, post-maintenance tokens, prune savings, shouldCompact, and context-promotion state.
  • Addressed Codex's shake review: shake fallback now receives post-prune maintenance tokens, so supersede/drop-useless savings are preserved when deciding whether to fall back to context-full.

Verification:

bun test test/agent-session-auto-compaction-queue.test.ts -t "unexpected-stop"
# 1 pass, 0 fail
bun test test/shake.test.ts -t "pre-shake prune"
# 1 pass, 0 fail
bun test test/agent-session-auto-compaction-queue.test.ts test/shake.test.ts test/agent-session-compaction.test.ts test/agent-session-eager-compaction.test.ts test/agent-session-context-promotion.test.ts test/compaction-lifecycle.test.ts test/compaction.test.ts test/compaction-prefer-current-model.test.ts test/compaction-serialization.test.ts test/compaction-hooks.test.ts test/compaction-thinking-model.test.ts test/append-only-context-mode.test.ts test/context-consolidation.test.ts
# 95 pass, 13 skip, 0 fail

The branch has been pushed again through the host gate.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d444b3c479

Comment on lines +2813 to +2815
if (compactionResult.deferredHandoff || compactionResult.continuationScheduled) {
  await emitAgentEndNotification();
  return;

P2 Badge Keep empty tool-use cleanup before compaction continuations

When an active goal hits the compaction threshold and the assistant returned an empty toolUse stop, this early return skips #handleEmptyAssistantStop; that handler is the only path that removes the orphaned tool-use assistant from active context/session history before retrying. With the default compaction auto-continue path, the next turn can be scheduled with a toolUse assistant that has no tool call, which the existing cleanup comment notes corrupts Anthropic message history. Run the empty-stop cleanup before returning, or avoid the early return for empty tool-use stops.

…n continuation

Codex review on #3175: the active-goal compaction pre-empt I added in
8ab754f short-circuited #handleEmptyAssistantStop. That handler is
the only path that strips an orphan toolUse assistant (stopReason
"toolUse" with no toolCall block) from both active context and the
session branch via #removeEmptyStopFromActiveContext. With the pre-empt
ordering, an over-threshold goal turn that returned an empty toolUse
left the orphan as the session leaf, and the compaction auto-continue
prompt fed it back into the next Anthropic turn as a tool_use with no
matching tool_result — the exact history-corruption pattern the
existing cleanup comment defends against.

Move #handleEmptyAssistantStop back ahead of the active-goal
compaction probe. Empty stops still self-retry and never reach the
threshold pre-empt; non-empty stops (the reporter's failure in #3174)
still hit threshold maintenance before the unexpected-stop classifier.

Regression test seeds a goal-mode empty toolUse stop billed at 91k
against thresholdTokens 76384 and asserts the threshold compaction
never starts and the orphan is no longer in the session branch.

Fixes #3174
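
The ordering described in this commit can be sketched as a pure routing function. TurnSummary, Route, and routeAgentEnd are hypothetical names, and the definition of an empty stop is an assumption; the real agent_end handler carries more state than is modeled here.

```typescript
// Hypothetical summary of a finished assistant turn; field names are illustrative.
interface TurnSummary {
  stopReason: "stop" | "toolUse";
  hasText: boolean;
  hasToolCall: boolean;
  goalActive: boolean;
  billedContextTokens: number;
}

type Route = "empty-stop-handled" | "active-goal-pre-empt" | "fall-through";

function routeAgentEnd(turn: TurnSummary, thresholdTokens: number): Route {
  // Assumption: a stop is "empty" if it is an orphan toolUse or carries no content at all.
  const orphanToolUse = turn.stopReason === "toolUse" && !turn.hasToolCall;
  const emptyStop = orphanToolUse || (!turn.hasText && !turn.hasToolCall);

  // 1. Empty-stop cleanup runs first, so the orphan never reaches a compaction continuation.
  if (emptyStop) return "empty-stop-handled";

  // 2. Non-empty stops in an active goal hit threshold maintenance next.
  if (turn.goalActive && turn.billedContextTokens > thresholdTokens) return "active-goal-pre-empt";

  // 3. Everything else continues to the unexpected-stop classifier and the final check.
  return "fall-through";
}

const orphanTurn: TurnSummary = {
  stopReason: "toolUse", hasText: false, hasToolCall: false,
  goalActive: true, billedContextTokens: 91000,
};
console.log(routeAgentEnd(orphanTurn, 76384)); // "empty-stop-handled"
```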

roboomp commented Jun 21, 2026

Collaborator Author

Addressed Codex's P2 in 00d14accfb.

What changed:

  • Moved #handleEmptyAssistantStop back ahead of the active-goal compaction probe in agent_end, so an empty toolUse stop still runs #removeEmptyStopFromActiveContext (strips the orphan from active context + session branch) before any compaction continuation. Non-empty stops still hit threshold maintenance before the unexpected-stop classifier — the reporter's failure path is unchanged.
  • Added a regression test that seeds an empty toolUse stop billed at 91k against thresholdTokens=76384 in an active goal, then asserts compaction:start:threshold never fires and the orphan is no longer in the session branch. Verified the test fails against the pre-fix commit and passes against 00d14accfb.

Verification:

bun test test/agent-session-auto-compaction-queue.test.ts test/shake.test.ts test/agent-session-compaction.test.ts test/agent-session-eager-compaction.test.ts test/agent-session-context-promotion.test.ts test/compaction-lifecycle.test.ts test/compaction.test.ts test/compaction-prefer-current-model.test.ts test/compaction-serialization.test.ts test/compaction-hooks.test.ts test/compaction-thinking-model.test.ts test/agent-session-unexpected-stop-guard.test.ts test/agent-session-plan-reference-compaction.test.ts
# 86 pass, 13 skip, 0 fail

Reporter on #3174 still sees no auto-compaction with thresholdTokens
lowered to 32768 against a 70k+ visible context, and reports the
existing `Auto-compaction threshold decision` log never appears.
Either the goal turn never reaches `#checkCompaction`, or the log
fires but is filtered out of their view (winston is at debug level so
it writes to ~/.omp/logs/omp.<DATE>.log, not the TUI).

Add an `agent_end maintenance routing` debug log at every branch of
the `agent_end` handler — entered/no-message,
skip-post-turn-maintenance, successful-yield (active goal vs not),
empty-stop-handled, active-goal pre-empt (and whether it scheduled a
continuation), unexpected-stop-handled, and bottom checkCompaction —
together with stopReason, provider/model, content shape, goal
state, and `successfulYield`. Combined with the existing
`Auto-compaction threshold decision` log, the next no-start report
identifies the exact early-return branch and the inputs that fed
`shouldCompact`.

Refs #3174
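
A routing logger of this kind could be built as a small closure, sketched below. The name maintenanceRoute and its (route, extra) call shape appear in the diff excerpt later in this thread; createMaintenanceRoute, the sink callback, and the field names are assumptions for illustration.

```typescript
type LogFields = Record<string, unknown>;

// Hypothetical factory: captures the per-turn base fields once, then tags each
// early-return branch with a route name plus any branch-specific extras.
function createMaintenanceRoute(
  base: LogFields,
  sink: (message: string, fields: LogFields) => void,
): (route: string, extra?: LogFields) => void {
  return (route, extra = {}) => {
    sink("agent_end maintenance routing", { ...base, route, ...extra });
  };
}

// Usage: the sink would normally be the debug logger; console.log stands in here.
const maintenanceRoute = createMaintenanceRoute(
  { stopReason: "stop", goalActive: true, successfulYield: false },
  (message, fields) => console.log(message, JSON.stringify(fields)),
);
maintenanceRoute("active-goal-pre-empt-continuation-scheduled", { continuationScheduled: true });
```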

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e70c71077e

Comment on lines +2853 to +2859
if (compactionResult.deferredHandoff || compactionResult.continuationScheduled) {
  maintenanceRoute("active-goal-pre-empt-continuation-scheduled", {
    deferredHandoff: compactionResult.deferredHandoff,
    continuationScheduled: compactionResult.continuationScheduled,
  });
  await emitAgentEndNotification();
  return;

P2 Badge Resolve retry before active-goal compaction returns

When an active /goal retry succeeds with a text stop that is already over the compaction threshold, this new pre-emption path can return before the later #resolveRetry() call. The retry promise created by #handleRetryableError then stays pending; for example with deferred handoff or compaction.autoContinue=false, the compaction task may finish without any later agent_end to clear it, leaving prompt()/waitForIdle() stuck with isRetrying still true. Resolve the retry before returning from this compaction-continuation branch.

Active-goal threshold compaction can pre-empt the normal post-turn tail
and return once it schedules a deferred handoff or auto-continue. When
that turn is the successful response from an auto-retry, returning there
skips the later retry-gate cleanup and leaves isRetrying stuck.

Resolve the completed retry gate before the compaction-continuation
return, and cover the retry-success-over-threshold path so future
changes cannot strand prompt()/waitForIdle() behind a stale retry state.

Refs #3174
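
The stuck-retry hazard and its fix can be sketched with a small gate object. RetryGate and finishTurn are hypothetical names; the real session tracks this state through #handleRetryableError and #resolveRetry, whose internals are not shown in this thread.

```typescript
// Hypothetical retry gate: callers wait on the promise while a retry is in flight.
class RetryGate {
  private resolveFn: (() => void) | undefined;
  private pending: Promise<void> | undefined;

  get isRetrying(): boolean {
    return this.pending !== undefined;
  }

  begin(): Promise<void> {
    this.pending = new Promise<void>((resolve) => {
      this.resolveFn = resolve;
    });
    return this.pending;
  }

  resolve(): void {
    if (this.resolveFn) this.resolveFn();
    this.resolveFn = undefined;
    this.pending = undefined;
  }
}

// Sketch of the turn tail: every return path clears the gate, including the
// compaction-continuation early return that previously skipped it.
function finishTurn(gate: RetryGate, continuationScheduled: boolean): string {
  if (continuationScheduled) {
    gate.resolve(); // resolve before returning, otherwise isRetrying stays true
    return "compaction-continuation";
  }
  gate.resolve();
  return "normal-tail";
}

const demoGate = new RetryGate();
demoGate.begin();
console.log(finishTurn(demoGate, true)); // "compaction-continuation"
```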

roboomp commented Jun 21, 2026

Collaborator Author
  • Resolved the completed retry gate before the active-goal compaction-continuation return, so isRetrying cannot stay stuck after a retry succeeds over the threshold.
  • Added a regression covering retry success → active-goal threshold compaction → auto-continue, asserting compaction:start:threshold fires and session.isRetrying clears.

roboomp added 2 commits June 22, 2026 20:02
Active goal loops can stay inside one agent run while the model keeps
emitting tool calls, so the normal agent_end threshold maintenance never
runs. That lets context grow past the soft threshold until provider
overflow or user abort.

Run threshold maintenance from the per-turn onTurnEnd hook for active
goals, splice the compacted agent state back into the live loop message
array, and suppress queued continuations because the current run is
already continuing. Cover the mid-run tool-call path and the non-goal
control case.

Refs #3174
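
The splice step can be sketched as an in-place replacement, so that a loop holding a reference to the live array sees the compacted state on its next iteration. replaceInPlace is a hypothetical helper, and strings stand in for message objects.

```typescript
// Hypothetical helper: swaps the contents of the live array without changing its
// identity, so existing references observe the new contents.
function replaceInPlace<T>(live: T[], next: readonly T[]): void {
  // Spreading is fine for a sketch; very large arrays may need a chunked copy.
  live.splice(0, live.length, ...next);
}

const liveMessages = ["user: goal", "tool: big output", "assistant: step 1"];
const loopView = liveMessages; // the running loop keeps this same reference

replaceInPlace(liveMessages, ["summary: earlier work", "assistant: step 1"]);

console.log(loopView.length);          // 2
console.log(loopView === liveMessages); // true
```

Reassigning the variable instead would leave the running loop pointing at the uncompacted array, which is why the replacement has to happen in place.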

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 29e69f5364

if (!shouldCompact(contextTokens, contextWindow, compactionSettings)) return;

const messagesBefore = activeMessages.length;
await this.#runAutoCompaction("threshold", false, false, false, {

P2 Badge Persist the just-finished turn before mid-run compaction

When an active /goal crosses the threshold between tool turns while a message_end extension hook is still running, this call can compact before the just-finished assistant/tool-result messages have been appended to sessionManager: Agent.#emit does not await #processAgentEvent, and persistence happens after #emitSessionEvent. #runAutoCompaction then prepares from sessionManager.getBranch(), so the summary can omit the tool result that activeMessages already contains; after compaction, the splice below replaces the live loop context with that stale compacted state, and the next model call can lose the tool output it was supposed to see. Ensure the current turn is persisted or compact from the active message array before running this mid-run path.
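
One possible guard for this race is sketched below: before compacting from the persisted branch, check that every active message has been persisted. The message shape with an id field and both function names are assumptions; the thread does not show how the project resolved this comment.

```typescript
// Hypothetical message shape; the id field is an assumption for this sketch.
interface StoredMessage {
  id: string;
  role: "user" | "assistant" | "toolResult";
}

// Returns the active messages that the persisted branch does not contain yet.
function unpersistedMessages(active: readonly StoredMessage[], persistedIds: Set<string>): StoredMessage[] {
  return active.filter((message) => !persistedIds.has(message.id));
}

// Compacting from the persisted branch is only safe once nothing is missing from it.
function safeToCompactFromBranch(active: readonly StoredMessage[], persistedIds: Set<string>): boolean {
  return unpersistedMessages(active, persistedIds).length === 0;
}

const activeTurn: StoredMessage[] = [
  { id: "m1", role: "user" },
  { id: "m2", role: "assistant" },
  { id: "m3", role: "toolResult" }, // just finished, not yet appended to the branch
];
console.log(safeToCompactFromBranch(activeTurn, new Set(["m1", "m2"]))); // false
```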

@can1357 can1357 merged commit 44b6241 into main Jun 23, 2026

Labels

vouched Passed the vouch gate

Development

Successfully merging this pull request may close these issues.

/goal auto-compaction still does not trigger after #3147 with strategy: context-full

3 participants