fix(compaction): trigger /goal threshold on billed context, not post-prune estimate by roboomp · Pull Request #3175 · can1357/oh-my-pi

roboomp · 2026-06-21T07:23:33Z

Repro

Goal mode session with the reporter's config:

compaction:
  enabled: true
  strategy: context-full
  thresholdTokens: 76384
  supersedeReads: true
  dropUseless: true

Long-running /goal session grew past 90k tokens of visible context. The
status-line context (anchored to the last assistant's billed prompt)
showed >compaction.thresholdTokens, but auto-compaction never fired.
Manual /compact still worked. The reporter also verified that forcing
#checkCompaction(...) on the successful-yield branch did not change the
outcome, so the early-return fix from #3147 was not the only path
involved.

Cause

AgentSession.#checkCompaction ran the per-turn supersede +
drop-useless prune passes, then subtracted their tokensSaved from
calculateContextTokens(assistantMessage.usage) before calling
shouldCompact. Pruning frees bytes for the next prompt; it does
not change the size of the prompt the LLM just billed for. When per-turn
pruning saved ≥ billedContext - thresholdTokens tokens (trivially
reachable for a goal loop that keeps feeding the agent large tool
outputs), the subtraction dropped the threshold input below
compaction.thresholdTokens even while the user-visible context
(anchored to the same provider billing) sat well above it. The trigger
silently no-op'd and the loop kept running.

Fix

packages/coding-agent/src/session/agent-session.ts:#checkCompaction
now feeds shouldCompact the actual last-turn billed context
(calculateContextTokens(assistantMessage.usage)), floored by the
post-prune #estimateStoredContextTokens() so a payload-compression
hook (Headroom / inline snapcompact / obfuscator) still cannot deflate
the trigger. The supersede + age-prune passes still run for their
side-effects (mutating the next prompt's tool-result tail), their
tokensSaved is just no longer subtracted from the threshold input.

Verification

packages/coding-agent/test/agent-session-auto-compaction-queue.test.ts
seeds one large useless: true tool result whose suffix sits inside the
8k cache-warm window so #pruneStaleToolResults returns ≥20k savings,
then asserts compaction fires when the final turn bills 91k tokens
against the reporter's thresholdTokens: 76384. The new test fails
against the pre-fix code (no compaction:start:threshold signal) and
passes with the patch.

bun --cwd packages/coding-agent test test/agent-session-auto-compaction-queue.test.ts
# 5 pass, 0 fail
bun --cwd packages/coding-agent test \
  test/agent-session-auto-compaction-queue.test.ts \
  test/agent-session-compaction.test.ts \
  test/agent-session-eager-compaction.test.ts \
  test/agent-session-context-promotion.test.ts \
  test/agent-session-plan-reference-compaction.test.ts \
  test/issue-986-compaction-auth-fallback.test.ts \
  test/compaction-lifecycle.test.ts \
  test/compaction.test.ts \
  test/compaction-prefer-current-model.test.ts \
  test/compaction-serialization.test.ts \
  test/compaction-hooks.test.ts \
  test/compaction-thinking-model.test.ts \
  test/input-controller-compaction-image.test.ts \
  test/append-only-context-mode.test.ts \
  test/context-consolidation.test.ts
# 102 pass, 13 skip, 0 fail
bun --cwd packages/agent test \
  test/supersede-prune.test.ts \
  test/normalize-tools-prune.test.ts \
  test/compaction-telemetry.test.ts \
  test/compaction-thinking-level.test.ts \
  test/compaction-file-ops.test.ts
# 65 pass, 0 fail

Fixes #3174

…ost-prune estimate Pruning frees bytes for the NEXT prompt — it does not change the size of the prompt the LLM just billed for. Subtracting the per-turn `#pruneStaleToolResults` / `#pruneToolOutputs` savings from the threshold input let a long-running `/goal` session sit above `compaction.thresholdTokens` indefinitely: the visible context (anchored to the same provider billing) showed >threshold, but `shouldCompact` no-op'd because the subtraction dropped the input below the trigger. The `compactionContextTokens` floor against the post-prune local estimate is still applied, so a payload-compression hook still can't deflate the trigger. Regression test seeds one large `useless` tool result whose suffix sits inside the 8k cache-warm window so `#pruneStaleToolResults` actually returns ≥20k savings, then asserts compaction fires when the final turn bills 91k tokens against the reporter's `thresholdTokens: 76384`. Fixes #3174

chatgpt-codex-connector

💡 Codex Review

oh-my-pi/packages/coding-agent/src/session/agent-session.ts

Line 8355 in b74f7bb

triggerContextTokens: contextTokens,

Preserve prune savings for shake fallback

When compaction.strategy === "shake" and the prune passes above reclaim a large useless/superseded tail, this now passes the pre-prune billed total into #runAutoShake as triggerContextTokens. That fallback check subtracts only shake's own tokensFreed, so it can conclude the session is still over the recovery band and fall through to context-full LLM compaction even when the post-prune history is already under the threshold. Keep using billed tokens for the initial trigger, but pass a post-prune-adjusted value (or include the prune savings) for the shake post-maintenance check.

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

luceat-lux-vestra · 2026-06-21T08:26:57Z

@roboomp
I built and tested the farm/fc89a4e3/goal-auto-compaction-still-not-triggering branch, including commit b74f7bb, but the issue still reproduces in my real /goal session.

The session still goes beyond the configured threshold:

compaction:
  enabled: true
  strategy: context-full
  thresholdTokens: 76384
  thresholdPercent: -1
  keepRecentTokens: 10000
  reserveTokens: 16384
  autoContinue: true
  supersedeReads: true
  dropUseless: true
  idleEnabled: false
  remoteEnabled: false

Visible context/token usage is still above 90k, but automatic compaction does not appear to run.

So the post-prune token subtraction fixed in #3175 does not seem to be the only remaining issue, at least in my real /goal run.

I think we need more runtime diagnostics around the actual decision path. In particular, it would help to know whether the real session is reaching:

the successful-yield branch,
#checkCompaction(...),
shouldCompact(...),
#tryContextPromotion(...),
#runAutoCompaction(...),
prepareCompaction(...).

A useful debug log/event would be something like:

auto-compaction decision:
  phase=post-agent-end
  goalMode.enabled=...
  goal.status=...
  successfulYield=...
  stopReason=...
  sameModel=...
  skipPostTurnMaintenance=...
  contextWindow=...
  thresholdTokens=...
  assistantUsageContextTokens=...
  storedContextTokens=...
  resolvedContextTokens=...
  shouldCompact=...
  contextPromotion.enabled=...
  promoted=...
  runAutoCompactionEntered=...
  prepareCompactionAvailable=...
  skipReason=...

Since #runAutoCompaction() emits auto_compaction_start immediately after installing the controller, if no auto_compaction_start is visible then the real problem is likely before #runAutoCompaction() — either the /goal path is not calling #checkCompaction(), the threshold decision is still false, context promotion is taking over, or some post-turn maintenance guard is returning early.

…tions Active goal turns that stopped with text could hit the empty/unexpected-stop continuation guards before threshold maintenance. When those guards scheduled another goal turn, #checkCompaction never ran, so no auto_compaction_start was emitted even while visible context stayed above thresholdTokens. Run threshold maintenance once before those active-goal self-continuations and log the threshold decision inputs: billed context, stored estimate, resolved trigger tokens, post-maintenance tokens, strategy, threshold, promotion state, and shouldCompact. Also pass post-prune maintenance tokens into the shake recovery-band check so supersede/drop-useless savings are preserved when deciding whether shake still needs to fall back to context-full compaction. Fixes #3174

roboomp · 2026-06-21T08:36:31Z

Updated #3175 with two follow-up fixes after your real-session repro and Codex's review.

What changed:

Active /goal text stops now run threshold maintenance before empty/unexpected-stop retry continuations can return from agent_end. I reproduced the old behavior with a test where features.unexpectedStopDetection classified the stop as unexpected: pre-fix there was no compaction:start:threshold; post-fix compaction starts first.
The threshold path now logs Auto-compaction threshold decision with strategy, stop reason, context window, configured threshold, billed context tokens, stored estimate, resolved trigger tokens, post-maintenance tokens, prune savings, shouldCompact, and context-promotion state.
Addressed Codex's shake review: shake fallback now receives post-prune maintenance tokens, so supersede/drop-useless savings are preserved when deciding whether to fall back to context-full.

Verification:

bun test test/agent-session-auto-compaction-queue.test.ts -t "unexpected-stop"
# 1 pass, 0 fail
bun test test/shake.test.ts -t "pre-shake prune"
# 1 pass, 0 fail
bun test test/agent-session-auto-compaction-queue.test.ts test/shake.test.ts test/agent-session-compaction.test.ts test/agent-session-eager-compaction.test.ts test/agent-session-context-promotion.test.ts test/compaction-lifecycle.test.ts test/compaction.test.ts test/compaction-prefer-current-model.test.ts test/compaction-serialization.test.ts test/compaction-hooks.test.ts test/compaction-thinking-model.test.ts test/append-only-context-mode.test.ts test/context-consolidation.test.ts
# 95 pass, 13 skip, 0 fail

The branch has been pushed again through the host gate.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d444b3c479

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-06-21T08:39:24Z

+				if (compactionResult.deferredHandoff || compactionResult.continuationScheduled) {
+					await emitAgentEndNotification();
+					return;


Keep empty tool-use cleanup before compaction continuations

When an active goal hits the compaction threshold and the assistant returned an empty toolUse stop, this early return skips #handleEmptyAssistantStop; that handler is the only path that removes the orphaned tool-use assistant from active context/session history before retrying. With the default compaction auto-continue path, the next turn can be scheduled with a toolUse assistant that has no tool call, which the existing cleanup comment notes corrupts Anthropic message history. Run the empty-stop cleanup before returning, or avoid the early return for empty tool-use stops.

Useful? React with 👍 / 👎.

…n continuation Codex review on #3175: the active-goal compaction pre-empt I added in 8ab754f short-circuited #handleEmptyAssistantStop. That handler is the only path that strips an orphan toolUse assistant (stopReason "toolUse" with no toolCall block) from both active context and the session branch via #removeEmptyStopFromActiveContext. With the pre-empt ordering, an over-threshold goal turn that returned an empty toolUse left the orphan as the session leaf, and the compaction auto-continue prompt fed it back into the next Anthropic turn as a tool_use with no matching tool_result — the exact history-corruption pattern the existing cleanup comment defends against. Move #handleEmptyAssistantStop back ahead of the active-goal compaction probe. Empty stops still self-retry and never reach the threshold pre-empt; non-empty stops (the reporter's failure in #3174) still hit threshold maintenance before the unexpected-stop classifier. Regression test seeds a goal-mode empty toolUse stop billed at 91k against thresholdTokens 76384 and asserts the threshold compaction never starts and the orphan is no longer in the session branch. Fixes #3174

roboomp · 2026-06-21T08:49:57Z

Addressed Codex's P2 in 00d14accfb.

What changed:

Moved #handleEmptyAssistantStop back ahead of the active-goal compaction probe in agent_end, so an empty toolUse stop still runs #removeEmptyStopFromActiveContext (strips the orphan from active context + session branch) before any compaction continuation. Non-empty stops still hit threshold maintenance before the unexpected-stop classifier — the reporter's failure path is unchanged.
Added a regression test that seeds an empty toolUse stop billed at 91k against thresholdTokens=76384 in an active goal, then asserts compaction:start:threshold never fires and the orphan is no longer in the session branch. Verified the test fails against the pre-fix commit and passes against 00d14accfb.

Verification:

bun test test/agent-session-auto-compaction-queue.test.ts test/shake.test.ts test/agent-session-compaction.test.ts test/agent-session-eager-compaction.test.ts test/agent-session-context-promotion.test.ts test/compaction-lifecycle.test.ts test/compaction.test.ts test/compaction-prefer-current-model.test.ts test/compaction-serialization.test.ts test/compaction-hooks.test.ts test/compaction-thinking-model.test.ts test/agent-session-unexpected-stop-guard.test.ts test/agent-session-plan-reference-compaction.test.ts
# 86 pass, 13 skip, 0 fail

Reporter on #3174 still sees no auto-compaction with thresholdTokens lowered to 32768 against a 70k+ visible context, and reports the existing `Auto-compaction threshold decision` log never appears. Either the goal turn never reaches `#checkCompaction`, or the log fires but is filtered out of their view (winston is at debug level so it writes to ~/.omp/logs/omp.<DATE>.log, not the TUI). Add an `agent_end maintenance routing` debug log at every branch of the `agent_end` handler — entered/no-message, skip-post-turn-maintenance, successful-yield (active goal vs not), empty-stop-handled, active-goal pre-empt (and whether it scheduled a continuation), unexpected-stop-handled, and bottom checkCompaction — together with stopReason, provider/model, content shape, goal state, and `successfulYield`. Combined with the existing `Auto-compaction threshold decision` log, the next no-start report identifies the exact early-return branch and the inputs that fed `shouldCompact`. Refs #3174

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e70c71077e

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-06-21T08:58:44Z

+				if (compactionResult.deferredHandoff || compactionResult.continuationScheduled) {
+					maintenanceRoute("active-goal-pre-empt-continuation-scheduled", {
+						deferredHandoff: compactionResult.deferredHandoff,
+						continuationScheduled: compactionResult.continuationScheduled,
+					});
+					await emitAgentEndNotification();
+					return;


Resolve retry before active-goal compaction returns

When an active /goal retry succeeds with a text stop that is already over the compaction threshold, this new pre-emption path can return before the later #resolveRetry() call. The retry promise created by #handleRetryableError then stays pending; for example with deferred handoff or compaction.autoContinue=false, the compaction task may finish without any later agent_end to clear it, leaving prompt()/waitForIdle() stuck with isRetrying still true. Resolve the retry before returning from this compaction-continuation branch.

Useful? React with 👍 / 👎.

Active-goal threshold compaction can pre-empt the normal post-turn tail and return once it schedules a deferred handoff or auto-continue. When that turn is the successful response from an auto-retry, returning there skips the later retry-gate cleanup and leaves isRetrying stuck. Resolve the completed retry gate before the compaction-continuation return, and cover the retry-success-over-threshold path so future changes cannot strand prompt()/waitForIdle() behind a stale retry state. Refs #3174

roboomp · 2026-06-21T09:09:25Z

Resolved the completed retry gate before the active-goal compaction-continuation return, so isRetrying cannot stay stuck after a retry succeeds over the threshold.
Added a regression covering retry success → active-goal threshold compaction → auto-continue, asserting compaction:start:threshold fires and session.isRetrying clears.

Active goal loops can stay inside one agent run while the model keeps emitting tool calls, so the normal agent_end threshold maintenance never runs. That lets context grow past the soft threshold until provider overflow or user abort. Run threshold maintenance from the per-turn onTurnEnd hook for active goals, splice the compacted agent state back into the live loop message array, and suppress queued continuations because the current run is already continuing. Cover the mid-run tool-call path and the non-goal control case. Refs #3174

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 29e69f5364

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-06-22T20:11:55Z

+		if (!shouldCompact(contextTokens, contextWindow, compactionSettings)) return;
+
+		const messagesBefore = activeMessages.length;
+		await this.#runAutoCompaction("threshold", false, false, false, {


Persist the just-finished turn before mid-run compaction

When an active /goal crosses the threshold between tool turns while a message_end extension hook is still running, this call can compact before the just-finished assistant/tool-result messages have been appended to sessionManager: Agent.#emit does not await #processAgentEvent, and persistence happens after #emitSessionEvent. #runAutoCompaction then prepares from sessionManager.getBranch(), so the summary can omit the tool result that activeMessages already contains; after compaction, the splice below replaces the live loop context with that stale compacted state, and the next model call can lose the tool output it was supposed to see. Ensure the current turn is persisted or compact from the active message array before running this mid-run path.

Useful? React with 👍 / 👎.

github-actions Bot added the vouched Passed the vouch gate label Jun 21, 2026

roboomp mentioned this pull request Jun 21, 2026

/goal auto-compaction still does not trigger after #3147 with strategy: context-full #3174

Closed

chatgpt-codex-connector Bot reviewed Jun 21, 2026

View reviewed changes

roboomp added 2 commits June 21, 2026 08:35

style: bun run fix

d444b3c

chatgpt-codex-connector Bot reviewed Jun 21, 2026

View reviewed changes

luceat-lux-vestra mentioned this pull request Jun 22, 2026

fix(compaction): run /goal threshold maintenance mid-run between tool turns #3278

Closed

roboomp added 2 commits June 22, 2026 20:02

style: bun run fix

29e69f5

chatgpt-codex-connector Bot reviewed Jun 22, 2026

View reviewed changes

can1357 merged commit 44b6241 into main Jun 23, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(compaction): trigger /goal threshold on billed context, not post-prune estimate#3175

fix(compaction): trigger /goal threshold on billed context, not post-prune estimate#3175
can1357 merged 8 commits into
mainfrom
farm/fc89a4e3/goal-auto-compaction-still-not-triggering

roboomp commented Jun 21, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

luceat-lux-vestra commented Jun 21, 2026 •

edited

Loading

Uh oh!

roboomp commented Jun 21, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Jun 21, 2026

Uh oh!

roboomp commented Jun 21, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Jun 21, 2026

Uh oh!

roboomp commented Jun 21, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Jun 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

roboomp commented Jun 21, 2026

Repro

Cause

Fix

Verification

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

luceat-lux-vestra commented Jun 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

roboomp commented Jun 21, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 21, 2026

Choose a reason for hiding this comment

Uh oh!

roboomp commented Jun 21, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 21, 2026

Choose a reason for hiding this comment

Uh oh!

roboomp commented Jun 21, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

luceat-lux-vestra commented Jun 21, 2026 •

edited

Loading