fix(agent): size snapcompact maxFrames by the live model window#3249
fix(agent): size snapcompact maxFrames by the live model window#3249roboomp wants to merge 4 commits into
Conversation
Snapcompact's bundled MAX_FRAMES_DEFAULT (80) × FRAME_TOKEN_ESTIMATE (5024) ≈ 402k tokens worth of frames. AgentSession was calling snapcompact.compact() with no maxFrames override, so the post-render projection inside #runAuto Compaction / compact() always overflowed the budget on any sub-1M-token window (Claude Sonnet 4.5's 200k = 170k usable, the 80-frame projection alone clears that 2.4×), looping the 'snapcompact could not bring the context under the limit — using an LLM summary instead' warning on every threshold tick. AgentSession.#computeSnapcompactMaxFrames now sizes the frame cap from the resolved budget — (window − reserve − non-message − kept-recent − summary-text reserve) / FRAME_TOKEN_ESTIMATE, clamped to MAX_FRAMES_DEFAULT — and threads it into snapcompact.compact() in both the auto-compaction and manual /compact paths. When the kept-recent slice already exceeds the budget, snapcompact is skipped outright instead of running just to be rejected: the projection guard remains as a defensive check. Fixes #3247
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5cce507582
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| // past the projection check below. | ||
| const SUMMARY_TEXT_RESERVE = 4000; | ||
| const frameBudget = ctxWindow - reserve - nonFrameTokens - SUMMARY_TEXT_RESERVE; | ||
| if (frameBudget < snapcompact.FRAME_TOKEN_ESTIMATE) return 0; |
There was a problem hiding this comment.
Allow zero-frame snapcompact archives when they fit
When the kept-recent slice leaves less than one FRAME_TOKEN_ESTIMATE of headroom but still enough room for a small text archive, this returns 0 before calling snapcompact. snapcompact.compact() can produce a valid text-only archive (frames: []) when the discarded history fits in its text edge regions, so /compact snapcompact and auto snapcompact now unnecessarily fall back to the LLM summarizer in that near-full-context case, which can fail for offline/no-compaction-credential sessions even though local snapcompact would have fit and the existing projection guard would have accepted it.
Useful? React with 👍 / 👎.
chatgpt-codex review on #3249: the helper returned 0 when frameBudget < FRAME_TOKEN_ESTIMATE, causing the caller to skip snapcompact entirely. But snapcompact.planArchive has a 'text.length <= 2 * edgeCap' short- circuit that produces a valid frames:[] archive when the discarded history is small enough — and the projection charges 0 for that. Hard return-0 blocked that opportunity, forcing the LLM summarizer fallback in offline/no-credential sessions where the text-only path would have landed cleanly. #computeSnapcompactMaxFrames now distinguishes two near-full cases: - frameBudget < 0 → return 0 (kept-recent already exhausted budget; no text-only summary can fit either) → caller still skips outright. - 0 ≤ frameBudget < FRAME_TOKEN_ESTIMATE → return 1 → snapcompact runs and picks the frame-less planArchive branch automatically for small discarded histories; the projection guard rejects any actual frame-bearing archive that overflows. Added regression test pinning maxFrames=1 (not 0) in the near-full window case.
|
Addressed in 65f945f.
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 65f945f1b7
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| const SUMMARY_TEXT_RESERVE = 4000; | ||
| const frameBudget = ctxWindow - reserve - nonFrameTokens - SUMMARY_TEXT_RESERVE; | ||
| if (frameBudget < 0) return 0; |
There was a problem hiding this comment.
Allow text-only snapcompact below the fixed reserve
When kept-recent plus non-message tokens still leaves some real headroom but less than this fixed 4k reserve, frameBudget becomes negative and the callers skip snapcompact entirely. Fresh evidence in this revision is the new SUMMARY_TEXT_RESERVE subtraction before the return 0: a small text-only snapcompact archive can project under the window with far less than 4000 tokens, but this path now forces the LLM summarizer/offline failure instead of attempting the local zero-frame result.
Useful? React with 👍 / 👎.
chatgpt-codex second-pass review on #3249: the previous helper folded the 4k SUMMARY_TEXT_RESERVE into both the maxFrames cap math AND the skip decision (return 0 when frameBudget < 0). That made any residual headroom below 4k fall negative and force the LLM-summarizer fallback, even though a text-only snapcompact archive (the 'text.length <= 2 * edgeCap' short-circuit in planArchive) typically costs only a few hundred tokens of summary lead-in and would have fit cleanly. The two reserves now serve their own jobs: - Skip iff 'baseTokens >= totalBudget' (kept-recent + non-message already eats the entire window − reserve envelope). No reserve fudge here; positive residual is always worth attempting. - Cap reserve (4k) is applied ONLY to the maxFrames calculation so the projection still passes once frames land. When the frame budget goes negative under that reserve but residual headroom is positive, the helper now returns maxFrames=1 instead of 0 so snapcompact's frame-less planArchive branch can still produce a valid archive. Updated regression test to pin the new contract directly: kept-recent tuned for 1500 tokens of headroom (well below the 4k cap reserve), the old helper returned 0 and skipped to the LLM summarizer, the new helper invokes snapcompact with maxFrames=1.
|
Addressed in db57efc.
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: db57efc3d3
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| const SUMMARY_TEXT_RESERVE = 4000; | ||
| const frameBudget = totalBudget - baseTokens - SUMMARY_TEXT_RESERVE; | ||
| if (frameBudget < snapcompact.FRAME_TOKEN_ESTIMATE) return 1; | ||
| return Math.min(Math.floor(frameBudget / snapcompact.FRAME_TOKEN_ESTIMATE), snapcompact.MAX_FRAMES_DEFAULT); |
There was a problem hiding this comment.
Reserve actual text-edge cost before capping frames
When the residual budget is large enough to permit frames, this fixed 4k reserve can still choose a maxFrames value that the projection immediately rejects. For example, with ~120k tokens of headroom on the default Anthropic/Sonnet shape, this returns 23 frames, but any archive large enough to use frames also carries both textHead and textTail; those two edge regions are about 27.8k bytes, which estimateTokens counts as roughly 7k tokens before the snapcompact summary/delimiters. The resulting projection is base + 23 * 5024 + >7k, which exceeds the same headroom and falls back to the LLM summary/repeats the warning loop this cap is intended to prevent. Size the cap using the actual summary/text-edge estimate, or decrement until #projectSnapcompactContextTokens would fit.
Useful? React with 👍 / 👎.
…cost chatgpt-codex third-pass review on #3249: the 4k SUMMARY_TEXT_RESERVE in the cap math undersized the actual textHead+textTail cost a frame- bearing archive carries (the projection separately bills 'countTokens(summary + textHead + textTail)'). At ~120k headroom on Anthropic 11on16-bw, the cap picked maxFrames=23, but '23 * 5024 + 2 * 13916 chars (≈7k tokens) + 2k summary template ≈ 124.5k' still exceeded the same 120k headroom — the cap chose a value the projection then immediately rejected, re-opening the warning loop. #computeSnapcompactMaxFrames now resolves the live snapcompact shape (same call the auto/manual paths pass to snapcompact.compact) and sizes the cap reserve from 'geometry(shape).capacity': textEdgeTokens = ceil(2 * capacity * 1.15 / 4) // 1.15 absorbs // tokenizer drift capReserve = textEdgeTokens + 2000 // + summary template For the default per-provider winners that resolves to ~10k (Anthropic Sonnet), ~14k (Opus 4.7), ~16k (Gemini 2.x), and ~10k (OpenAI) — all larger than the prior fixed 4k. Skip decision stays separate (baseTokens >= totalBudget), so positive sub-reserve headroom still runs snapcompact's text-only path. Test 1 retuned to baseline kept-recent ≈ 100k tokens with a strengthened assertion verifying the FULL projection invariant (frames + worst-case text edges + summary template + base ≤ budget). Confirmed test fails against the previous 4k-reserve helper by exactly the reviewer's predicted margin (174,271 vs 170,000 budget = 4,271 token overshoot).
|
Addressed in 2329944.
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 232994496d
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| } | ||
| }); | ||
|
|
||
| await expect(session.compact(undefined, { mode: "snapcompact" })).rejects.toThrow(); |
There was a problem hiding this comment.
Stub the LLM fallback in the skip test
When this assertion exercises the maxFrames < 1 path, AgentSession.compact() does not stop after skipping snapcompact; it falls through to the normal LLM summarizer. Because the test installs a fake Anthropic runtime key in beforeEach and only spies on snapcompact.compact, this line can drive real provider compaction attempts with test-key before eventually rejecting, making the suite depend on network/auth behavior and potentially slow or flaky in CI. Stub the agent-core compaction path/model resolver here, or assert the skip through a seam that does not enter provider code.
Useful? React with 👍 / 👎.
Repro
Long-running sessions on any sub-1M-token model (Claude Sonnet 4.5 / Haiku 4.x, GPT-5.x, Gemini 2.x) loop the warning
on every threshold tick. The numbers reproduce deterministically with the bundled catalog:
Cause
AgentSession#compactandAgentSession.#runAutoCompactioninvokedsnapcompact.compact(preparation, { convertToLlm, model, shape })with nomaxFramesoverride, so snapcompact fell through toMAX_FRAMES_DEFAULT = 80. The post-render projection inAgentSession.#projectSnapcompactContextTokenschargesFRAME_TOKEN_ESTIMATE = 5024per frame block (the conservative high-res Anthropic ceiling — seepackages/agent/src/compaction/compaction.ts:362-368), making 80 × 5024 ≈ 402k frame-token projections.For a Claude Sonnet–class 200k window the usable budget is
200000 − max(15%, reserve) = 170000, so the projection rejects the snapcompact result before it ever lands, downgrades toaction = "context-full", and re-emits the warning every threshold trigger.MAX_FRAMES_DEFAULT = 80was sized for Opus 4.7's 1M window (packages/snapcompact/src/snapcompact.ts:387-394) but no per-caller cap was wired in to scale it down for smaller windows.Fix
AgentSession.#computeSnapcompactMaxFrames(preparation, settings)which sizes the frame cap from the live(contextWindow − effectiveReserveTokens − computeNonMessageTokens − Σ estimateTokens(recentMessages) − 4000 summary-text reserve) / snapcompact.FRAME_TOKEN_ESTIMATE, clamped to[0, MAX_FRAMES_DEFAULT]. Uses the same per-frame charge the projection uses, so pickingmaxFramesfrom this helper makes the projection succeed by construction.#runAutoCompaction, near the prioraction === "snapcompact"block) and the manual/compactpath (compact(), near the priorsnapcompactReadyblock).snapcompact.compact(preparation, { … , maxFrames })is now called with the budget-sized cap.maxFrames < 1(kept-recent alone exhausts the window) snapcompact is skipped entirely instead of running just to be rejected, and the user-facing notice now reads "snapcompact: kept history alone exceeds the context budget — using an LLM summary instead" instead of the misleading "could not bring the context under the limit" (which implied snapcompact had actually produced something). The downstream projection check is kept as a defensive guard for summary-text drift.Single fixed-cost commit; no caller signatures changed.
Verification
bun --cwd packages/coding-agent check(Biome + tsgo) — passes.bun --cwd packages/coding-agent test test/compaction.test.ts test/compaction-lifecycle.test.ts test/compaction-hooks.test.ts test/compaction-serialization.test.ts test/agent-session-auto-compaction-queue.test.ts test/agent-session-eager-compaction.test.ts test/snapcompact-inline.test.ts test/agent-session-snapcompact-budget.test.ts— 77 pass, 9 skip, 0 fail across 8 files.packages/coding-agent/test/agent-session-snapcompact-budget.test.ts(2 cases):snapcompact.compact()maxFramesis< MAX_FRAMES_DEFAULT,> 0, andmaxFrames × FRAME_TOKEN_ESTIMATE < (window − reserve);snapcompact.compact()is never invoked and the "kept history alone exceeds the context budget" notice fires instead.Both fail without this diff (confirmed by stashing
src/session/agent-session.ts) and pass with it.Fixes #3247