Skip to content

fix(agent): size snapcompact maxFrames by the live model window#3249

Open
roboomp wants to merge 4 commits into
mainfrom
farm/6396f3c3/fix-snapcompact-frame-budget-overflow
Open

fix(agent): size snapcompact maxFrames by the live model window#3249
roboomp wants to merge 4 commits into
mainfrom
farm/6396f3c3/fix-snapcompact-frame-budget-overflow

Conversation

@roboomp

@roboomp roboomp commented Jun 22, 2026

Copy link
Copy Markdown
Collaborator

Repro

Long-running sessions on any sub-1M-token model (Claude Sonnet 4.5 / Haiku 4.x, GPT-5.x, Gemini 2.x) loop the warning

Warning: compaction: snapcompact could not bring the context under the limit — using an LLM summary instead
Auto context-full maintenance cancelled

on every threshold tick. The numbers reproduce deterministically with the bundled catalog:

import * as snapcompact from "@oh-my-pi/snapcompact";
const sonnet = { id: "claude-sonnet-4-5", api: "anthropic-messages", input: ["text","image"] };
const shape = snapcompact.resolveShape(sonnet);
const huge = "foo bar baz ".repeat(Math.ceil(80 * snapcompact.geometry(shape).capacity / 12) + 1000);
const planned = Math.min(snapcompact.frames(huge, { shape }), snapcompact.MAX_FRAMES_DEFAULT);
const projected = 50_000 + planned * snapcompact.FRAME_TOKEN_ESTIMATE;
const budget = 200_000 - Math.max(0.15 * 200_000, 16384);
console.log({ planned, projected, budget, rejected: projected > budget });
// { planned: 80, projected: 451920, budget: 170000, rejected: true }

Cause

AgentSession#compact and AgentSession.#runAutoCompaction invoked snapcompact.compact(preparation, { convertToLlm, model, shape }) with no maxFrames override, so snapcompact fell through to MAX_FRAMES_DEFAULT = 80. The post-render projection in AgentSession.#projectSnapcompactContextTokens charges FRAME_TOKEN_ESTIMATE = 5024 per frame block (the conservative high-res Anthropic ceiling — see packages/agent/src/compaction/compaction.ts:362-368), making 80 × 5024 ≈ 402k frame-token projections.

For a Claude Sonnet–class 200k window the usable budget is 200000 − max(15%, reserve) = 170000, so the projection rejects the snapcompact result before it ever lands, downgrades to action = "context-full", and re-emits the warning every threshold trigger. MAX_FRAMES_DEFAULT = 80 was sized for Opus 4.7's 1M window (packages/snapcompact/src/snapcompact.ts:387-394) but no per-caller cap was wired in to scale it down for smaller windows.

Fix

  • Added AgentSession.#computeSnapcompactMaxFrames(preparation, settings) which sizes the frame cap from the live (contextWindow − effectiveReserveTokens − computeNonMessageTokens − Σ estimateTokens(recentMessages) − 4000 summary-text reserve) / snapcompact.FRAME_TOKEN_ESTIMATE, clamped to [0, MAX_FRAMES_DEFAULT]. Uses the same per-frame charge the projection uses, so picking maxFrames from this helper makes the projection succeed by construction.
  • Threaded the helper into the auto-compaction path (#runAutoCompaction, near the prior action === "snapcompact" block) and the manual /compact path (compact(), near the prior snapcompactReady block). snapcompact.compact(preparation, { … , maxFrames }) is now called with the budget-sized cap.
  • When maxFrames < 1 (kept-recent alone exhausts the window) snapcompact is skipped entirely instead of running just to be rejected, and the user-facing notice now reads "snapcompact: kept history alone exceeds the context budget — using an LLM summary instead" instead of the misleading "could not bring the context under the limit" (which implied snapcompact had actually produced something). The downstream projection check is kept as a defensive guard for summary-text drift.

Single fixed-cost commit; no caller signatures changed.

Verification

  • bun --cwd packages/coding-agent check (Biome + tsgo) — passes.
  • bun --cwd packages/coding-agent test test/compaction.test.ts test/compaction-lifecycle.test.ts test/compaction-hooks.test.ts test/compaction-serialization.test.ts test/agent-session-auto-compaction-queue.test.ts test/agent-session-eager-compaction.test.ts test/snapcompact-inline.test.ts test/agent-session-snapcompact-budget.test.ts77 pass, 9 skip, 0 fail across 8 files.
  • New regression packages/coding-agent/test/agent-session-snapcompact-budget.test.ts (2 cases):
    • asserts that for a 200k-window Sonnet session the captured snapcompact.compact() maxFrames is < MAX_FRAMES_DEFAULT, > 0, and maxFrames × FRAME_TOKEN_ESTIMATE < (window − reserve);
    • asserts that with one synthetic kept-recent message larger than the model window, snapcompact.compact() is never invoked and the "kept history alone exceeds the context budget" notice fires instead.
      Both fail without this diff (confirmed by stashing src/session/agent-session.ts) and pass with it.

Fixes #3247

Snapcompact's bundled MAX_FRAMES_DEFAULT (80) × FRAME_TOKEN_ESTIMATE (5024)
≈ 402k tokens worth of frames. AgentSession was calling snapcompact.compact()
with no maxFrames override, so the post-render projection inside #runAuto
Compaction / compact() always overflowed the budget on any sub-1M-token
window (Claude Sonnet 4.5's 200k = 170k usable, the 80-frame projection
alone clears that 2.4×), looping the 'snapcompact could not bring the
context under the limit — using an LLM summary instead' warning on every
threshold tick.

AgentSession.#computeSnapcompactMaxFrames now sizes the frame cap from
the resolved budget — (window − reserve − non-message − kept-recent −
summary-text reserve) / FRAME_TOKEN_ESTIMATE, clamped to MAX_FRAMES_DEFAULT
— and threads it into snapcompact.compact() in both the auto-compaction
and manual /compact paths. When the kept-recent slice already exceeds the
budget, snapcompact is skipped outright instead of running just to be
rejected: the projection guard remains as a defensive check.

Fixes #3247
@github-actions github-actions Bot added the vouched Passed the vouch gate label Jun 22, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5cce507582

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

// past the projection check below.
const SUMMARY_TEXT_RESERVE = 4000;
const frameBudget = ctxWindow - reserve - nonFrameTokens - SUMMARY_TEXT_RESERVE;
if (frameBudget < snapcompact.FRAME_TOKEN_ESTIMATE) return 0;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Allow zero-frame snapcompact archives when they fit

When the kept-recent slice leaves less than one FRAME_TOKEN_ESTIMATE of headroom but still enough room for a small text archive, this returns 0 before calling snapcompact. snapcompact.compact() can produce a valid text-only archive (frames: []) when the discarded history fits in its text edge regions, so /compact snapcompact and auto snapcompact now unnecessarily fall back to the LLM summarizer in that near-full-context case, which can fail for offline/no-compaction-credential sessions even though local snapcompact would have fit and the existing projection guard would have accepted it.

Useful? React with 👍 / 👎.

chatgpt-codex review on #3249: the helper returned 0 when frameBudget
< FRAME_TOKEN_ESTIMATE, causing the caller to skip snapcompact entirely.
But snapcompact.planArchive has a 'text.length <= 2 * edgeCap' short-
circuit that produces a valid frames:[] archive when the discarded
history is small enough — and the projection charges 0 for that. Hard
return-0 blocked that opportunity, forcing the LLM summarizer fallback
in offline/no-credential sessions where the text-only path would have
landed cleanly.

#computeSnapcompactMaxFrames now distinguishes two near-full cases:
  - frameBudget < 0 → return 0 (kept-recent already exhausted budget;
    no text-only summary can fit either) → caller still skips outright.
  - 0 ≤ frameBudget < FRAME_TOKEN_ESTIMATE → return 1 → snapcompact runs
    and picks the frame-less planArchive branch automatically for small
    discarded histories; the projection guard rejects any actual
    frame-bearing archive that overflows.

Added regression test pinning maxFrames=1 (not 0) in the near-full
window case.
@roboomp

roboomp commented Jun 22, 2026

Copy link
Copy Markdown
Collaborator Author

Addressed in 65f945f.

  • #computeSnapcompactMaxFrames now distinguishes frameBudget < 0 (truly impossible — kept-recent + non-message + summary reserve > budget → return 0 → caller skips) from 0 ≤ frameBudget < FRAME_TOKEN_ESTIMATE (frame won't fit but text-only might → return 1 → snapcompact runs and planArchive's text.length <= 2 * edgeCap short-circuit can still produce a valid frames: [] archive that the projection accepts).
  • Caller skip path is unchanged, just triggers less often. Projection guard still rejects any actual frame-bearing archive that overflows, so offline sessions can land snapcompact text-only instead of being forced into the LLM summarizer.
  • Regression test still invokes snapcompact with maxFrames=1 when the budget can only fit a text-only archive pins the new contract (kept-recent tuned so frameBudget lands in [0, FRAME_TOKEN_ESTIMATE), snapcompact still invoked with maxFrames = 1).

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 65f945f1b7

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +9461 to +9463
const SUMMARY_TEXT_RESERVE = 4000;
const frameBudget = ctxWindow - reserve - nonFrameTokens - SUMMARY_TEXT_RESERVE;
if (frameBudget < 0) return 0;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Allow text-only snapcompact below the fixed reserve

When kept-recent plus non-message tokens still leaves some real headroom but less than this fixed 4k reserve, frameBudget becomes negative and the callers skip snapcompact entirely. Fresh evidence in this revision is the new SUMMARY_TEXT_RESERVE subtraction before the return 0: a small text-only snapcompact archive can project under the window with far less than 4000 tokens, but this path now forces the LLM summarizer/offline failure instead of attempting the local zero-frame result.

Useful? React with 👍 / 👎.

chatgpt-codex second-pass review on #3249: the previous helper folded
the 4k SUMMARY_TEXT_RESERVE into both the maxFrames cap math AND the
skip decision (return 0 when frameBudget < 0). That made any residual
headroom below 4k fall negative and force the LLM-summarizer fallback,
even though a text-only snapcompact archive (the 'text.length <= 2 *
edgeCap' short-circuit in planArchive) typically costs only a few
hundred tokens of summary lead-in and would have fit cleanly.

The two reserves now serve their own jobs:

- Skip iff 'baseTokens >= totalBudget' (kept-recent + non-message
  already eats the entire window − reserve envelope). No reserve
  fudge here; positive residual is always worth attempting.
- Cap reserve (4k) is applied ONLY to the maxFrames calculation so
  the projection still passes once frames land. When the frame budget
  goes negative under that reserve but residual headroom is positive,
  the helper now returns maxFrames=1 instead of 0 so snapcompact's
  frame-less planArchive branch can still produce a valid archive.

Updated regression test to pin the new contract directly: kept-recent
tuned for 1500 tokens of headroom (well below the 4k cap reserve), the
old helper returned 0 and skipped to the LLM summarizer, the new helper
invokes snapcompact with maxFrames=1.
@roboomp

roboomp commented Jun 22, 2026

Copy link
Copy Markdown
Collaborator Author

Addressed in db57efc.

  • Skip decision now uses raw baseTokens >= totalBudget (no reserve fudge): snapcompact is skipped only when kept-recent + non-message already exhausts the ctxWindow − reserve envelope.
  • 4k SUMMARY_TEXT_RESERVE is applied only to the maxFrames cap calculation, never to the skip — so residual headroom of 1–4k tokens now lands snapcompact (returning maxFrames=1) instead of forcing the LLM summarizer.
  • Regression test retuned to headroomTokens = 1500 (the exact sub-reserve case the review flags) so the test fails against the previous helper revision (confirmed: prior code falls through to the real LLM call and 404s) and passes against the split-reserve fix.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: db57efc3d3

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +9473 to +9476
const SUMMARY_TEXT_RESERVE = 4000;
const frameBudget = totalBudget - baseTokens - SUMMARY_TEXT_RESERVE;
if (frameBudget < snapcompact.FRAME_TOKEN_ESTIMATE) return 1;
return Math.min(Math.floor(frameBudget / snapcompact.FRAME_TOKEN_ESTIMATE), snapcompact.MAX_FRAMES_DEFAULT);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Reserve actual text-edge cost before capping frames

When the residual budget is large enough to permit frames, this fixed 4k reserve can still choose a maxFrames value that the projection immediately rejects. For example, with ~120k tokens of headroom on the default Anthropic/Sonnet shape, this returns 23 frames, but any archive large enough to use frames also carries both textHead and textTail; those two edge regions are about 27.8k bytes, which estimateTokens counts as roughly 7k tokens before the snapcompact summary/delimiters. The resulting projection is base + 23 * 5024 + >7k, which exceeds the same headroom and falls back to the LLM summary/repeats the warning loop this cap is intended to prevent. Size the cap using the actual summary/text-edge estimate, or decrement until #projectSnapcompactContextTokens would fit.

Useful? React with 👍 / 👎.

…cost

chatgpt-codex third-pass review on #3249: the 4k SUMMARY_TEXT_RESERVE
in the cap math undersized the actual textHead+textTail cost a frame-
bearing archive carries (the projection separately bills
'countTokens(summary + textHead + textTail)'). At ~120k headroom on
Anthropic 11on16-bw, the cap picked maxFrames=23, but
'23 * 5024 + 2 * 13916 chars (≈7k tokens) + 2k summary template ≈ 124.5k'
still exceeded the same 120k headroom — the cap chose a value the
projection then immediately rejected, re-opening the warning loop.

#computeSnapcompactMaxFrames now resolves the live snapcompact shape
(same call the auto/manual paths pass to snapcompact.compact) and sizes
the cap reserve from 'geometry(shape).capacity':

  textEdgeTokens = ceil(2 * capacity * 1.15 / 4)   // 1.15 absorbs
                                                   // tokenizer drift
  capReserve     = textEdgeTokens + 2000           // + summary template

For the default per-provider winners that resolves to ~10k (Anthropic
Sonnet), ~14k (Opus 4.7), ~16k (Gemini 2.x), and ~10k (OpenAI) — all
larger than the prior fixed 4k. Skip decision stays separate
(baseTokens >= totalBudget), so positive sub-reserve headroom still
runs snapcompact's text-only path.

Test 1 retuned to baseline kept-recent ≈ 100k tokens with a strengthened
assertion verifying the FULL projection invariant (frames + worst-case
text edges + summary template + base ≤ budget). Confirmed test fails
against the previous 4k-reserve helper by exactly the reviewer's
predicted margin (174,271 vs 170,000 budget = 4,271 token overshoot).
@roboomp

roboomp commented Jun 22, 2026

Copy link
Copy Markdown
Collaborator Author

Addressed in 2329944.

  • Cap reserve is now sized from the live snapcompact shape's geometry(shape).capacitytextEdgeTokens = ⌈2 × capacity × 1.15 / 4⌉ plus a 2k summary-template allowance — mirroring what #projectSnapcompactContextTokens will actually charge once frames land (≈10k Anthropic Sonnet / OpenAI, ≈14k Opus 4.7, ≈16k Gemini, all up from the prior 4k).
  • Resolved via the same snapcompact.resolveShape(model, settings) the auto/manual paths pass to snapcompact.compact(), so the cap and the projection agree on the shape by construction.
  • Skip decision stays decoupled (baseTokens >= totalBudget), so positive sub-reserve headroom still runs the text-only path.
  • Regression test 1 retuned: kept-recent ≈ 100k tokens, assertion strengthened to verify the FULL projection (base + frames × FRAME_TOKEN_ESTIMATE + worst-case text edges + summary template ≤ budget). Confirmed failing against the prior 4k helper by exactly the reviewer's predicted margin (174,271 vs 170,000 budget = 4,271-token overshoot).

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 232994496d

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

}
});

await expect(session.compact(undefined, { mode: "snapcompact" })).rejects.toThrow();

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Stub the LLM fallback in the skip test

When this assertion exercises the maxFrames < 1 path, AgentSession.compact() does not stop after skipping snapcompact; it falls through to the normal LLM summarizer. Because the test installs a fake Anthropic runtime key in beforeEach and only spies on snapcompact.compact, this line can drive real provider compaction attempts with test-key before eventually rejecting, making the suite depend on network/auth behavior and potentially slow or flaky in CI. Stub the agent-core compaction path/model resolver here, or assert the skip through a seam that does not enter provider code.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

vouched Passed the vouch gate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] auto snapcompact long time run fail

1 participant