Skip to content

fix(tui): reduce large transcript stalls#3259

Open
roboomp wants to merge 4 commits into
mainfrom
farm/89ae67a3/fix-large-session-tui-stalls
Open

fix(tui): reduce large transcript stalls#3259
roboomp wants to merge 4 commits into
mainfrom
farm/89ae67a3/fix-large-session-tui-stalls

Conversation

@roboomp

@roboomp roboomp commented Jun 22, 2026

Copy link
Copy Markdown
Collaborator

Repro

Focused regression tests reproduce the large-session hot paths: local transcript append renders TAILMARKER only after a whole-file fs.readFileSync, live display transcript rebuilds ignore collapseCompactedHistory, and a remote header-only first fetch must leave the viewer on No messages yet.. Command: bun test packages/coding-agent/test/agent-hub-advisor-scroll.test.ts packages/coding-agent/test/compaction.test.ts packages/coding-agent/test/modes/utils/render-initial-messages.test.ts failed before the fix with those assertions.

Cause

AgentTranscriptViewer.#refresh keyed local files only by size/mtime and called ChatTranscriptBuilder.rebuild() after reading and parsing the whole session file on every change. UiHelpers.renderInitialMessages() and InteractiveMode.rebuildChatFromMessages() always requested the full inline transcript from AgentSession.buildTranscriptSessionContext(), so compacted history stayed on the live render surface. FileSessionStorage.writeTextSync() rewrote in place, making same-path historical rewrites hard for a tailing viewer to distinguish from appends.

Fix

  • Added ChatTranscriptBuilder.append() and switched local/remote transcript refreshes to append parsed JSONL tails when file identity and sentinels are stable.
  • Fall back to full transcript rebuild on rotation, truncation, same-size rewrites, or sentinel drift; synchronous session rewrites now replace file identity on POSIX so tailers see historical changes.
  • Added collapseCompactedHistory for transcript contexts and wired live chat rebuilds to use the compacted display tail while full transcript contexts remain available by default.
  • Fixed the remote empty/header-only placeholder path and added focused regressions for append tailing, compaction collapse, remote loading, and rewrite identity.

Verification

Passed: bun test packages/coding-agent/test/agent-hub-advisor-scroll.test.ts packages/coding-agent/test/compaction.test.ts packages/tui/test/resize-viewport-defer.test.ts packages/coding-agent/test/modes/utils/render-initial-messages.test.ts packages/coding-agent/test/session-storage.test.ts && bun check. Fixes #3258

Tail appended transcript JSONL instead of rebuilding rendered history on every poll, collapse compacted history for live chat rendering, and replace synchronous session rewrites so tailers detect historical changes.

Fixes #3258
@github-actions github-actions Bot added the vouched Passed the vouch gate label Jun 22, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3bcbf1515d

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +254 to +257
size: stat.size,
mtimeMs: stat.mtimeMs,
offset: stat.size,
pending: "",

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Reset local tail offset to the bytes actually read

When the session file grows between the earlier statSync and the full readFileSync, the rebuild parses the newly appended bytes from data, but the tail state records size/offset from the older stat.size. On the next poll stat.size > state.size, so #appendLocal reads from the stale offset and appends entries that were already rendered, duplicating transcript rows for agents that append while the viewer is opening or doing a full rebuild. Re-stat after the read or base the recorded offset on data.byteLength.

Useful? React with 👍 / 👎.

Comment on lines 312 to 315
if (result.newSize < fromByte) {
// Host transcript rotated/truncated — restart from 0.
this.#remoteBytes = 0;
this.#remoteEntries = [];
this.#fetchRemote();

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Clear remote transcript before restarting after rotation

In the collab-guest path, a host-side transcript truncation/rotation only resets the byte cursor before fetching from byte 0 again. Because this change now appends parsed remote entries directly into the existing ChatTranscriptBuilder, the refetched transcript is rendered after the stale pre-rotation rows instead of replacing them; the old #remoteEntries = [] plus rebuild behavior no longer exists. Clear the builder/model state before the restart so rotated host transcripts do not show duplicated/stale history.

Useful? React with 👍 / 👎.

If the session file grew between the earlier statSync and readFileSync inside #loadLocalFull, the rebuild rendered the appended bytes but the tail state recorded the pre-race stat.size. The next poll's #appendLocal would then read from that stale offset and re-render the bytes already in the rebuild, duplicating rows for agents that append during a viewer open or full rebuild. Anchor size/offset to data.byteLength and re-stat for mtime/identity so the post-read clock matches what's on disk.

Fixes #3258
@roboomp

roboomp commented Jun 22, 2026

Copy link
Copy Markdown
Collaborator Author

Good catch — fixed in e5de11c.

  • #loadLocalFull now anchors state.size/state.offset to data.byteLength and re-stats for dev/ino/mtimeMs, so a race-window append never gets re-rendered by the next poll's #appendLocal.
  • Added a regression test that stubs fs.statSync to return a pre-race size while fs.readFileSync injects an appended entry mid-call, then asserts the race-window row renders exactly once after a follow-up append.

The append-based collab-guest path only reset the byte cursor on host transcript rotation/truncation, so the refetched bytes stacked on top of the stale pre-rotation rows still in the ChatTranscriptBuilder. Drop the builder and model state before refetching from byte 0 so rotated host transcripts replace stale history instead of duplicating it.

Fixes #3258
@roboomp

roboomp commented Jun 22, 2026

Copy link
Copy Markdown
Collaborator Author

Good catch — fixed in e65b8e4.

  • #fetchRemote rotation branch now resets #hasRemoteData/#model and rebuilds the transcript with [] before recursing, so the post-rotation refetch replaces stale rows instead of stacking under them.
  • Added a regression test that drives a three-phase remote mock (initial → rotated → post) and asserts the pre-rotation marker disappears once the refetched content lands.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e65b8e4cc7

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +264 to +267
size: data.byteLength,
mtimeMs: post.mtimeMs,
offset: data.byteLength,
pending: "",

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve incomplete trailing JSONL after full rebuilds

When a full rebuild reads the session while another process is midway through appending a large JSONL entry, parseSessionEntries(data.toString(...)) ignores the incomplete trailing line, but the new tail state advances offset to data.byteLength and clears pending. The next poll then reads only the remainder of that same line, so the completed entry is malformed and the message can disappear from the transcript until a later rewrite/rotation forces another full read. This affects local agent/advisor viewers opened during an in-progress append; keep the bytes after the last newline as pending and only advance the cursor past complete lines.

Useful? React with 👍 / 👎.

When a full rebuild caught the session file mid-append, #loadLocalFull cleared the pending buffer but advanced offset past the headless trailing bytes. The next poll then read only the completion bytes, parsed a headless fragment, and silently dropped the completed entry. Carry the bytes after the last newline as pending so the completion concatenates with them on the next #appendLocal pass.

Fixes #3258
@roboomp

roboomp commented Jun 22, 2026

Copy link
Copy Markdown
Collaborator Author

Good catch — fixed in 6bc0e56.

  • #loadLocalFull now splits the read buffer on its last newline, parses only the complete prefix, and seeds state.pending with the headless trailing bytes so the next #appendLocal concatenates the completion onto the buffered fragment instead of parsing it in isolation.
  • Added a regression test that writes a complete entry plus a headless trailing line, opens the viewer, then appends a single \n and asserts the previously partial entry surfaces on the next poll.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6bc0e56c20

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

};

if (options?.transcript) {
if (options?.transcript && !options.collapseCompactedHistory) {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve kept turns in collapsed remote compactions

With collapseCompactedHistory: true, transcript mode now falls through into the LLM compaction branch. For OpenAI remote compactions (preserveData.openaiRemoteCompaction.replacementHistory), that branch skips the firstKeptEntryId-to-compaction messages behind if (!remoteReplacementHistory), so initial render/rebuild of a remotely compacted session shows only the summary and messages after compaction, hiding the recent kept turns that were visible before this change. The collapsed display path should still emit those kept SessionEntry messages even when the provider payload is used for LLM replay.

Useful? React with 👍 / 👎.

if (state.path !== sessionFile || state.dev !== stat.dev || state.ino !== stat.ino || stat.size < state.size)
return false;
for (const sentinel of state.sentinels) {
const current = readFileRangeSync(sessionFile, sentinel.offset, sentinel.bytes.byteLength);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Handle sentinel read failures during refresh

On a local viewer, after statSync succeeds, #canAppendLocal opens the file again for each sentinel without any catch. If the session file is deleted or rotated in that small window, fs.openSync throws out of #refresh/the poll timer instead of clearing or rebuilding the transcript, which can take down the TUI while an agent/session is being removed. Treat sentinel read failures as non-appendable or missing rather than letting them escape.

Useful? React with 👍 / 👎.

});

it("replaces the file identity so transcript tailers detect rewrites", async () => {
const { FileSessionStorage } = await import("@oh-my-pi/pi-coding-agent/session/session-storage");

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Use a top-level import for FileSessionStorage

This new test repeats await import(...) for a statically known module. The root AGENTS.md Code Quality rule says “NEVER use inline imports” and to use top-level imports instead; there is no runtime-selected specifier here, so this should be a normal FileSessionStorage import to keep the changed test within the package rules.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

vouched Passed the vouch gate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

fix: TUI performance degrades with large sessions

1 participant