fix(server-v3): Prevent evicting in-use sessions mid-request#2286
fix(server-v3): Prevent evicting in-use sessions mid-request#2286akeimach wants to merge 6 commits into
Conversation
🦋 Changeset detectedLatest commit: 9edfcdb The changes in this PR will be included in the next version bump. This PR includes changesets to release 1 package
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
There was a problem hiding this comment.
3 issues found across 6 files
Confidence score: 2/5
- In
packages/server-v3/src/lib/withSession.ts, the session can be released beforefncompletes, so an in-flight handler may keep using a Stagehand instance that has already become evictable, leading to flaky request failures under load — gate session close on true abort and ensure the handler is canceled/finished before release. - In
packages/server-v3/src/lib/InMemorySessionStore.ts, the lazy V3 init path awaitsstagehand.init()before the session is pinned, which creates a race where the session can be evicted during initialization and return/leak an untracked instance — pin first, then run init (or otherwise hold a non-evictable reference through init) before merging. - In
packages/server-v3/src/lib/InMemorySessionStore.ts,releaseSession()refreshes TTL even for unmatched/double releases, which can keep idle sessions alive longer than intended and distort eviction behavior over time — only refresh TTL when a corresponding pin was actually released.
Architecture diagram
sequenceDiagram
participant Client as FastifyRequest
participant WS as withSession
participant Store as InMemorySessionStore
participant Evict as LRU Eviction (internal)
participant TTL as TTL Cleanup (internal)
participant V3 as V3Instance
Note over Client,V3: Request handling with session pinning
Client->>WS: request (sessionId, ctx, fn)
WS->>Store: getOrCreateStagehand(sessionId, ctx)
Note over Store: Increment node.inUse (pin)
Store-->>WS: stagehand (V3)
WS->>WS: attach 'close' listener on request.raw
WS->>V3: fn(stagehand) – long-running agent logic
Note over V3: Agent can take 60s+
alt Client aborts mid-handler
Client-->>WS: abort / socket close
WS->>Store: releaseSession(sessionId)
Note over Store: Decrement inUse (clamp ≥0)<br/>Refresh TTL
Store-->>WS: done
WS->>WS: released flag set
else Handler completes normally
V3-->>WS: result
WS->>WS: finally block
WS->>Store: releaseSession(sessionId)
Note over Store: Decrement inUse (clamp ≥0)<br/>Refresh TTL
Store-->>WS: done
end
WS-->>Client: response
Note over Client,V3: Background eviction/expiry (respects pinning)
loop LRU eviction (on capacity exceeded)
Evict->>Store: find LRU node
Store->>Store: walk from first, skip inUse > 0
alt found non‑in‑use node
Store->>Evict: evict that node
Evict->>Store: deleteSession
else all nodes in use
Evict-->>Store: skip eviction
end
end
loop TTL cleanup (periodic)
TTL->>Store: for each node
Store->>Store: check expiry <= now AND inUse === 0
alt expired and not in use
Store->>Store: deleteSession
else in use
Store-->>TTL: skip
end
end
rect rgb(240, 240, 240)
Note over Client,V3: Explicit endSession overrides pinning
Client->>Store: endSession(sessionId)
Store->>V3: close()
Store->>Store: remove node (ignores inUse)
end
Reply with feedback, questions, or to request a fix.
Re-trigger cubic
e156c37 to
f3cb355
Compare
There was a problem hiding this comment.
1 issue found across 7 files
Confidence score: 3/5
- In
packages/server-v3/src/lib/withSession.ts,releaseSessionerrors are currently swallowed, so pinned sessions can remain markedinUseand gradually leak capacity under load; this could cause harder-to-diagnose performance degradation or session exhaustion after merge — log/emit release failures (and ideally fail or retry release paths) before merging so leaks are detectable.
Architecture diagram
sequenceDiagram
participant Client as API Client
participant Handler as Request Handler (stream.ts)
participant WithSession as withSession()
participant Store as InMemorySessionStore
participant V3 as V3 Stagehand
participant LRU as LRU Eviction
participant TTL as TTL Cleanup
Note over Client,TTL: NEW: Session Pinning – prevent eviction/expiry while in-flight
Client->>Handler: HTTP Request (sse/start)
Handler->>WithSession: call withSession(sessionId, ctx, handlerFn)
WithSession->>Store: getOrCreateStagehand(sessionId, ctx)
alt Session not found / expired
Store-->>WithSession: throw
WithSession-->>Handler: error
else Session exists
Store->>Store: node.inUse += 1 NEW: pin
opt Lazy initialization (V3 not yet created)
Store->>V3: new Stagehand()
Store->>V3: init()
alt init fails
Store->>Store: node.inUse -= 1 NEW: undo pin
Store->>V3: close()
Store-->>WithSession: throw
end
end
Store-->>WithSession: V3 instance
end
WithSession->>WithSession: call handlerFn(stagehand)
Note over WithSession,TTL: Handler runs – session is pinned
TTL->>Store: cleanupExpired() (periodic)
Store->>Store: check node.expiry <= now AND node.inUse === 0
alt node.inUse > 0
Store-->>TTL: skip expiry (pinned) NEW
else
Store-->>TTL: delete expired session
end
LRU->>Store: evictLru() (on capacity)
Store->>Store: walk from first, skip nodes with inUse > 0
alt no evictable node found
Store-->>LRU: no-op NEW: skip eviction
else
Store->>Store: deleteSession(evictedNode)
end
Handler->>WithSession: handlerFn settles (success or error)
WithSession->>Store: releaseSession(sessionId) NEW
Store->>Store: if node.inUse > 0 then node.inUse -= 1 and refresh expiry
alt multiple concurrent holders
Store->>Store: inUse still > 0 – remains pinned
else last holder releases
Store->>Store: session now eligible for eviction/expiry
end
opt unmatched release (inUse already 0)
Store->>Store: no-op (clamp at 0, no TTL refresh) NEW
end
WithSession-->>Handler: result (propagates error)
Handler-->>Client: Response (streamed or JSON)
Note over Client,TTL: endSession bypasses pin
alt Explicit endSession
Client->>Handler: (end session request)
Handler->>Store: endSession(sessionId)
Store->>Store: delete node regardless of inUse
Store->>V3: close()
Store-->>Handler: done
end
Reply with feedback, questions, or to request a fix.
Re-trigger cubic
There was a problem hiding this comment.
No issues found across 7 files
Confidence score: 5/5
- Automated review surfaced no issues in the provided summaries.
- No files require special attention.
Architecture diagram
sequenceDiagram
participant Client as HTTP Client
participant API as API Route Handler
participant withS as withSession Wrapper
participant Store as InMemorySessionStore
participant Stagehand as V3 Stagehand Instance
participant Cleanup as TTL/LRU Cleanup
Note over Client,Cleanup: Session Lifecycle with Pin/Release
Client->>API: POST /start or streaming request
API->>withS: withSession(sessionId, ctx, handler)
withS->>Store: getOrCreateStagehand(sessionId, ctx)
Store->>Store: Lookup node in LRU map
alt Session not found or expired
Store-->>withS: throw Error
withS-->>API: Error propagates
API-->>Client: Error response
else Session found
Store->>Store: node.inUse += 1 (PIN)
Store->>Stagehand: Lazy init() if null
alt init() fails
Stagehand-->>Store: Error
Store->>Store: node.inUse -= 1 (unpin)
Note over Store: Node stays in cache for retry
Store-->>withS: throw Error
withS-->>API: Error propagates
API-->>Client: Error response
else init() succeeds
Stagehand-->>Store: V3 instance ready
Store-->>withS: Return V3 instance
withS->>withS: Run handler(stagehand)
Note over withS: Handler runs (potentially 60s+)<br/>Session remains pinned
alt Handler succeeds
withS->>Store: releaseSession(sessionId) in finally
Store->>Store: node.inUse -= 1 (if > 0)
Store->>Store: Refresh TTL expiry
Store-->>withS: OK
withS-->>API: Handler result
API-->>Client: Success response
else Handler throws
withS->>Store: releaseSession(sessionId) in finally
Store->>Store: node.inUse -= 1
Store->>Store: Refresh TTL expiry
Store-->>withS: OK (or error logged)
withS-->>API: Error propagates
API-->>Client: Error response
end
opt Release failure
Note over withS,Store: console.error() records leak<br/>without clobbering handler result
end
end
end
Note over Store: Concurrent requests share session<br/>inUse is a refcount (not boolean)
par Concurrent request
Client->>API: Another request for same session
API->>withS: withSession(same sessionId, ctx, handler)
withS->>Store: getOrCreateStagehand(sessionId, ctx)
Store->>Store: node.inUse += 1 (now 2)
Store-->>withS: Return existing V3 instance
Note over withS,Store: Both requests run concurrently<br/>Session stays pinned until both release
end
Note over Store: TTL and LRU checks skip pinned sessions
loop Periodic / On-insert cleanup
Cleanup->>Store: cleanupExpired()
Store->>Store: Iterate items
alt node.inUse > 0
Store->>Store: Skip - session is pinned
else node.inUse === 0 && expired
Store->>Store: Delete session
end
end
loop On insert exceeding capacity
Cleanup->>Store: evictLru()
Store->>Store: Walk from least-recently-used
alt node.inUse > 0
Store->>Store: Skip, move to next
else node.inUse === 0
Store->>Store: Evict this session
end
Note over Store: If all sessions pinned, skip eviction
end
Note over Client,Cleanup: Explicit endSession bypasses pinning
Client->>API: endSession(sessionId)
API->>Store: endSession(sessionId)
Store->>Store: Close browser, delete node
Note over Store: endSession works even when inUse > 0<br/>(forceful cleanup, not regular eviction)
Store-->>API: OK
API-->>Client: Success
Note over Store: Unmatched release is a no-op
Store->>Store: releaseSession() with inUse === 0
Store->>Store: Return immediately, no TTL refresh
There was a problem hiding this comment.
1 issue found across 1 file (changes from recent commits).
Tip: Review your code locally with the cubic CLI to iterate faster.
Re-trigger cubic
There was a problem hiding this comment.
1 issue found across 2 files (changes from recent commits).
Tip: Review your code locally with the cubic CLI to iterate faster.
Re-trigger cubic
why
A long-running agent runs (60s+) could have their sessions ended early via LRU eviction or TTL cleanup. This surfaces as a "Stagehand session was closed" response on actions that have actually succeed.
The root cause was that the session store only refreshed a session's LRU/TTL at request start, which would cause long requests to reach eviction age while still running.
what changed
To fix, we can mark these long running sessions as "in use" to prevent it from expiring while in flight.
The new
inUseis a refcount (not a boolean) because concurrent requests can share one session.test plan
New tests/unit/sessionPinning.test.ts: eviction-skip under capacity, TTL-skip, concurrent holders + non-negative clamp, explicit-end-still-closes, and release-exactly-once-on-abort
Summary by cubic
Prevent in-use sessions from being evicted or TTL-expired mid-request, fixing intermittent “Stagehand session was closed” errors on long agent runs. Adds session pinning with a refcount and a
withSessionhelper, and fixes capacity downsizing to evict all excess entries safely.Bug Fixes
inUserefcount; TTL cleanup,hasSession, and LRU eviction skip pinned sessions (eviction walks past in-use nodes).getOrCreateStagehandpins before lazyinit()and unpins on failure; newreleaseSession(sessionId)decrements (clamped at zero) and refreshes TTL, ignoring unmatched releases.maxCapacity, evict before applying the new limit and do it sequentially so all excess entries are removed; pinned sessions are skipped, so the cache can briefly exceed capacity until requests finish. Updated streaming and session start to usewithSession(...), which releases exactly once after the handler settles and logs release failures viaconsole.error. Unit tests cover eviction/TTL skips, capacity downsizing, concurrent holders, unmatched releases, explicit end, release-once semantics, and release-failure logging.Migration
SessionStoreimplementations must addreleaseSession(sessionId)and follow the acquire/release contract.withSession(...)instead of manually holding a V3 instance; do not release on client disconnect.Written for commit 9edfcdb. Summary will update on new commits.