fix(opencode): make global event stream heartbeat configurable + exponential reconnect#33646
daysus wants to merge 1 commit into
…ponential reconnect

Adds server.event_stream.heartbeat_ms config to tune the SSE heartbeat interval (default 10000ms, range 1000-60000). This lets users on flaky networks (AV, proxies, idle TCP) keep the global event stream alive.

Renderer now reconnects with exponential backoff (500ms..30s, +/-25% jitter) instead of a fixed 250ms delay, which hammered the server in a tight loop whenever the stream died. The attempt counter resets on every event.

Together these changes survive the silent SSE disconnects observed on Windows + Electron + sidecar architectures (see anomalyco#30597).

Refs anomalyco#30597
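As a rough illustration of the config contract in the commit message above, the sketch below resolves a heartbeat interval from a config object. Only the key name, the 1000-60000 range, and the 10000 default come from the PR text. The `ServerConfig` type, the `resolveHeartbeatMs` helper, the integer requirement, and the choice to reject rather than clamp out-of-range values are all assumptions made for illustration; the real schema lives in the PR's config file and may behave differently.

```typescript
// Hypothetical shape: nesting is inferred from the dotted key
// server.event_stream.heartbeat_ms and is an assumption.
interface ServerConfig {
  event_stream?: { heartbeat_ms?: number }
}

// Default and bounds as stated in the PR description.
const DEFAULT_HEARTBEAT_MS = 10_000
const MIN_HEARTBEAT_MS = 1_000
const MAX_HEARTBEAT_MS = 60_000

// Hypothetical helper: returns the configured interval, or the default
// when the key is absent. Rejecting bad values is an assumption; the
// actual schema could clamp or report the error differently.
function resolveHeartbeatMs(server: ServerConfig | undefined): number {
  const value = server?.event_stream?.heartbeat_ms
  if (value === undefined) return DEFAULT_HEARTBEAT_MS
  if (!Number.isInteger(value) || value < MIN_HEARTBEAT_MS || value > MAX_HEARTBEAT_MS) {
    throw new RangeError(
      `server.event_stream.heartbeat_ms must be an integer between ${MIN_HEARTBEAT_MS} and ${MAX_HEARTBEAT_MS}`,
    )
  }
  return value
}
```

A user on a network with aggressive idle timeouts would set a smaller value, for example `resolveHeartbeatMs({ event_stream: { heartbeat_ms: 2000 } })`, while omitting the key keeps the previous 10s behaviour.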
Hey! Your PR title needs an update. Please change it to start with one of the accepted prefixes; see CONTRIBUTING.md for details.
The following comment was made by an LLM, it may be inaccurate:

One potentially related PR found:

PR #29542: fix(opencode): keep SSE streams alive after serialization errors

This PR addresses SSE stream resilience, which is related to the current PR's focus on global event stream reliability. Both deal with keeping SSE connections healthy, though #29542 focuses on serialization errors while this PR targets heartbeat configuration and reconnection strategy for Windows/flaky networks.
Thanks for your contribution! This PR doesn't have a linked issue. All PRs must reference an existing issue. See CONTRIBUTING.md for details.
Thanks for updating your PR! It now meets our contributing guidelines. 👍
Desktop Electron testing results

Heartbeat fix verified on OpenCode Desktop (v1.17.9, Windows)

The new OpenCode Desktop (@opencode-aidesktop) bundles the server inside the Electron process via a utility worker thread (sidecar.js -> node-Cyb4gY-5.js), not as a separate opencode-cli.exe binary.

Key findings:

Testing method: Extracted app.asar, patched heartbeat in bundled JS, repacked asar, launched desktop. The original app.asar was backed up as app.asar.original.

Recommendation: The Electron utility process crash should be tracked as a separate issue. This PR addresses the SSE heartbeat + reconnect behavior, which is confirmed working.
Final testing report - Windows Electron Desktop (v1.17.9)

Architecture discovery

The new OpenCode Desktop (@opencode-aidesktop) does NOT use a separate opencode-cli.exe binary. Instead, it bundles the server inside the Electron process via a utility worker thread (out/main/sidecar.js -> out/main/chunks/node-Cyb4gY-5.js).

What was tested

Three layers of patching were applied to app.asar:

Results

Root cause analysis

The server enters a "zombie" state where:

This zombie state is undetectable from outside the process: no health check endpoint can distinguish it from a healthy server. The crash is at the native/Effect runtime level, not caught by any JavaScript handler.

Conclusion

This PR correctly fixes the SSE heartbeat disconnect issue. The zombie server state is a separate pre-existing bug (unrelated to heartbeat interval) that affects the Electron desktop's utility worker process. The standalone opencode-cli.exe binary does not exhibit the zombie state, suggesting it's specific to how the Electron utilityProcess.fork() manages the Node.js runtime.

Recommendation: Merge this PR for the SSE fix. Track the worker zombie issue separately; it affects the stock v1.17.9 desktop and is reproducible with zero code changes.
Issue for this PR
Closes #30597
Type of change
What does this PR do?
OpenCode Desktop on Windows + Electron repeatedly emits "INFO: global event disconnected" and the sidecar session becomes unresponsive after a few minutes of use (see #30597). The renderer then has no live event stream until the user restarts the app.

Two layers contribute:
Server hardcodes a 10s SSE heartbeat in the global event handler. On Windows this is too sparse to survive aggressive proxy / antivirus / idle-TCP timeouts that silently close long-lived SSE connections. The server logs the disconnect and the renderer is left without an event stream.
Renderer reconnects with a flat 250ms delay. When the server is the slow side, this turns into a tight reconnect loop that hammers the server without giving it time to recover.
This PR:
Adds a server.event_stream.heartbeat_ms config (range 1000-60000, default 10000) so users on flaky networks can tune heartbeat density. Mirrors existing server.* config keys. Schema lives in packages/core/src/v1/config/server.ts.
Reads heartbeatMs from config instead of the hardcoded 10s (packages/opencode/src/server/routes/instance/httpapi/handlers/global.ts).
Replaces the renderer's fixed 250ms reconnect delay with exponential backoff (500ms..30s, +/-25% jitter); the attempt counter resets on every event (packages/app/src/context/server-sdk.tsx).
Adds a test for the configured heartbeat interval (packages/opencode/test/server/httpapi-event.test.ts).

Why this works: a denser heartbeat keeps idle proxies from closing the SSE socket, and the exponential backoff prevents the renderer from starving the server when recovery is genuinely slow. The two are independent but complementary.
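For readers who want to see the reconnect schedule concretely, here is a minimal sketch of exponential backoff with jitter using the figures quoted in this PR (500ms base, 30s cap, +/-25% jitter). The function name `reconnectDelay`, the doubling factor, and the choice to apply jitter after the cap are assumptions for illustration; the actual logic in server-sdk.tsx may differ.

```typescript
// Figures from the PR text: 500ms base, 30s cap, +/-25% jitter.
const BASE_DELAY_MS = 500
const MAX_DELAY_MS = 30_000
const JITTER_FRACTION = 0.25

// Hypothetical helper (not the PR's code). `attempt` is the number of
// consecutive failed reconnects; `random` is injectable so the schedule
// can be tested deterministically.
function reconnectDelay(attempt: number, random: () => number = Math.random): number {
  // Double the delay for each failed attempt, capped at the maximum.
  // Doubling is an assumption; the PR only says "exponential".
  const exponential = Math.min(MAX_DELAY_MS, BASE_DELAY_MS * 2 ** Math.max(0, attempt))
  // Scale by a factor drawn uniformly from [0.75, 1.25).
  // Applying jitter after the cap is an assumption, so the result can
  // exceed MAX_DELAY_MS by up to 25%.
  const factor = 1 + (random() * 2 - 1) * JITTER_FRACTION
  return Math.round(exponential * factor)
}
```

In this sketch the caller would keep an attempt counter, increment it after each failed reconnect, and reset it to 0 whenever an event arrives, matching the reset behaviour described in the commit message. The jitter spreads out retries so that several renderers reconnecting to the same server do not retry in lockstep.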
How did you verify your code works?
bun typecheck clean in packages/core and packages/opencode. packages/app has a pre-existing parse error in src/custom-elements.d.ts unrelated to this change.
The test "honours configured heartbeat interval" asserts a heartbeat arrives in under 1500ms when heartbeatMs: 1000 is set.

I was not able to run the full bun test suite locally because of an unrelated openai@6.39.1 packaging issue (missing version.mjs in node_modules) that exists on this checkout regardless of my changes. The PR CI on GitHub Actions will validate against a clean environment.

Screenshots / recordings
Not a UI change.
Checklist