smp: fix proxy reconnection to relay after restart #1806
Open
shumvgolove wants to merge 3 commits into
Reproduces the proxy failing to reconnect to a destination relay when the sender disconnects mid-connection (empty session var left in smpClients).
getSessVar inserts an empty session var that the connect path fills with putTMVar. If the connecting thread is killed by an async exception before that (e.g. a proxy worker on client disconnect, or an agent worker on cancel), the empty var was left in the map forever and every later request for that server blocked on it until timing out (permanent PCEResponseTimeout). Add clearSessVarOnInterrupt and run it via onException at the SMP proxy (newSMPClient), agent (newProtocolClient, newProxiedRelay) and ntf push worker (getOrCreatePushWorker) connect sites: on interrupt before fill, release waiters with an error and drop the var so the next request reconnects.
UtilTests: tryAllErrors rethrows ThreadKilled/StackOverflow (the mechanism that skips putTMVar). SMPProxyTests: agent client reconnection after a cancelled connect, plus a control proving the stalling relay alone does not cause the failure; refine the relay reconnection tests.
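The cleanup described for clearSessVarOnInterrupt can be sketched on a simplified model. This is not the code from the PR: the map type, `getVar`, `getClient`, `clearOnInterrupt` and the string-valued "client" are all hypothetical stand-ins, and only the idea (on interrupt before fill, release waiters with an error and drop the var) is taken from the commit message.

```haskell
import Control.Concurrent (forkFinally, killThread, newEmptyMVar, putMVar, takeMVar, threadDelay)
import Control.Concurrent.STM
import Control.Exception (onException)
import Control.Monad (when)
import qualified Data.Map.Strict as M
import System.Timeout (timeout)

-- Simplified model (assumption): Left is a connection error, Right a connected client.
type Client = Either String String
type Sessions = TVar (M.Map String (TMVar Client))

-- Left: a new empty var was inserted; Right: an existing var was found.
getVar :: Sessions -> String -> STM (Either (TMVar Client) (TMVar Client))
getVar sessions srv = do
  m <- readTVar sessions
  case M.lookup srv m of
    Just v -> pure (Right v)
    Nothing -> do
      v <- newEmptyTMVar
      writeTVar sessions (M.insert srv v m)
      pure (Left v)

-- Hypothetical cleanup: if the var is still empty, fill it with an error
-- (releasing waiters) and remove it from the map, but only if the map
-- still holds this exact var.
clearOnInterrupt :: Sessions -> String -> TMVar Client -> IO ()
clearOnInterrupt sessions srv v = atomically $ do
  wasEmpty <- tryPutTMVar v (Left "interrupted")
  when wasEmpty $
    modifyTVar' sessions (M.update (\v' -> if v' == v then Nothing else Just v') srv)

getClient :: Sessions -> String -> IO String -> IO (Maybe Client)
getClient sessions srv connect = do
  r <- atomically (getVar sessions srv)
  case r of
    -- A real implementation would also need to close the window between
    -- inserting the var and installing this handler (e.g. with mask).
    Left v -> fill v `onException` clearOnInterrupt sessions srv v
    Right v -> timeout 200000 (atomically (readTMVar v))
  where
    fill v = do
      c <- connect
      atomically (putTMVar v (Right c))
      pure (Just (Right c))

demo :: IO (Maybe Client)
demo = do
  sessions <- newTVarIO M.empty
  done <- newEmptyMVar
  -- The worker starts a connect that stalls, then is killed mid-connection.
  t <- forkFinally
         (getClient sessions "relay" (threadDelay 10000000 >> pure "client-1"))
         (\_ -> putMVar done ())
  threadDelay 100000
  killThread t
  takeMVar done -- wait until the worker's cleanup handler has finished
  -- The stale var is gone, so this request connects afresh.
  getClient sessions "relay" (pure "client-2")

main :: IO ()
main = demo >>= print
```

In this model the second request finds no entry for the relay and reconnects, so `demo` should return `Just (Right "client-2")`. Using `tryPutTMVar` keeps the handler harmless when the exception arrives after the var has already been filled.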
Problem
An SMP proxy permanently stops reconnecting to a destination relay after the relay restarts. The logs show repeated PCEResponseTimeout for that relay, and only restarting the proxy server recovers it.
Cause
A PRXY request makes the proxy open a connection to the relay in a worker forked from the sender's client. The worker inserts an empty session var into smpClients and then blocks in the connection/handshake. If the sender disconnects while that connect is in flight, the worker is killed by an async exception before the session var is ever filled.
Nothing removes an empty session var, so every later request to that relay waits on it until the connection timeout and fails with PROXY (BROKER TIMEOUT) - forever, even once the relay is healthy again.
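The failure mode can be reproduced in miniature. The sketch below is a simplified model, not the proxy code: `Sessions`, `getVar`, `getClient`, the string-valued "client" and the 200 ms wait are hypothetical stand-ins for smpClients, the session var lookup, the connect path and the response timeout.

```haskell
import Control.Concurrent (forkIO, killThread, threadDelay)
import Control.Concurrent.STM
import qualified Data.Map.Strict as M
import System.Timeout (timeout)

-- Simplified stand-in for the proxy's map of relay -> session var.
type Sessions = TVar (M.Map String (TMVar String))

-- Left: a new empty var was inserted; Right: an existing var was found.
getVar :: Sessions -> String -> STM (Either (TMVar String) (TMVar String))
getVar sessions srv = do
  m <- readTVar sessions
  case M.lookup srv m of
    Just v -> pure (Right v)
    Nothing -> do
      v <- newEmptyTMVar
      writeTVar sessions (M.insert srv v m)
      pure (Left v)

-- Connect path without any cleanup: the var is filled only if connect returns.
getClient :: Sessions -> String -> IO String -> IO (Maybe String)
getClient sessions srv connect = do
  r <- atomically (getVar sessions srv)
  case r of
    Left v -> do
      c <- connect -- an async exception here skips the putTMVar below
      atomically (putTMVar v c)
      pure (Just c)
    -- Later requests wait on the existing var, bounded by a timeout.
    Right v -> timeout 200000 (atomically (readTMVar v))

demo :: IO (Maybe String)
demo = do
  sessions <- newTVarIO M.empty
  -- The worker starts connecting to a stalled relay, then is killed
  -- (modelling the sender disconnecting mid-connection).
  t <- forkIO $ () <$ getClient sessions "relay" (threadDelay 10000000 >> pure "client-1")
  threadDelay 100000
  killThread t
  -- The relay would answer immediately now, but the stale empty var is
  -- found first, so this request only waits and times out.
  getClient sessions "relay" (pure "client-2")

main :: IO ()
main = demo >>= print
```

In this model the second request never calls its connect action, because the lookup returns the abandoned var. `demo` should therefore return `Nothing`, and so would every later request for the same key.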