feat: add provider in-flight request limits#3216
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e0b808f90c
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| const stat = await fs.stat(path.join(dir, "info.json")); | ||
| return Date.now() - stat.mtimeMs > staleMs; | ||
| } catch (error) { | ||
| if (isEnoent(error)) return false; |
There was a problem hiding this comment.
Reap half-created lease directories
If a process is killed, or writeProviderInFlightInfo() throws, after fs.mkdir(leaseDir) creates a lease directory but before info.json is written, this ENOENT path marks that lease as not stale forever. cleanupProviderInFlightLeases() then counts the orphan directory as active on every pass, so a provider with a limit such as 1 can block all future requests until the temp directory is manually deleted; either remove the lease directory on write failure or fall back to the lease directory mtime when info.json is missing.
Useful? React with 👍 / 👎.
roboomp
left a comment
There was a problem hiding this comment.
Thanks for the careful implementation — the lock/lease/heartbeat/stale-detection story is genuinely well thought out (token-checked release, pid + timestamp dual staleness, watcher + fallback timer, abort wired through the whole chain).
Ranked P2 because this is an opt-in feature that ships permanent new surface area (StreamOptions.maxInFlightRequests in pi-ai, providers.maxInFlightRequests setting, a module-level singleton + ~400 LOC FS limiter, and a hidden OMP_PROVIDER_INFLIGHT_DIR env hatch). No linked issue and no prior maintainer signal that this approach is preferred over an external gateway/proxy, so it needs an explicit sign-off, not a quiet merge.
Headline questions for the maintainer:
- Lease root:
os.tmpdir() + sha256(configRoot)vs the documented "shared across local OMP processes using the same config root". Putting leases undergetConfigRootDir()directly matches the doc and survives sandboxed/containerTMPDIRdivergence. - API placement: limiter is in
pi-aibut scoping usesgetConfigRootDir()/os.tmpdir()— coding-agent concerns leaking into the SDK. Worth deciding whether the SDK option stays or this becomes a coding-agent-only feature. - UX: queued requests are invisible (no log, no event on
outer) — looks identical to a hung provider.
Smaller findings inline (missed-wakeup race adding up to 5s queue latency, ambiguous setting description, the new settings-manager test doesn't actually exercise the hook). Convention-wise: Bun APIs / #private / no any / no mock.module() all clean; the new CHANGELOG entries land in a released section but fix-changelogs will normalize that.
| const explicit = providerInFlightRootOverride ?? process.env.OMP_PROVIDER_INFLIGHT_DIR; | ||
| if (explicit) return explicit; | ||
| const configRootHash = crypto.createHash("sha256").update(getConfigRootDir()).digest("hex").slice(0, 16); | ||
| return path.join(os.tmpdir(), "omp-provider-inflight", configRootHash); |
There was a problem hiding this comment.
should-fix (maintainer call). The PR body advertises "shared across local OMP processes using the same config root", but the lease root is os.tmpdir()/omp-provider-inflight/<sha256(getConfigRootDir())[0..16]>. Two OMP processes that share a config root but observe different os.tmpdir() values (sandboxed runs, containers with bind-mounted home but isolated /tmp, per-process TMPDIR overrides) silently skip the shared limit. Putting the directory under getConfigRootDir() (e.g. <configRoot>/run/inflight/<provider>) matches the documented scope and survives those environments — the hash + os.tmpdir() combination only avoids cross-user collisions on shared /tmp, which is already implied by the per-user config root. Worth a maintainer decision before the SDK surface ships.
| if (filename === ".wakeup" || filename === null) { | ||
| finish(resolve); | ||
| } | ||
| }); |
There was a problem hiding this comment.
should-fix. fsSync.watch only fires for changes after the watcher attaches, so the window between tryAcquireProviderInFlightLease returning null (line 380) and fsSync.watch attaching here is a missed-wakeup hole: if another process releases its lease and writes .wakeup during that gap, this waiter sleeps the full PROVIDER_INFLIGHT_SIGNAL_FALLBACK_MS (5s) before retrying, even though a slot is already free. Two cheap mitigations: (1) stat the .wakeup file after attaching the watcher and resolve immediately if its mtime is newer than the moment cleanup finished; (2) shorten the fallback timer to ~250ms so worst-case latency is bounded. Without one of these, a single missed signal costs every queued request 5 extra seconds.
| await release(); | ||
| }; | ||
| try { | ||
| release = await acquireProviderInFlightSlot(model.provider, limit, options?.signal); |
There was a problem hiding this comment.
should-fix (UX). A queued request is fully invisible — withProviderInFlightLimit blocks at acquireProviderInFlightSlot before dispatch(), so the consumer sees outer silent for arbitrarily long while peers hold leases. No logger.debug, no synthetic start/status event on outer, no surface in the TUI. From the operator's seat this looks identical to a hung provider. At minimum a logger.debug("provider in-flight limit blocked", { provider, limit }) after the first non-trivial wait would make this diagnosable; ideally the wrapper would push a synthetic start (or a new "queued" event) on outer so the TUI can render a "queued behind N requests" indicator. As-is, the only signal a user has that the limiter is engaged is the absence of activity.
| } | ||
|
|
||
| function providerInFlightRoot(): string { | ||
| const explicit = providerInFlightRootOverride ?? process.env.OMP_PROVIDER_INFLIGHT_DIR; |
There was a problem hiding this comment.
nit. OMP_PROVIDER_INFLIGHT_DIR is an undocumented escape hatch — no comment here, nothing in the changelog or settings description. Either drop it (the __providerInFlightForTesting.setRoot testing seam already covers the test use case) or document it (jsdoc on this function + a line in the schema description), so users who hit os.tmpdir() issues in the wild know it exists.
| group: "Services", | ||
| label: "Max In-Flight Requests", | ||
| description: | ||
| 'Maximum concurrent LLM requests per model source/provider tab, shared across local OMP processes with this config root. Edit as JSON, for example {"openai":2}; omitted sources are unlimited.', |
There was a problem hiding this comment.
nit. "Maximum concurrent LLM requests per model source/provider tab" is ambiguous — keys are matched against model.provider in resolveProviderInFlightLimit, not "source" or "tab". Concretely the JSON keys must be provider ids like openai, anthropic, google-vertex (whatever model.provider returns for the user's bound models), not source names or settings-tab names. Suggest: Maximum concurrent LLM requests per provider id (e.g. "openai", "anthropic"), shared across local OMP processes with this config root. Edit as JSON, for example {"openai":2}; omitted providers are unlimited.
| settings.set("providers.maxInFlightRequests", { openai: 4 }); | ||
|
|
||
| expect(settings.get("providers.maxInFlightRequests")).toEqual({ openai: 1 }); | ||
| }); |
There was a problem hiding this comment.
nit. This test is titled "uses the effective merged value when configuring hooks" but never observes a hook — it only asserts that runtime overrides win over settings.set, which is already covered by the override-precedence tests. If the intent is to defend the providers.maxInFlightRequests hook contract, spy on configureProviderMaxInFlightRequests (or pass an explicit hook seam) and assert the limiter saw the merged value after set(). As written, it doesn't defend the new wiring at all.
| * providers are unlimited. Non-chat provider APIs that bypass stream helpers | ||
| * are not covered. | ||
| */ | ||
| maxInFlightRequests?: Record<string, number>; |
There was a problem hiding this comment.
should-fix (maintainer call). This adds a permanent public field on StreamOptions plus a module-level singleton (configuredProviderMaxInFlightRequests) and a 400-line FS-based lock/lease subsystem in stream.ts. Two things worth a maintainer decision before this lands:
- Whether the limiter belongs in
pi-aiat all, or only incoding-agent(the limiter is OMP-process-scoped viagetConfigRootDir()/os.tmpdir(), which is a coding-agent concern, not a general SDK concern — pi-ai consumers embedding the library standalone get a surprise FS root keyed off whatevergetConfigRootDir()resolves to). - Whether the singleton + per-call option dual mechanism is acceptable, or if the field should be required (no singleton) so embedders aren't subject to whichever Settings instance fired the hook last in a multi-Settings (subagent /
cloneForCwd) setup.
Either answer is defensible; both lock in API surface that's hard to walk back.
e0b808f to
886eb0c
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 886eb0cf5a
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| } | ||
|
|
||
| function providerInFlightDir(provider: string): string { | ||
| return path.join(providerInFlightRoot(), encodeURIComponent(provider)); |
There was a problem hiding this comment.
Hash provider IDs before using them as paths
For custom providers, names are accepted from configuration/runtime registration without rejecting . or .., but encodeURIComponent() leaves those strings unchanged. When a limit is configured for such a provider, path.join(root, encodeURIComponent(provider)) resolves to the limiter root or its parent, so cleanup/signaling can count or remove unrelated provider lease directories and make the cap block incorrectly; use a path-safe opaque segment such as a hash/base64url encoding instead of the provider id itself.
Useful? React with 👍 / 👎.
886eb0c to
121313a
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 121313a481
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| if (await isProviderInFlightLockStale(lockDir)) { | ||
| await releaseProviderInFlightLock(lockDir); | ||
| await signalProviderInFlightWaiters(provider); |
There was a problem hiding this comment.
Avoid deleting a fresh limiter lock
When two waiters observe the same stale .lock, this unguarded removal can run after another waiter has already deleted the stale directory and acquired a new lock with fresh info.json. Because the removal is not tied to the stale lock identity, the late waiter can delete the fresh owner’s lock and then both waiters proceed through lease counting/creation concurrently, so a provider cap such as 1 can be exceeded after a crashed or old lock; re-read/verify the stale token or use an atomic steal before removing it.
Useful? React with 👍 / 👎.
121313a to
996e6f6
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 996e6f6afb
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| const dir = providerInFlightDir(provider); | ||
| await fs.mkdir(dir, { recursive: true }); | ||
| const active = await cleanupProviderInFlightLeases(dir); | ||
| if (active >= limit) return null; |
There was a problem hiding this comment.
Avoid wakeups when no slot was freed
When the provider is saturated, this return null still runs the finally block, and the lock releaser always writes .wakeup. With two or more queued requests for the same capped provider, each failed availability check wakes another waiter, which also finds active >= limit and writes another wakeup, causing continuous filesystem lock/watch churn until a real lease is released; only signal waiters when a lease is actually created/released or stale state was removed.
Useful? React with 👍 / 👎.
996e6f6 to
08b1304
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 08b13049e8
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| ### Added | ||
|
|
||
| - Added a per-provider in-flight request limiter for LLM streams, shared across local OMP processes and configurable by callers with `maxInFlightRequests`. |
There was a problem hiding this comment.
Move ai changelog entry to Unreleased
This addition is inside the already released 16.1.8 section instead of ## [Unreleased]; AGENTS.md says new changelog entries must go under Unreleased and released sections are immutable. Leaving it here rewrites historical release notes and can cause the next release notes to miss the new limiter change.
Useful? React with 👍 / 👎.
| ### Added | ||
|
|
||
| - Added `providers.maxInFlightRequests` to cap concurrent LLM requests per provider across local OMP processes from settings. | ||
|
|
||
| ### Changed |
There was a problem hiding this comment.
Move coding-agent changelog additions to Unreleased
This insertion is also in the released 16.1.8 block rather than ## [Unreleased], and the following browser-migration bullets are unrelated to this change. AGENTS.md says new entries go under Unreleased and released sections are immutable, so this corrupts historical release notes and risks omitting the provider-limit entry from the next release.
Useful? React with 👍 / 👎.
08b1304 to
4066721
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 406672195d
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| if (!value || typeof value !== "object" || Array.isArray(value)) return {}; | ||
| const normalized: Record<string, number> = {}; | ||
| for (const [provider, rawLimit] of Object.entries(value)) { | ||
| if (typeof rawLimit !== "number" || !Number.isFinite(rawLimit) || rawLimit <= 0) continue; |
There was a problem hiding this comment.
Reject invalid configured provider limits
When providers.maxInFlightRequests comes from hand-edited config.yml, project settings, or an overlay, invalid entries such as openai: "2" or anthropic: 0 reach this normalizer and are silently dropped, so the request cap is not enforced even though the user configured one. The CLI/UI paths call validateProviderMaxInFlightRequests, but the loaded-settings path is normalized here before being passed to streams, so raw config typos bypass the intended rejection; validate the loaded value (or surface a settings error) instead of continueing.
Useful? React with 👍 / 👎.
4066721 to
b1deca6
Compare
What
Adds a user-configurable per-provider max in-flight LLM request limit via providers.maxInFlightRequests.
Why
Prevents OMP sessions/subagents from stampeding a provider when users want local concurrency control, without requiring a separate local proxy or gateway.
This is intentionally a local cooperative limiter for OMP LLM stream calls. Organization-wide or cross-tool enforcement still belongs in an external gateway/proxy.
Testing
bun --cwd=packages/ai run check
bun --cwd=packages/coding-agent run check
bun test packages/ai/test/provider-inflight.test.ts
bun test packages/coding-agent/test/config-cli.test.ts
bun test packages/coding-agent/test/modes/components/settings-layout.test.ts
bun test packages/coding-agent/test/settings-manager.test.ts -t "provider in-flight request limits|provider request limits|defaults provider"
bun check passes
Tested locally
CHANGELOG updated (if user-facing)