Skip to content

feat: add provider in-flight request limits#3216

Open
H4vC wants to merge 1 commit into
can1357:mainfrom
H4vC:h4vc/max-inflight-provider
Open

feat: add provider in-flight request limits#3216
H4vC wants to merge 1 commit into
can1357:mainfrom
H4vC:h4vc/max-inflight-provider

Conversation

@H4vC

@H4vC H4vC commented Jun 21, 2026

Copy link
Copy Markdown
Contributor

What

Adds a user-configurable per-provider max in-flight LLM request limit via providers.maxInFlightRequests.

  • Limits are keyed by model.provider, matching /model provider/source IDs.
  • Limits apply to LLM stream/chat paths through stream(), streamSimple(), and completeSimple().
  • Limits are shared across local OMP processes using the same config root.
  • Queued requests wait for a cross-process filesystem wakeup signal
  • /settings shows the setting compactly as Unlimited until configured, then as provider: limit.
  • Invalid provider limit entries are rejected instead of silently ignored.

Why

Prevents OMP sessions/subagents from stampeding a provider when users want local concurrency control, without requiring a separate local proxy or gateway.

This is intentionally a local cooperative limiter for OMP LLM stream calls. Organization-wide or cross-tool enforcement still belongs in an external gateway/proxy.

Testing

  • bun --cwd=packages/ai run check

  • bun --cwd=packages/coding-agent run check

  • bun test packages/ai/test/provider-inflight.test.ts

  • bun test packages/coding-agent/test/config-cli.test.ts

  • bun test packages/coding-agent/test/modes/components/settings-layout.test.ts

  • bun test packages/coding-agent/test/settings-manager.test.ts -t "provider in-flight request limits|provider request limits|defaults provider"

  • bun check passes

  • Tested locally

  • CHANGELOG updated (if user-facing)

@github-actions github-actions Bot added the vouched Passed the vouch gate label Jun 21, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e0b808f90c

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread packages/ai/src/stream.ts
const stat = await fs.stat(path.join(dir, "info.json"));
return Date.now() - stat.mtimeMs > staleMs;
} catch (error) {
if (isEnoent(error)) return false;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Reap half-created lease directories

If a process is killed, or writeProviderInFlightInfo() throws, after fs.mkdir(leaseDir) creates a lease directory but before info.json is written, this ENOENT path marks that lease as not stale forever. cleanupProviderInFlightLeases() then counts the orphan directory as active on every pass, so a provider with a limit such as 1 can block all future requests until the temp directory is manually deleted; either remove the lease directory on write failure or fall back to the lease directory mtime when info.json is missing.

Useful? React with 👍 / 👎.

@roboomp roboomp added cli CLI commands and arguments feat providers LLM provider-specific issues review:p2 sdk SDK and extension integration surface triaged labels Jun 21, 2026

@roboomp roboomp left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the careful implementation — the lock/lease/heartbeat/stale-detection story is genuinely well thought out (token-checked release, pid + timestamp dual staleness, watcher + fallback timer, abort wired through the whole chain).

Ranked P2 because this is an opt-in feature that ships permanent new surface area (StreamOptions.maxInFlightRequests in pi-ai, providers.maxInFlightRequests setting, a module-level singleton + ~400 LOC FS limiter, and a hidden OMP_PROVIDER_INFLIGHT_DIR env hatch). No linked issue and no prior maintainer signal that this approach is preferred over an external gateway/proxy, so it needs an explicit sign-off, not a quiet merge.

Headline questions for the maintainer:

  • Lease root: os.tmpdir() + sha256(configRoot) vs the documented "shared across local OMP processes using the same config root". Putting leases under getConfigRootDir() directly matches the doc and survives sandboxed/container TMPDIR divergence.
  • API placement: limiter is in pi-ai but scoping uses getConfigRootDir()/os.tmpdir() — coding-agent concerns leaking into the SDK. Worth deciding whether the SDK option stays or this becomes a coding-agent-only feature.
  • UX: queued requests are invisible (no log, no event on outer) — looks identical to a hung provider.

Smaller findings inline (missed-wakeup race adding up to 5s queue latency, ambiguous setting description, the new settings-manager test doesn't actually exercise the hook). Convention-wise: Bun APIs / #private / no any / no mock.module() all clean; the new CHANGELOG entries land in a released section but fix-changelogs will normalize that.

Comment thread packages/ai/src/stream.ts Outdated
const explicit = providerInFlightRootOverride ?? process.env.OMP_PROVIDER_INFLIGHT_DIR;
if (explicit) return explicit;
const configRootHash = crypto.createHash("sha256").update(getConfigRootDir()).digest("hex").slice(0, 16);
return path.join(os.tmpdir(), "omp-provider-inflight", configRootHash);

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should-fix (maintainer call). The PR body advertises "shared across local OMP processes using the same config root", but the lease root is os.tmpdir()/omp-provider-inflight/<sha256(getConfigRootDir())[0..16]>. Two OMP processes that share a config root but observe different os.tmpdir() values (sandboxed runs, containers with bind-mounted home but isolated /tmp, per-process TMPDIR overrides) silently skip the shared limit. Putting the directory under getConfigRootDir() (e.g. <configRoot>/run/inflight/<provider>) matches the documented scope and survives those environments — the hash + os.tmpdir() combination only avoids cross-user collisions on shared /tmp, which is already implied by the per-user config root. Worth a maintainer decision before the SDK surface ships.

Comment thread packages/ai/src/stream.ts
if (filename === ".wakeup" || filename === null) {
finish(resolve);
}
});

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should-fix. fsSync.watch only fires for changes after the watcher attaches, so the window between tryAcquireProviderInFlightLease returning null (line 380) and fsSync.watch attaching here is a missed-wakeup hole: if another process releases its lease and writes .wakeup during that gap, this waiter sleeps the full PROVIDER_INFLIGHT_SIGNAL_FALLBACK_MS (5s) before retrying, even though a slot is already free. Two cheap mitigations: (1) stat the .wakeup file after attaching the watcher and resolve immediately if its mtime is newer than the moment cleanup finished; (2) shorten the fallback timer to ~250ms so worst-case latency is bounded. Without one of these, a single missed signal costs every queued request 5 extra seconds.

Comment thread packages/ai/src/stream.ts
await release();
};
try {
release = await acquireProviderInFlightSlot(model.provider, limit, options?.signal);

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should-fix (UX). A queued request is fully invisible — withProviderInFlightLimit blocks at acquireProviderInFlightSlot before dispatch(), so the consumer sees outer silent for arbitrarily long while peers hold leases. No logger.debug, no synthetic start/status event on outer, no surface in the TUI. From the operator's seat this looks identical to a hung provider. At minimum a logger.debug("provider in-flight limit blocked", { provider, limit }) after the first non-trivial wait would make this diagnosable; ideally the wrapper would push a synthetic start (or a new "queued" event) on outer so the TUI can render a "queued behind N requests" indicator. As-is, the only signal a user has that the limiter is engaged is the absence of activity.

Comment thread packages/ai/src/stream.ts Outdated
}

function providerInFlightRoot(): string {
const explicit = providerInFlightRootOverride ?? process.env.OMP_PROVIDER_INFLIGHT_DIR;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit. OMP_PROVIDER_INFLIGHT_DIR is an undocumented escape hatch — no comment here, nothing in the changelog or settings description. Either drop it (the __providerInFlightForTesting.setRoot testing seam already covers the test use case) or document it (jsdoc on this function + a line in the schema description), so users who hit os.tmpdir() issues in the wild know it exists.

group: "Services",
label: "Max In-Flight Requests",
description:
'Maximum concurrent LLM requests per model source/provider tab, shared across local OMP processes with this config root. Edit as JSON, for example {"openai":2}; omitted sources are unlimited.',

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit. "Maximum concurrent LLM requests per model source/provider tab" is ambiguous — keys are matched against model.provider in resolveProviderInFlightLimit, not "source" or "tab". Concretely the JSON keys must be provider ids like openai, anthropic, google-vertex (whatever model.provider returns for the user's bound models), not source names or settings-tab names. Suggest: Maximum concurrent LLM requests per provider id (e.g. "openai", "anthropic"), shared across local OMP processes with this config root. Edit as JSON, for example {"openai":2}; omitted providers are unlimited.

settings.set("providers.maxInFlightRequests", { openai: 4 });

expect(settings.get("providers.maxInFlightRequests")).toEqual({ openai: 1 });
});

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit. This test is titled "uses the effective merged value when configuring hooks" but never observes a hook — it only asserts that runtime overrides win over settings.set, which is already covered by the override-precedence tests. If the intent is to defend the providers.maxInFlightRequests hook contract, spy on configureProviderMaxInFlightRequests (or pass an explicit hook seam) and assert the limiter saw the merged value after set(). As written, it doesn't defend the new wiring at all.

Comment thread packages/ai/src/types.ts
* providers are unlimited. Non-chat provider APIs that bypass stream helpers
* are not covered.
*/
maxInFlightRequests?: Record<string, number>;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should-fix (maintainer call). This adds a permanent public field on StreamOptions plus a module-level singleton (configuredProviderMaxInFlightRequests) and a 400-line FS-based lock/lease subsystem in stream.ts. Two things worth a maintainer decision before this lands:

  1. Whether the limiter belongs in pi-ai at all, or only in coding-agent (the limiter is OMP-process-scoped via getConfigRootDir()/os.tmpdir(), which is a coding-agent concern, not a general SDK concern — pi-ai consumers embedding the library standalone get a surprise FS root keyed off whatever getConfigRootDir() resolves to).
  2. Whether the singleton + per-call option dual mechanism is acceptable, or if the field should be required (no singleton) so embedders aren't subject to whichever Settings instance fired the hook last in a multi-Settings (subagent / cloneForCwd) setup.

Either answer is defensible; both lock in API surface that's hard to walk back.

@H4vC H4vC force-pushed the h4vc/max-inflight-provider branch from e0b808f to 886eb0c Compare June 21, 2026 22:58

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 886eb0cf5a

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread packages/ai/src/stream.ts Outdated
}

function providerInFlightDir(provider: string): string {
return path.join(providerInFlightRoot(), encodeURIComponent(provider));

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Hash provider IDs before using them as paths

For custom providers, names are accepted from configuration/runtime registration without rejecting . or .., but encodeURIComponent() leaves those strings unchanged. When a limit is configured for such a provider, path.join(root, encodeURIComponent(provider)) resolves to the limiter root or its parent, so cleanup/signaling can count or remove unrelated provider lease directories and make the cap block incorrectly; use a path-safe opaque segment such as a hash/base64url encoding instead of the provider id itself.

Useful? React with 👍 / 👎.

@H4vC H4vC force-pushed the h4vc/max-inflight-provider branch from 886eb0c to 121313a Compare June 21, 2026 23:12

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 121313a481

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread packages/ai/src/stream.ts Outdated
Comment on lines +245 to +247
if (await isProviderInFlightLockStale(lockDir)) {
await releaseProviderInFlightLock(lockDir);
await signalProviderInFlightWaiters(provider);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Avoid deleting a fresh limiter lock

When two waiters observe the same stale .lock, this unguarded removal can run after another waiter has already deleted the stale directory and acquired a new lock with fresh info.json. Because the removal is not tied to the stale lock identity, the late waiter can delete the fresh owner’s lock and then both waiters proceed through lease counting/creation concurrently, so a provider cap such as 1 can be exceeded after a crashed or old lock; re-read/verify the stale token or use an atomic steal before removing it.

Useful? React with 👍 / 👎.

@H4vC H4vC force-pushed the h4vc/max-inflight-provider branch from 121313a to 996e6f6 Compare June 21, 2026 23:35

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 996e6f6afb

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread packages/ai/src/stream.ts
const dir = providerInFlightDir(provider);
await fs.mkdir(dir, { recursive: true });
const active = await cleanupProviderInFlightLeases(dir);
if (active >= limit) return null;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Avoid wakeups when no slot was freed

When the provider is saturated, this return null still runs the finally block, and the lock releaser always writes .wakeup. With two or more queued requests for the same capped provider, each failed availability check wakes another waiter, which also finds active >= limit and writes another wakeup, causing continuous filesystem lock/watch churn until a real lease is released; only signal waiters when a lease is actually created/released or stale state was removed.

Useful? React with 👍 / 👎.

@H4vC H4vC force-pushed the h4vc/max-inflight-provider branch from 996e6f6 to 08b1304 Compare June 21, 2026 23:46

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 08b13049e8

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread packages/ai/CHANGELOG.md Outdated
Comment on lines +68 to +70
### Added

- Added a per-provider in-flight request limiter for LLM streams, shared across local OMP processes and configurable by callers with `maxInFlightRequests`.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Move ai changelog entry to Unreleased

This addition is inside the already released 16.1.8 section instead of ## [Unreleased]; AGENTS.md says new changelog entries must go under Unreleased and released sections are immutable. Leaving it here rewrites historical release notes and can cause the next release notes to miss the new limiter change.

Useful? React with 👍 / 👎.

Comment thread packages/coding-agent/CHANGELOG.md Outdated
Comment on lines +95 to +99
### Added

- Added `providers.maxInFlightRequests` to cap concurrent LLM requests per provider across local OMP processes from settings.

### Changed

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Move coding-agent changelog additions to Unreleased

This insertion is also in the released 16.1.8 block rather than ## [Unreleased], and the following browser-migration bullets are unrelated to this change. AGENTS.md says new entries go under Unreleased and released sections are immutable, so this corrupts historical release notes and risks omitting the provider-limit entry from the next release.

Useful? React with 👍 / 👎.

@H4vC H4vC force-pushed the h4vc/max-inflight-provider branch from 08b1304 to 4066721 Compare June 21, 2026 23:55

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 406672195d

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

if (!value || typeof value !== "object" || Array.isArray(value)) return {};
const normalized: Record<string, number> = {};
for (const [provider, rawLimit] of Object.entries(value)) {
if (typeof rawLimit !== "number" || !Number.isFinite(rawLimit) || rawLimit <= 0) continue;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Reject invalid configured provider limits

When providers.maxInFlightRequests comes from hand-edited config.yml, project settings, or an overlay, invalid entries such as openai: "2" or anthropic: 0 reach this normalizer and are silently dropped, so the request cap is not enforced even though the user configured one. The CLI/UI paths call validateProviderMaxInFlightRequests, but the loaded-settings path is normalized here before being passed to streams, so raw config typos bypass the intended rejection; validate the loaded value (or surface a settings error) instead of continueing.

Useful? React with 👍 / 👎.

@H4vC H4vC force-pushed the h4vc/max-inflight-provider branch from 4066721 to b1deca6 Compare June 22, 2026 00:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cli CLI commands and arguments feat providers LLM provider-specific issues review:p2 sdk SDK and extension integration surface triaged vouched Passed the vouch gate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants