Skip to content

fix(catalog): restore Umans GLM-5.2 max reasoning#3193

Open
roboomp wants to merge 5 commits into
mainfrom
farm/767d8b38/fix-umans-glm-max-reasoning
Open

fix(catalog): restore Umans GLM-5.2 max reasoning#3193
roboomp wants to merge 5 commits into
mainfrom
farm/767d8b38/fix-umans-glm-max-reasoning

Conversation

@roboomp

@roboomp roboomp commented Jun 21, 2026

Copy link
Copy Markdown
Collaborator

Repro

Feeding Umans discovery metadata for umans-glm-5.2 with levels: ["none","high","max"] through umansModelManagerOptions() reproduced the regression: before the fix the model thinking metadata was {"mode":"budget","efforts":["high"],"defaultLevel":"high"}, so the picker could only show the surviving high tier plus global selectors.

Cause

packages/catalog/src/provider-models/openai-compat.ts mapped Umans reasoning levels through UMANS_REASONING_EFFORT_BY_LEVEL, which did not recognize the upstream "max" level, and Umans GLM-5.2 had no baked thinking.effortMap to send Effort.XHigh as the provider wire value "max". packages/catalog/src/model-thinking.ts also re-baked bundled catalog thinking from model identity without the Umans GLM-5.2 high/max constraint, so regenerated fallback metadata lost the provider-specific wire map.

Fix

  • Map Umans "max" to Effort.XHigh and attach { xhigh: "max" } when dynamic discovery reports a max level.
  • Teach the shared thinking resolver that Umans GLM-5.2 exposes only high/xhigh and uses the xhigh -> max wire mapping, keeping regenerated models.json consistent.
  • Add Umans resolver coverage for the dynamic high/max ladder, wire mapping, and bundled catalog fallback.

Verification

Ran bun --cwd=packages/catalog test test/umans-provider.test.ts, bun --cwd=packages/catalog test test/model-thinking.test.ts test/generated-policies.test.ts, bun --cwd=packages/catalog test test/umans-provider.test.ts test/model-thinking.test.ts test/generated-policies.test.ts && bun --cwd=packages/catalog run check:types, and a one-off repro command verifying the fixed output is {"thinking":{"mode":"budget","efforts":["high","xhigh"],"effortMap":{"xhigh":"max"},"defaultLevel":"high"},"xhighWire":"max"}. Fixes #3192

Mapped Umans GLM-5.2's upstream max reasoning level to the internal xhigh effort and preserved the max wire value in dynamic discovery and bundled catalog metadata.

Added resolver coverage for the high/max ladder and verified the xhigh request maps back to max.

Fixes #3192

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3a888c0f59

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

const efforts = mapUmansReasoningEfforts(value);
const thinking: ThinkingConfig = { mode: "budget", efforts };
if (umansHasMaxReasoningLevel(value) && efforts.includes(Effort.XHigh)) {
thinking.effortMap = UMANS_MAX_REASONING_EFFORT_MAP;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Emit the Umans max tier on requests

When Umans discovery reports levels: ["none", "high", "max"], this stores the xhigh -> max map on a mode: "budget" model, but Anthropic request shaping only turns effortMap into output_config.effort for anthropic-adaptive or anthropic-budget-effort modes (streamAnthropic otherwise just sends thinking.budget_tokens). As a result, selecting xhigh for dynamically discovered Umans GLM-5.2 still never sends the provider wire value "max"; the same issue applies to the bundled entry generated from inferDetectedEffortMap.

Useful? React with 👍 / 👎.

Removed the xhigh -> max effortMap baked onto the Umans GLM-5.2 spec; mode="budget" routes thinking depth via thinking.budget_tokens and never consults effortMap, so the wire shape is unchanged. The picker still surfaces high and xhigh via getModelDefinedEfforts, and the dynamic discovery still maps the upstream max level to Effort.XHigh.

Refs #3192
@roboomp

roboomp commented Jun 21, 2026

Copy link
Copy Markdown
Collaborator Author

Fair catch — mode: "budget" only sends thinking.budget_tokens, so the baked xhigh -> max was dead. Dropped the effortMap from mapUmansThinkingConfig and inferDetectedEffortMap (and from the regenerated umans-glm-5.2 entry). Picker still resolves max -> Effort.XHigh via UMANS_REASONING_EFFORT_BY_LEVEL and getModelDefinedEfforts still caps the offered tiers to high/xhigh. Wire shape is unchanged from the previous working bundle.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1e01d57d09

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

medium: Effort.Medium,
high: Effort.High,
xhigh: Effort.XHigh,
max: Effort.XHigh,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Emit Umans max effort on the wire

When Umans reports levels: ["none", "high", "max"], this maps max to the UI's xhigh, but the resulting model still uses thinking.mode: "budget" and the new test asserts there is no effortMap. Fresh evidence in this revision is that xhigh is now surfaced while effortMap is deliberately absent; for anthropic-messages budget mode, mapOptionsForApi only turns xhigh into ANTHROPIC_THINKING.xhigh budget tokens, and buildParams only emits output_config.effort for anthropic-adaptive or anthropic-budget-effort, so selecting the newly visible tier still never sends Umans the advertised wire level "max".

Useful? React with 👍 / 👎.

Forward anthropic-budget-effort selections through mapOptionsForApi so buildParams serializes output_config.effort with budget-token thinking. Mark Umans GLM-5.2 as budget-effort and map the UI xhigh tier back to Umans's max wire value.

Refs #3192
@roboomp

roboomp commented Jun 21, 2026

Copy link
Copy Markdown
Collaborator Author
  • Restored Umans GLM-5.2 thinking.mode: "anthropic-budget-effort" with effortMap: { xhigh: "max" } for both dynamic discovery and the bundled catalog entry.
  • Updated streamSimple's Anthropic mapping so anthropic-budget-effort selections are forwarded as options.effort, letting buildParams emit output_config.effort beside thinking.budget_tokens.
  • Added coverage proving levels: ["none", "high", "max"] surfaces high/xhigh, maps xhigh to max, and serializes an Umans XHigh request as output_config: { effort: "max" }.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e277f3a310

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread packages/ai/CHANGELOG.md Outdated

### Fixed

- Fixed Anthropic-compatible budget-effort models dropping the selected effort before request serialization, so `output_config.effort` is emitted alongside `thinking.budget_tokens` when model metadata declares `mode: "anthropic-budget-effort"`.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Move AI changelog entry to Unreleased

/workspace/oh-my-pi/AGENTS.md says new changelog entries must go under ## [Unreleased] and released sections are immutable; this adds the new AI-package fix under the already-released 16.1.10 section. In the next release this note can be missed while also rewriting historical release notes, so please move it into an Unreleased ### Fixed section.

Useful? React with 👍 / 👎.

@@ -1,4 +1,5 @@
import { describe, expect, it } from "bun:test";
import { Effort } from "@oh-my-pi/pi-catalog";

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Import Effort from its catalog submodule

/workspace/oh-my-pi/AGENTS.md requires catalog values to come from @oh-my-pi/pi-catalog/<module>; this root-barrel import evaluates src/index.ts and pulls unrelated catalog modules just to access Effort. In package-local tests that adds avoidable dependencies/cycles before the Umans provider code runs, so use @oh-my-pi/pi-catalog/effort instead.

Useful? React with 👍 / 👎.

Updated the Umans provider regression test to import Effort from the catalog effort submodule instead of the root catalog barrel, keeping the package-local test on the narrow dependency path.

Refs #3192
@roboomp

roboomp commented Jun 21, 2026

Copy link
Copy Markdown
Collaborator Author
  • Updated packages/catalog/test/umans-provider.test.ts to import Effort from @oh-my-pi/pi-catalog/effort, avoiding the root catalog barrel before the Umans provider test runs.

@can1357

can1357 commented Jun 22, 2026

Copy link
Copy Markdown
Owner

I really don't want to do this.
We really dont need 8 thinking levels, and I've been intentionally keeping it out.

Simple policy atm is xhi == maximum.

Moved the AI package changelog entry for Anthropic budget-effort serialization into the Unreleased section so the already-released 16.1.10 notes stay immutable.

Refs #3192
@roboomp

roboomp commented Jun 22, 2026

Copy link
Copy Markdown
Collaborator Author
  • Kept the thinking policy as xhigh == maximum: no user-facing max tier is added; Umans GLM-5.2 exposes only high/xhigh, with xhigh serialized to Umans’s wire value max.
  • Moved the AI-package changelog note from released 16.1.10 into Unreleased, leaving historical release notes immutable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

vouched Passed the vouch gate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Umans GLM-5.2 lost the max reasoning tier (picker shows only off/inherit/high/auto)

2 participants