wonderwhy-er · fvn946zh9w-crypto · Feb 23, 2026
diff --git a/.github/pull_request_template.md b/.github/pull_request_template.md
@@ -0,0 +1,18 @@
+## Title
+Safe Executor v1 + MCP utilization standard (internal rollout)
+
+## Summary
+- Describe what changed and why.
+
+## Required references
+- Checklist: `operations/rollout/INTEGRATION_PR_CHECKLIST.md`
+- PR body helper: `operations/rollout/PR_BODY_INTERNAL_ROLLOUT.md`
+
+## Validation
+- [ ] `npm test` passed
+- [ ] Security defaults confirmed
+- [ ] Resource/tool parity confirmed
+- [ ] Environment-specific notes documented (if any)
+
+## Rollout scope
+- [ ] Internal opt-in only for this phase
diff --git a/AGENTS.md b/AGENTS.md
@@ -0,0 +1,26 @@
+# DesktopCommanderMCP Repo Instructions
+
+## Scope
+These instructions apply to work in `/Users/test1/DesktopCommanderMCP`.
+
+## Operating Standard
+- Follow `/Users/test1/DesktopCommanderMCP/THREAD_STANDARD.md` for all implementation threads.
+- Keep `/Users/test1/DesktopCommanderMCP/THREAD_REVIEW.md` updated at closeout.
+- Treat `/Users/test1/DesktopCommanderMCP/PROGRAM_GOVERNANCE.md` as the program checklist.
+- For internal Safe Executor rollout work, also follow `/Users/test1/DesktopCommanderMCP/operations/rollout/README.md`.
+
+## Safety Bar (Non-Negotiable)
+- Preserve secure-by-default behavior when feature flags are off.
+- Keep tool schemas strict for risky tools and skill tooling.
+- Require explicit approvals for execution paths (`run_skill(mode=execute)` via confirm flow).
+- Keep command validation fail-closed in strict mode.
+- Do not log raw sensitive payloads; default to redacted/metadata logging.
+
+## Skills Layer
+- Skills must be scoped, allowlisted, and reason-coded on failure.
+- Prefer deterministic scripts for repeatable operations.
+- New read-only “status views” should use MCP resources; mutations must remain tools.
+
+## Source Policy
+- OpenAI product decisions should be grounded in official OpenAI docs.
+- MCP protocol/security decisions should be grounded in `modelcontextprotocol.io` documentation.
diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md
@@ -0,0 +1,76 @@
+# DesktopCommanderMCP Architecture (High Level)
+
+This document describes the runtime shape of `DesktopCommanderMCP` and the core request lifecycle.
+
+## What This Is
+
+DesktopCommanderMCP is an MCP server that exposes tools for:
+- Terminal/process execution
+- Filesystem read/write/edit/search
+- Skills orchestration (optional, behind config gates)
+- Tool-call history and basic telemetry
+
+Key entrypoint and wiring:
+- `src/server.ts` (MCP server, request handlers, tool registry/filtering, guardrails, dispatch)
+- `src/index.ts` / `dist/index.js` (runtime entrypoint; starts the MCP server over stdio)
+- `server.yaml` (deployment/config surface: allowed directories, network toggles, timeouts)
+
+## Components (Mapped to Real Code)
+
+### MCP Server ("Town Hall")
+- Constructed in `src/server.ts` via `new Server(...)`.
+- Owns request handlers for MCP methods like `tools/list`, `tools/call`, resources, and prompts.
+
+### Guardrails + Config ("Gatekeeper")
+- `preExecutionGuardrail(toolName, args)` in `src/server.ts` blocks certain operations before dispatch.
+- `server.yaml` exposes operator-facing config:
+  - `ALLOWED_DIRECTORIES` (filesystem allowlist)
+  - `DISABLE_NETWORK` and `NETWORK_TIMEOUT` (outbound network policy for containerized runs)
+
+### Tool Registry + Filtering ("Toy Catalog")
+- Tools are built as an in-memory array in `src/server.ts` in the `tools/list` handler.
+- `shouldIncludeTool(toolName, skillsEnabled)` filters tools based on:
+  - `currentClient` (e.g., hide the feedback tool for the `desktop-commander` client)
+  - config `skillsEnabled` (hide skill tools unless explicitly enabled)
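A minimal sketch of the filtering rule described above, not the code in `src/server.ts`. It assumes the skill tool names listed elsewhere in this PR. The feedback tool name (`give_feedback`), the `read_file` tool, the `claude-desktop` client name, and the extra `clientName` parameter are all hypothetical, introduced so the example runs on its own (the real function is described as reading `currentClient` itself).

```typescript
// Skill tool names as listed in THREAD_REVIEW.md.
const SKILL_TOOLS = new Set<string>([
  "list_skills",
  "get_skill",
  "run_skill",
  "get_skill_run",
  "cancel_skill_run",
  "approve_skill_run",
]);

// Hypothetical name: the source only says "feedback tool".
const FEEDBACK_TOOL = "give_feedback";

// Hypothetical variant of shouldIncludeTool: the client name is passed in
// explicitly instead of being read from module state.
function shouldIncludeToolSketch(
  toolName: string,
  skillsEnabled: boolean | undefined,
  clientName: string
): boolean {
  // Skill tools stay hidden unless the flag is exactly `true`.
  if (SKILL_TOOLS.has(toolName) && skillsEnabled !== true) return false;
  // The feedback tool is hidden from the desktop-commander client.
  if (toolName === FEEDBACK_TOOL && clientName === "desktop-commander") return false;
  return true;
}

const allTools = ["read_file", "run_skill", FEEDBACK_TOOL];
const visibleByDefault = allTools.filter((t) =>
  shouldIncludeToolSketch(t, undefined, "claude-desktop")
);
const visibleWithSkills = allTools.filter((t) =>
  shouldIncludeToolSketch(t, true, "desktop-commander")
);
```

Comparing against `true` rather than testing truthiness keeps the secure default when the flag is missing or malformed, which matches the `skillsEnabled !== true` wording used in the thread review.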
+
+### Tool Dispatch ("Helpful Hands")
+- `tools/call` handler in `src/server.ts`:
+  1. Captures telemetry metadata (including optional `_meta.clientInfo`).
+  2. Runs `preExecutionGuardrail(...)`.
+  3. Dispatches to the correct handler (mostly in `src/handlers/*`).
+
+### Tool-Call History ("Scrapbook")
+- `src/utils/toolHistory.ts` exports `toolHistory`.
+- `src/server.ts` appends tool calls via `toolHistory.addCall(name, args, result, duration)`,
+  excluding `get_recent_tool_calls` and `track_ui_event` to avoid recursion/noise.
+
+### Deferred Startup Logs ("Mail Carrier")
+- `src/server.ts` buffers startup messages in `deferredMessages` and drains them via
+  `flushDeferredMessages()` after initialization.
+- `src/utils/toolHistory.ts` also uses a write queue (`writeQueue`) and periodic flush to
+  append history to disk asynchronously (`tool-history.jsonl`).
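The write-queue idea can be sketched as follows. This is an illustration under assumptions, not the implementation in `src/utils/toolHistory.ts`: the record shape, the class name, and the injected `sink` callback are hypothetical, and the sink stands in for an append to `tool-history.jsonl`.

```typescript
// Hypothetical record shape; the real fields are not shown in this PR.
interface ToolCallRecord {
  name: string;
  durationMs: number;
  timestamp: string;
}

class HistoryWriteQueue {
  private queue: ToolCallRecord[] = [];
  private sink: (chunk: string) => void;

  // `sink` receives one JSON Lines chunk per flush.
  constructor(sink: (chunk: string) => void) {
    this.sink = sink;
  }

  enqueue(record: ToolCallRecord): void {
    this.queue.push(record);
  }

  // Drain everything queued so far; returns the number of records written.
  flush(): number {
    if (this.queue.length === 0) return 0;
    const batch = this.queue;
    this.queue = [];
    this.sink(batch.map((r) => JSON.stringify(r)).join("\n") + "\n");
    return batch.length;
  }
}

const written: string[] = [];
const historyQueue = new HistoryWriteQueue((chunk) => written.push(chunk));
historyQueue.enqueue({ name: "read_file", durationMs: 12, timestamp: "2026-02-23T00:00:00Z" });
historyQueue.enqueue({ name: "run_skill", durationMs: 40, timestamp: "2026-02-23T00:00:01Z" });
const flushedCount = historyQueue.flush();
```

A periodic flush would typically call `flush()` from a timer such as `setInterval`; it is left out here so the example terminates.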
+
+## Request Lifecycle (tools)
+
+### `tools/list`
+1. Read config (`configManager.getConfig()`).
+2. Build the full tools array (schemas + descriptions + annotations).
+3. Filter tools via `shouldIncludeTool(...)`.
+4. Return `{ tools: [...] }`.
+
+### `tools/call`
+1. Capture client metadata (optional) from `_meta`.
+2. Run `preExecutionGuardrail(name, args)`; if blocked, return an error with `_meta.reason_code`.
+3. Dispatch to the corresponding handler (`handlers.handleX(...)` or inline).
+4. Record tool-call history via `toolHistory.addCall(...)` (with exclusions).
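The `tools/call` steps above can be sketched as a single function. This is a simplified model, not the handler in `src/server.ts`: the guardrail rule, the `write_file` tool, the reason codes, and the result shape are hypothetical. Whether blocked calls are recorded in history is not stated in the document; this sketch assumes they are not.

```typescript
type ToolArgs = Record<string, unknown>;

interface ToolResult {
  isError: boolean;
  content: string;
  _meta?: { reason_code: string };
}

interface HistoryEntry {
  name: string;
  durationMs: number;
}

// Tools excluded from history, per the architecture notes.
const HISTORY_EXCLUDED = new Set<string>(["get_recent_tool_calls", "track_ui_event"]);
const callHistory: HistoryEntry[] = [];

// Hypothetical guardrail: returns a reason code when blocked, null otherwise.
function preExecutionGuardrailSketch(name: string, args: ToolArgs): string | null {
  if (name === "write_file" && typeof args["path"] !== "string") return "INVALID_PATH";
  return null;
}

// Hypothetical handler table standing in for src/handlers/*.
const handlers: Record<string, (args: ToolArgs) => ToolResult> = {
  write_file: (args) => ({ isError: false, content: `wrote ${String(args["path"])}` }),
  get_recent_tool_calls: () => ({ isError: false, content: JSON.stringify(callHistory) }),
};

function callTool(name: string, args: ToolArgs): ToolResult {
  const started = Date.now();
  // Step 2: guardrail runs before any dispatch.
  const reason = preExecutionGuardrailSketch(name, args);
  if (reason !== null) {
    return { isError: true, content: `blocked: ${reason}`, _meta: { reason_code: reason } };
  }
  // Step 3: dispatch to the matching handler.
  const handler = handlers[name];
  if (handler === undefined) {
    return { isError: true, content: `unknown tool: ${name}`, _meta: { reason_code: "UNKNOWN_TOOL" } };
  }
  const result = handler(args);
  // Step 4: record history, skipping the excluded tools.
  if (!HISTORY_EXCLUDED.has(name)) {
    callHistory.push({ name, durationMs: Date.now() - started });
  }
  return result;
}

const blockedResult = callTool("write_file", {});
const okResult = callTool("write_file", { path: "/tmp/demo.txt" });
callTool("get_recent_tool_calls", {});
```

Running the guardrail before the handler lookup means a blocked call never reaches handler code, which is the property the lifecycle ordering is meant to guarantee.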
+
+## "Bedtime Story" Glossary (Precise Mapping)
+
+If you want the story version to be mechanically accurate, these are the exact anchors:
+- Town Hall: `new Server(...)` in `src/server.ts`
+- Gatekeeper: `preExecutionGuardrail(...)` in `src/server.ts` + `ALLOWED_DIRECTORIES` / `DISABLE_NETWORK` in `server.yaml`
+- Toy Catalog: `tools/list` handler + `shouldIncludeTool(...)` in `src/server.ts`
+- Helpful Hands: `tools/call` dispatch in `src/server.ts` and handlers in `src/handlers/*`
+- Mail Carrier: `flushDeferredMessages()` in `src/server.ts` and async write queue in `src/utils/toolHistory.ts`
+- Scrapbook: `toolHistory.addCall(...)` in `src/utils/toolHistory.ts` (invoked from `src/server.ts`)
+
diff --git a/MCP_UTILIZATION_STANDARD.md b/MCP_UTILIZATION_STANDARD.md
@@ -0,0 +1,51 @@
+# MCP Utilization Standard v1 (2026-02-14)
+
+## Purpose
+Define a deterministic, security-first way to use MCP servers in this program so thread outcomes are repeatable and auditable.
+
+## 1. Control-Plane Decision
+- Use a single control plane: `/Users/test1/DesktopCommanderMCP`.
+- Do not create a new MCP codebase by default.
+- Revisit a server split only after sustained divergence (>1 release cycle) or when hard trust/runtime boundaries emerge.
+
+## 2. Routing Matrix
+- `desktop-commander`: local execution, file/process/search tools, skill lifecycle tools, eval-gate operations.
+- `figma`: design context extraction and implementation fidelity inputs.
+- `playwright`: browser validation, UI interaction checks, capture/debug flows.
+- `notion`: planning knowledge capture, meeting/research documentation.
+- `linear`: issue tracking and implementation workflow status.
+- `openaiDeveloperDocs`: official OpenAI API/Codex/Agents documentation lookup.
+
+## 3. Resource-vs-Tool Policy
+- Use MCP resources for read-only context state.
+- Use tools for mutation/execution.
+- Skill execution remains tool-driven (`run_skill`, `approve_skill_run`, `cancel_skill_run`).
+
+## 4. Source Policy (Required)
+- Use official documentation first for architecture/security decisions.
+- OpenAI decisions: prefer `developers.openai.com` and `platform.openai.com`.
+- MCP decisions: prefer `modelcontextprotocol.io`.
+- If fallback browsing is required, restrict to official domains and cite concrete URLs.
+
+## 5. OpenAI Docs MCP Verification
+Run these checks when enabling or modifying OpenAI docs integration:
+1. Connectivity check:
+   - confirm `openaiDeveloperDocs` exists in `/Users/test1/.codex/config.toml`.
+2. Sanity query:
+   - run one documentation search and confirm at least one result is returned.
+3. Fallback policy:
+   - if MCP docs server is unavailable, log fallback reason in-thread and still cite official OpenAI URLs.
+
+## 6. Thread Preflight (Required)
+Before implementation, capture:
+- date stamp (absolute date),
+- objective and non-goals,
+- risk class (`low|medium|high`),
+- runtime controls: `approval_policy`, `sandbox_mode`, `network_access`,
+- active MCP servers in scope for that thread.
+
+## 7. Rollout Policy
+- R1: standards and docs-server integration only.
+- R2: enable read-only resources in opt-in environments.
+- R3: apply operations skill in active threads.
+- R4+: evaluate split only if split criteria persist.
diff --git a/PROGRAM_GOVERNANCE.md b/PROGRAM_GOVERNANCE.md
@@ -0,0 +1,29 @@
+# Program Governance: Safe Executor Stabilization
+
+## Scope
+This repository is the source of truth for Safe Executor stabilization work.
+
+## Required Thread Artifacts
+Every implementation thread must include:
+- `THREAD_STANDARD.md` as the operating baseline.
+- `THREAD_REVIEW.md` updated at closeout with validation and residual risks.
+
+## Tracking Labels
+Use these labels for issues and milestones:
+- `executor-hardening`
+- `eval-gate`
+- `security-p0`
+- `rollout-optin`
+
+## Closeout Requirements
+A thread is considered complete only when:
+- acceptance criteria are mapped to code + tests,
+- security defaults are preserved when feature flags are off,
+- residual risks and next gate are documented in `THREAD_REVIEW.md`.
+
+## Internal Rollout Operations (Q1 2026)
+During internal Safe Executor rollout, teams must also follow:
+- `/Users/test1/DesktopCommanderMCP/operations/rollout/README.md`
+- `/Users/test1/DesktopCommanderMCP/operations/rollout/INTEGRATION_PR_CHECKLIST.md`
+- `/Users/test1/DesktopCommanderMCP/operations/rollout/PILOT_WORKFLOWS.md`
+- `/Users/test1/DesktopCommanderMCP/operations/rollout/WEEKLY_OPERATIONS_CHECKLIST.md`
diff --git a/THREAD_REVIEW.md b/THREAD_REVIEW.md
@@ -0,0 +1,80 @@
+# Thread Review (2026-02-14)
+
+## Primary Task
+Install and configure Desktop Commander MCP, then implement a security-first skills upgrade plan (feature-flagged), including Safe Executor v1 with approval flow and guarded execution.
+
+## Options Reviewed and Selected
+- Plan-only maturity: lowest risk, but limited execution value.
+- Safe Executor v1: selected; best balance of safety and delivery value.
+- Workflow DSL engine: deferred due to scope/risk.
+
+## What Has Been Achieved
+
+### 1. Installation and MCP setup
+- Desktop Commander MCP integrated into Codex configuration (`desktop-commander` via `npx -y @wonderwhy-er/desktop-commander@latest`).
+
+### 2. Security hardening
+- Telemetry refactored to env-driven config in `/Users/test1/DesktopCommanderMCP/src/utils/capture.ts`.
+- Tool-call logging hardened in `/Users/test1/DesktopCommanderMCP/src/utils/trackTools.ts` with `off | metadata | redacted` behavior.
+- Fail-closed strict command validation added in `/Users/test1/DesktopCommanderMCP/src/command-manager.ts`, with the legacy behavior kept as a fallback behind a mode setting.
+- Server-side safety checks added in `/Users/test1/DesktopCommanderMCP/src/server.ts` for risky paths.
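To make the `off | metadata | redacted` logging behavior concrete, here is one possible reading, sketched under assumptions. The PR does not define what each mode emits. This sketch assumes `metadata` logs only the tool name and argument keys, and `redacted` logs arguments with sensitive-looking keys masked. The key pattern and function name are hypothetical and are not taken from `trackTools.ts`.

```typescript
type ToolLogMode = "off" | "metadata" | "redacted";

// Assumed pattern for sensitive keys; the real list is not in the source.
const SENSITIVE_KEY = /token|secret|password|api[_-]?key/i;

function buildLogEntry(
  mode: ToolLogMode,
  tool: string,
  args: Record<string, unknown>
): Record<string, unknown> | null {
  // "off": nothing is logged at all.
  if (mode === "off") return null;
  // "metadata": tool name and argument keys only, never values.
  if (mode === "metadata") {
    return { tool, argKeys: Object.keys(args).sort() };
  }
  // "redacted": values are kept, except under sensitive-looking keys.
  const redacted: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(args)) {
    redacted[key] = SENSITIVE_KEY.test(key) ? "[REDACTED]" : value;
  }
  return { tool, args: redacted };
}

const sampleArgs = { path: "/tmp/a", apiKey: "sk-123" };
const offEntry = buildLogEntry("off", "write_file", sampleArgs);
const metadataEntry = buildLogEntry("metadata", "write_file", sampleArgs);
const redactedEntry = buildLogEntry("redacted", "write_file", sampleArgs);
```

Key-based masking does not catch secrets embedded in ordinary values (for example, a token inside a command string), so under this reading `metadata` is the more conservative of the two modes.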
+
+### 3. Skill registry and tooling
+- Skill parser/registry/runner modules added under `/Users/test1/DesktopCommanderMCP/src/skills/`.
+- Skill handlers added in `/Users/test1/DesktopCommanderMCP/src/handlers/skills-handlers.ts` and wired through `/Users/test1/DesktopCommanderMCP/src/handlers/index.ts`.
+- Tool schemas and server registration added for:
+  - `list_skills`
+  - `get_skill`
+  - `run_skill`
+  - `get_skill_run`
+  - `cancel_skill_run`
+  - `approve_skill_run`
+- Skill tools are hidden from tool listing when `skillsEnabled !== true`.
+
+### 4. Safe Executor v1 behavior
+- Runner now separates planner, executor, and verifier in `/Users/test1/DesktopCommanderMCP/src/skills/runner.ts`.
+- Execution model supports guarded step types: `read`, `search`, `script`, `command_safe`.
+- Confirm flow implemented:
+  - `run_skill(mode=execute)` can transition to `waiting_approval`.
+  - `approve_skill_run(runId)` transitions execution to completion/failure.
+- Run responses now include `requiresApproval`, `nextAction`, and `executionSummary`.
+
+### 5. Validation status
+- Build passed.
+- Added/ran tests for security, telemetry, runner behavior, tool visibility, and skill workflows:
+  - `/Users/test1/DesktopCommanderMCP/test/test-security-upgrades.js`
+  - `/Users/test1/DesktopCommanderMCP/test/test-telemetry-secrets.js`
+  - `/Users/test1/DesktopCommanderMCP/test/test-skill-runner-unit.js`
+  - `/Users/test1/DesktopCommanderMCP/test/test-skill-tools-visibility.js`
+  - `/Users/test1/DesktopCommanderMCP/test/test-skills-workflow.js`
+- Existing blocked-command security tests also passed.
+
+## Residual Gap
+- Runtime eval gate now exists with configurable thresholds:
+  - `skillExecuteEvalGateEnabled`
+  - `skillExecuteMinPassRate`
+  - `skillExecuteMinSampleSize`
+- Execute paths (`run_skill(mode=execute)`, `approve_skill_run`) now fail closed when gate conditions are not met.
+- Remaining rollout work is operational (policy/enablement), not core runtime implementation.
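A fail-closed gate check over the three thresholds might look like the sketch below. The config keys come from the list above; everything else is an assumption: the stats shape, the reason codes, treating `skillExecuteMinPassRate` as a fraction between 0 and 1, and allowing execution when the gate is disabled.

```typescript
interface EvalGateConfig {
  skillExecuteEvalGateEnabled: boolean;
  skillExecuteMinPassRate: number; // assumed to be a fraction in [0, 1]
  skillExecuteMinSampleSize: number;
}

// Hypothetical summary of recorded eval runs.
interface EvalStats {
  passed: number;
  total: number;
}

interface GateDecision {
  allowed: boolean;
  reasonCode?: string;
}

function checkEvalGate(config: EvalGateConfig, stats: EvalStats | undefined): GateDecision {
  // Assumption: a disabled gate does not block execution.
  if (!config.skillExecuteEvalGateEnabled) return { allowed: true };
  // Fail closed: missing stats block execution instead of passing silently.
  if (stats === undefined) return { allowed: false, reasonCode: "EVAL_STATS_MISSING" };
  if (stats.total < config.skillExecuteMinSampleSize) {
    return { allowed: false, reasonCode: "EVAL_SAMPLE_TOO_SMALL" };
  }
  const passRate = stats.total === 0 ? 0 : stats.passed / stats.total;
  // Written as a negated >= so a NaN pass rate also fails closed.
  if (!(passRate >= config.skillExecuteMinPassRate)) {
    return { allowed: false, reasonCode: "EVAL_PASS_RATE_TOO_LOW" };
  }
  return { allowed: true };
}

const gateConfig: EvalGateConfig = {
  skillExecuteEvalGateEnabled: true,
  skillExecuteMinPassRate: 0.9,
  skillExecuteMinSampleSize: 20,
};
const missingStats = checkEvalGate(gateConfig, undefined);
const tooFewSamples = checkEvalGate(gateConfig, { passed: 19, total: 19 });
const lowPassRate = checkEvalGate(gateConfig, { passed: 17, total: 20 });
const gatePassed = checkEvalGate(gateConfig, { passed: 19, total: 20 });
```

Each denial carries a reason code, in line with the requirement elsewhere in this PR that failures be reason-coded.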
+
+## Standardization Output
+- Reusable thread standard added: `/Users/test1/DesktopCommanderMCP/THREAD_STANDARD.md`.
+- Program governance checklist added: `/Users/test1/DesktopCommanderMCP/PROGRAM_GOVERNANCE.md`.
+
+## 2026-02-23 Utilization Rollout Implementation
+- Added internal rollout operations package under `/Users/test1/DesktopCommanderMCP/operations/rollout/`.
+- Captured dated baseline artifacts in `/Users/test1/DesktopCommanderMCP/operations/rollout/2026-02-23/`.
+- Added integration PR checklist and template in `/Users/test1/DesktopCommanderMCP/operations/rollout/INTEGRATION_PR_CHECKLIST.md`.
+- Added pilot definitions in `/Users/test1/DesktopCommanderMCP/operations/rollout/PILOT_WORKFLOWS.md`.
+- Added weekly cadence checks in `/Users/test1/DesktopCommanderMCP/operations/rollout/WEEKLY_OPERATIONS_CHECKLIST.md`.
+- Linked rollout operations as required governance references in `/Users/test1/DesktopCommanderMCP/PROGRAM_GOVERNANCE.md`.
+- Added thread preflight/closeout template in `/Users/test1/DesktopCommanderMCP/operations/rollout/THREAD_PREVIEW_TEMPLATE.md`.
+- Captured pilot run evidence and summaries:
+  - `/Users/test1/DesktopCommanderMCP/operations/rollout/2026-02-23/pilot_run_report.json`
+  - `/Users/test1/DesktopCommanderMCP/operations/rollout/2026-02-23/pilot_run_summary.md`
+- Captured test validation evidence:
+  - `/Users/test1/DesktopCommanderMCP/operations/rollout/2026-02-23/npm_test_summary.md`
+  - `/Users/test1/DesktopCommanderMCP/operations/rollout/2026-02-23/npm_test.log`
+- Recorded eval-gate check and decision logs:
+  - `/Users/test1/DesktopCommanderMCP/operations/rollout/EVAL_GATE_CHECKS_2026Q1.md`
+  - `/Users/test1/DesktopCommanderMCP/operations/rollout/ROLLOUT_DECISION_LOG_2026Q1.md`
diff --git a/THREAD_STANDARD.md b/THREAD_STANDARD.md
@@ -0,0 +1,90 @@
+# Thread Standard v1 (2026-02-14)
+
+## Purpose
+Use this standard for implementation threads so work is reproducible, auditable, and safe by default.
+This standard is operationalized in `/Users/test1/DesktopCommanderMCP/PROGRAM_GOVERNANCE.md`.
+
+## 1. Thread Intake (required)
+Capture these before implementation starts:
+- Date stamp (absolute date).
+- Primary objective and non-goals.
+- Explicit acceptance criteria.
+- Risk class: `low | medium | high`.
+- Security posture required for the risk class, including network expectations.
+- Runtime controls: `approval_policy`, `sandbox_mode`, `network_access`.
+- Active MCP servers in scope for the thread.
+- Time references using absolute dates.
+
+## 2. Instruction Layering (required)
+Follow Codex instruction precedence and keep instructions local to scope:
+- Global instructions in `~/.codex/AGENTS.md`.
+- Repo instructions in `AGENTS.md` at repo root.
+- Narrow overrides via `AGENTS.override.md` only for subtrees that need different rules.
+- Verify active instruction chain when needed.
+
+## 3. Security Baseline (required)
+Default baseline for development and agentic execution:
+- Prefer `sandbox_mode = "workspace-write"` with approvals.
+- Prefer `approval_policy = "untrusted"` or `"on-request"`.
+- Keep `network_access = false` unless a reviewed need exists.
+- Do not use `danger-full-access` except in isolated, controlled environments.
+- Require explicit approval before mutating operations in risky contexts.
+
+## 4. Architecture Selection (required)
+Choose the minimum orchestration needed:
+- Start with one agent and clear tool boundaries.
+- Add multi-agent routing only when tasks are clearly separable or instruction/tool complexity is too high.
+- Keep human-in-the-loop checkpoints for consequential actions.
+
+## 5. Tool and Skill Contract Standard (required)
+For new tools/skills:
+- Keep tool schemas strict (`additionalProperties: false`, strict validation).
+- Enforce allowlists and scoped paths for execution primitives.
+- Hide feature-flagged tools when disabled.
+- Prefer deterministic scripts for repeated operations.
+- Return structured, actionable errors with reason codes.
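As an illustration of a strict tool schema, the sketch below pairs a JSON Schema object that sets `additionalProperties: false` with a small hand-rolled check. The `skillId` field is hypothetical; only `mode` and its `plan` and `execute` values appear in this PR. The validator exists to show the rejection behavior and is not a substitute for a real JSON Schema validator.

```typescript
// Hypothetical input schema for run_skill.
const runSkillSchema = {
  type: "object",
  properties: {
    skillId: { type: "string" },
    mode: { type: "string", enum: ["plan", "execute"] },
  },
  required: ["skillId", "mode"],
  additionalProperties: false,
} as const;

// Minimal illustrative check covering only what this schema needs.
function validateStrict(input: Record<string, unknown>): string[] {
  const errors: string[] = [];
  const allowed: readonly string[] = Object.keys(runSkillSchema.properties);
  // additionalProperties: false means every unknown key is an error.
  for (const key of Object.keys(input)) {
    if (!allowed.includes(key)) errors.push(`unknown property: ${key}`);
  }
  // Both required fields are strings in this schema.
  for (const key of runSkillSchema.required) {
    if (typeof input[key] !== "string") errors.push(`missing or non-string: ${key}`);
  }
  const modes: readonly string[] = runSkillSchema.properties.mode.enum;
  const mode = input["mode"];
  if (typeof mode === "string" && !modes.includes(mode)) {
    errors.push(`invalid mode: ${mode}`);
  }
  return errors;
}

const validErrors = validateStrict({ skillId: "demo", mode: "plan" });
const extraKeyErrors = validateStrict({ skillId: "demo", mode: "execute", shell: "rm -rf /" });
const badModeErrors = validateStrict({ mode: "run" });
```

Rejecting unknown keys outright, rather than ignoring them, stops a caller from passing parameters that a later code path might start honoring.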
+
+## 6. Execution Lifecycle (required)
+Use explicit run states for agentic operations:
+- `queued -> planning -> waiting_approval -> executing -> verifying -> completed|failed|canceled`.
+- `plan` mode must be deterministic and side-effect free.
+- `execute` mode must enforce approval and safety guards.
+- `verify` must run before `completed` can be set.
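The run states above can be modeled as a transition table. This is a sketch under stated assumptions, not the runner in `src/skills/runner.ts`: it takes the linear path from the standard literally and additionally assumes `failed` and `canceled` are reachable from every non-terminal state. The standard does not spell out those edges.

```typescript
type RunState =
  | "queued"
  | "planning"
  | "waiting_approval"
  | "executing"
  | "verifying"
  | "completed"
  | "failed"
  | "canceled";

// Assumed transition table; terminal states have no outgoing edges.
const TRANSITIONS: Record<RunState, readonly RunState[]> = {
  queued: ["planning", "failed", "canceled"],
  planning: ["waiting_approval", "failed", "canceled"],
  waiting_approval: ["executing", "failed", "canceled"],
  executing: ["verifying", "failed", "canceled"],
  verifying: ["completed", "failed", "canceled"],
  completed: [],
  failed: [],
  canceled: [],
};

// Returns the new state, or throws if the edge is not in the table.
function transition(from: RunState, to: RunState): RunState {
  if (!TRANSITIONS[from].includes(to)) {
    throw new Error(`illegal transition: ${from} -> ${to}`);
  }
  return to;
}

// Happy path: every step, including verification, is visited in order.
let runState: RunState = "queued";
const happyPath = ["planning", "waiting_approval", "executing", "verifying", "completed"] as const;
for (const next of happyPath) {
  runState = transition(runState, next);
}

// `completed` is only reachable from `verifying`, so skipping verify throws.
let skippedVerifyRejected = false;
try {
  transition("executing", "completed");
} catch {
  skippedVerifyRejected = true;
}
```

Encoding the rule that `verify` must run before `completed` as a missing edge means it is enforced by the table itself rather than by each caller.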
+
+## 7. Evals and Rollout Gates (required)
+Adopt eval-driven delivery:
+- Add scoped unit/integration/security tests with each phase.
+- Add golden scenarios for core workflows.
+- Add adversarial and bypass tests for guardrails.
+- Gate rollout by measured pass thresholds, not intuition.
+
+## 8. Observability and Privacy (required)
+- Telemetry/logging must be opt-in and environment-driven.
+- Never store raw secrets or sensitive payloads in logs.
+- Prefer metadata/redacted logging modes by default.
+- Emit structured events for run lifecycle and safety blocks.
+
+## 8.1 Source Policy (required)
+- Prefer official docs for architecture and security decisions.
+- OpenAI product guidance: cite `developers.openai.com` / `platform.openai.com`.
+- MCP protocol guidance: cite `modelcontextprotocol.io`.
+- If fallback browsing is required, restrict to official domains and cite concrete URLs.
+
+## 9. Definition of Done (required)
+A thread is done only when all are true:
+- Acceptance criteria are mapped to code/tests.
+- Build passes and relevant tests pass.
+- Security defaults preserved when feature flags are off.
+- Thread review document is updated with outcomes and residual risks.
+
+## 10. Thread Closeout Template
+Use this at thread end:
+- Primary task.
+- Options considered and selected path.
+- What changed (files/tools/config).
+- Validation run (build/tests/evals).
+- What remains (if anything) with explicit next gate.
+
+## 11. Operational Template (Q1 2026 rollout)
+For internal Safe Executor rollout threads, use:
+- `/Users/test1/DesktopCommanderMCP/operations/rollout/THREAD_PREVIEW_TEMPLATE.md`