18 changes: 18 additions & 0 deletions .github/pull_request_template.md
@@ -0,0 +1,18 @@
## Title
Safe Executor v1 + MCP utilization standard (internal rollout)

## Summary
- Describe what changed and why.

## Required references
- Checklist: `operations/rollout/INTEGRATION_PR_CHECKLIST.md`
- PR body helper: `operations/rollout/PR_BODY_INTERNAL_ROLLOUT.md`

## Validation
- [ ] `npm test` passed
- [ ] Security defaults confirmed
- [ ] Resource/tool parity confirmed
- [ ] Environment-specific notes documented (if any)

## Rollout scope
- [ ] Internal opt-in only for this phase
26 changes: 26 additions & 0 deletions AGENTS.md
@@ -0,0 +1,26 @@
# DesktopCommanderMCP Repo Instructions

## Scope
These instructions apply to work in `/Users/test1/DesktopCommanderMCP`.

## Operating Standard
- Follow `/Users/test1/DesktopCommanderMCP/THREAD_STANDARD.md` for all implementation threads.
- Keep `/Users/test1/DesktopCommanderMCP/THREAD_REVIEW.md` updated at closeout.
- Treat `/Users/test1/DesktopCommanderMCP/PROGRAM_GOVERNANCE.md` as the program checklist.
- For internal Safe Executor rollout work, also follow `/Users/test1/DesktopCommanderMCP/operations/rollout/README.md`.

## Safety Bar (Non-Negotiable)
- Preserve secure-by-default behavior when feature flags are off.
- Keep tool schemas strict for risky tools and skill tooling.
- Require explicit approvals for execution paths (`run_skill(mode=execute)` via confirm flow).
- Keep command validation fail-closed in strict mode.
- Do not log raw sensitive payloads; default to redacted/metadata logging.

## Skills Layer
- Skills must be scoped, allowlisted, and reason-coded on failure.
- Prefer deterministic scripts for repeatable operations.
- New read-only “status views” should use MCP resources; mutations must remain tools.

## Source Policy
- OpenAI product decisions should be grounded in official OpenAI docs.
- MCP protocol/security decisions should be grounded in `modelcontextprotocol.io` documentation.
76 changes: 76 additions & 0 deletions ARCHITECTURE.md
@@ -0,0 +1,76 @@
# DesktopCommanderMCP Architecture (High Level)

This document describes the runtime shape of `DesktopCommanderMCP` and the core request lifecycle.

## What This Is

DesktopCommanderMCP is an MCP server that exposes tools for:
- Terminal/process execution
- Filesystem read/write/edit/search
- Skills orchestration (optional, behind config gates)
- Tool-call history and basic telemetry

Key entrypoint and wiring:
- `src/server.ts` (MCP server, request handlers, tool registry/filtering, guardrails, dispatch)
- `src/index.ts` / `dist/index.js` (runtime entrypoint; starts the MCP server over stdio)
- `server.yaml` (deployment/config surface: allowed directories, network toggles, timeouts)

## Components (Mapped to Real Code)

### MCP Server ("Town Hall")
- Constructed in `src/server.ts` via `new Server(...)`.
- Owns request handlers for MCP methods like `tools/list`, `tools/call`, resources, and prompts.

### Guardrails + Config ("Gatekeeper")
- `preExecutionGuardrail(toolName, args)` in `src/server.ts` blocks certain operations before dispatch.
- `server.yaml` exposes operator-facing config:
  - `ALLOWED_DIRECTORIES` (filesystem allowlist)
  - `DISABLE_NETWORK` and `NETWORK_TIMEOUT` (outbound network policy for containerized runs)
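
A fail-closed allowlist check of the kind this gatekeeper implies might look roughly like the sketch below. It is a simplified, hypothetical stand-in: the `GuardrailResult` shape, the helper names, and the reason codes `invalid_path` and `path_not_allowed` are assumptions for illustration, not the actual `preExecutionGuardrail` in `src/server.ts`. It handles absolute POSIX-style paths only.

```typescript
// Hypothetical sketch: not the real preExecutionGuardrail from src/server.ts.
type GuardrailResult =
  | { allowed: true }
  | { allowed: false; reasonCode: string };

// Resolve "." and ".." segments in an absolute POSIX-style path.
function normalizePosix(p: string): string {
  const out: string[] = [];
  for (const seg of p.split("/")) {
    if (seg === "" || seg === ".") continue;
    if (seg === "..") out.pop();
    else out.push(seg);
  }
  return "/" + out.join("/");
}

// A path is allowed only if it equals, or is nested under, an allowlisted directory.
function isPathAllowed(target: string, allowedDirs: string[]): boolean {
  const t = normalizePosix(target);
  return allowedDirs.some((dir) => {
    const d = normalizePosix(dir);
    const prefix = d === "/" ? "/" : d + "/";
    return t === d || t.indexOf(prefix) === 0;
  });
}

function checkPathGuardrail(
  args: { path?: unknown },
  allowedDirs: string[],
): GuardrailResult {
  // Fail closed: a missing, non-string, or relative path is rejected.
  if (typeof args.path !== "string" || args.path.charAt(0) !== "/") {
    return { allowed: false, reasonCode: "invalid_path" };
  }
  return isPathAllowed(args.path, allowedDirs)
    ? { allowed: true }
    : { allowed: false, reasonCode: "path_not_allowed" };
}
```

Normalizing before comparing is what keeps `..` traversal and sibling directories that share a prefix (for example `/srv/workshop` against `/srv/work`) from slipping past the allowlist.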

### Tool Registry + Filtering ("Toy Catalog")
- Tools are built as an in-memory array in `src/server.ts` in the `tools/list` handler.
- `shouldIncludeTool(toolName, skillsEnabled)` filters tools based on:
  - `currentClient` (e.g., hide feedback tool for desktop-commander client)
  - config `skillsEnabled` (hide skill tools unless explicitly enabled)
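
The two filtering rules can be sketched as follows. This is an illustrative approximation, not the real `shouldIncludeTool`: the client is passed as a parameter here rather than read from server state, and `feedback_tool` is a placeholder because the feedback tool's actual identifier is not given in this document. The skill tool names are the ones listed in `THREAD_REVIEW.md`.

```typescript
// Hypothetical sketch: the real shouldIncludeTool lives in src/server.ts and may differ.
const SKILL_TOOLS: readonly string[] = [
  "list_skills", "get_skill", "run_skill",
  "get_skill_run", "cancel_skill_run", "approve_skill_run",
];

// Placeholder name; the actual feedback tool identifier is an assumption.
const FEEDBACK_TOOL = "feedback_tool";

function shouldIncludeToolSketch(
  toolName: string,
  skillsEnabled: boolean | undefined,
  currentClient: string,
): boolean {
  // Skill tools stay hidden unless the flag is explicitly true.
  if (SKILL_TOOLS.indexOf(toolName) !== -1 && skillsEnabled !== true) return false;
  // Hide the feedback tool from the desktop-commander client itself.
  if (toolName === FEEDBACK_TOOL && currentClient === "desktop-commander") return false;
  return true;
}

// With skills unset and the desktop-commander client, only the plain tool survives.
const visible = ["read_file", "run_skill", FEEDBACK_TOOL].filter((name) =>
  shouldIncludeToolSketch(name, undefined, "desktop-commander"),
);
```

Comparing against `true` rather than testing truthiness means an unset or malformed `skillsEnabled` value keeps the skill tools hidden.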

### Tool Dispatch ("Helpful Hands")
- `tools/call` handler in `src/server.ts`:
  1. Captures telemetry metadata (including optional `_meta.clientInfo`).
  2. Runs `preExecutionGuardrail(...)`.
  3. Dispatches to the correct handler (mostly in `src/handlers/*`).

### Tool-Call History ("Scrapbook")
- `src/utils/toolHistory.ts` exports `toolHistory`.
- `src/server.ts` appends tool calls via `toolHistory.addCall(name, args, result, duration)`,
  excluding `get_recent_tool_calls` and `track_ui_event` to avoid recursion/noise.

### Deferred Startup Logs ("Mail Carrier")
- `src/server.ts` buffers startup messages in `deferredMessages` and drains them via
  `flushDeferredMessages()` after initialization.
- `src/utils/toolHistory.ts` also uses a write queue (`writeQueue`) and periodic flush to
  append history to disk asynchronously (`tool-history.jsonl`).
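
The queue-then-flush pattern described for tool history can be illustrated with a small buffered JSONL writer. This is a sketch under assumptions: the class name `JsonlWriteQueue` and the injected `Sink` callback are hypothetical, and the real `src/utils/toolHistory.ts` may batch, schedule, and handle errors differently.

```typescript
// Hypothetical sketch of a buffered JSONL writer; not the real src/utils/toolHistory.ts.
// The sink is supplied by the caller, e.g. a function that appends to a file.
type Sink = (chunk: string) => void;

class JsonlWriteQueue {
  private queue: string[] = [];
  private sink: Sink;

  constructor(sink: Sink) {
    this.sink = sink;
  }

  // Enqueue one record as a single JSON line; nothing is written yet.
  add(record: object): void {
    this.queue.push(JSON.stringify(record));
  }

  // Drain everything queued so far in one write; returns the number of records flushed.
  flush(): number {
    if (this.queue.length === 0) return 0;
    const batch = this.queue;
    this.queue = [];
    this.sink(batch.join("\n") + "\n");
    return batch.length;
  }
}
```

In a running server, `flush()` would typically be driven by a timer and called once more at shutdown, so that tool calls never wait on disk I/O.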

## Request Lifecycle (tools)

### `tools/list`
1. Read config (`configManager.getConfig()`).
2. Build the full tools array (schemas + descriptions + annotations).
3. Filter tools via `shouldIncludeTool(...)`.
4. Return `{ tools: [...] }`.

### `tools/call`
1. Capture client metadata (optional) from `_meta`.
2. Run `preExecutionGuardrail(name, args)`; if blocked, return an error with `_meta.reason_code`.
3. Dispatch to the corresponding handler (`handlers.handleX(...)` or inline).
4. Record tool-call history via `toolHistory.addCall(...)` (with exclusions).
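
Steps 2 through 4 of the `tools/call` flow can be condensed into one function. The sketch below is synchronous for brevity and uses hypothetical types (`ToolResult`, `Handler`, `Guardrail`, `HistoryEntry`); the real handler in `src/server.ts` also captures client metadata (step 1, omitted here). Leaving blocked calls out of the history is an assumption of this sketch, since the document does not say how they are recorded.

```typescript
// Hypothetical, synchronous sketch of the tools/call steps; the real handler is in src/server.ts.
interface ToolResult {
  isError: boolean;
  text: string;
  _meta?: { reason_code: string };
}
type ToolArgs = Record<string, unknown>;
type Handler = (args: ToolArgs) => ToolResult;
// Returns a reason code when the call must be blocked, or null to let it through.
type Guardrail = (name: string, args: ToolArgs) => string | null;

interface HistoryEntry { name: string; durationMs: number; isError: boolean }

const HISTORY_EXCLUDED = ["get_recent_tool_calls", "track_ui_event"];

function callTool(
  name: string,
  args: ToolArgs,
  handlers: Record<string, Handler>,
  guardrail: Guardrail,
  history: HistoryEntry[],
): ToolResult {
  const started = Date.now();
  // Step 2: the guardrail runs before any handler; a block carries a reason code.
  const reason = guardrail(name, args);
  if (reason !== null) {
    return { isError: true, text: `Blocked: ${name}`, _meta: { reason_code: reason } };
  }
  // Step 3: dispatch; unknown tools are an error rather than a silent no-op.
  const known = Object.prototype.hasOwnProperty.call(handlers, name);
  const result: ToolResult = known
    ? handlers[name](args)
    : { isError: true, text: `Unknown tool: ${name}` };
  // Step 4: record history, skipping the excluded tools.
  if (HISTORY_EXCLUDED.indexOf(name) === -1) {
    history.push({ name, durationMs: Date.now() - started, isError: result.isError });
  }
  return result;
}
```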

## "Bedtime Story" Glossary (Precise Mapping)

If you want the story version to be mechanically accurate, these are the exact anchors:
- Town Hall: `new Server(...)` in `src/server.ts`
- Gatekeeper: `preExecutionGuardrail(...)` in `src/server.ts` + `ALLOWED_DIRECTORIES` / `DISABLE_NETWORK` in `server.yaml`
- Toy Catalog: `tools/list` handler + `shouldIncludeTool(...)` in `src/server.ts`
- Helpful Hands: `tools/call` dispatch in `src/server.ts` and handlers in `src/handlers/*`
- Mail Carrier: `flushDeferredMessages()` in `src/server.ts` and async write queue in `src/utils/toolHistory.ts`
- Scrapbook: `toolHistory.addCall(...)` in `src/utils/toolHistory.ts` (invoked from `src/server.ts`)

51 changes: 51 additions & 0 deletions MCP_UTILIZATION_STANDARD.md
@@ -0,0 +1,51 @@
# MCP Utilization Standard v1 (2026-02-14)

## Purpose
Define a deterministic, security-first way to use MCP servers in this program so thread outcomes are repeatable and auditable.

## 1. Control-Plane Decision
- Use a single control plane: `/Users/test1/DesktopCommanderMCP`.
- Do not create a new MCP codebase by default.
- Revisit a server split only after sustained divergence (more than one release cycle) or when hard trust/runtime boundaries emerge.

## 2. Routing Matrix
- `desktop-commander`: local execution, file/process/search tools, skill lifecycle tools, eval-gate operations.
- `figma`: design context extraction and implementation fidelity inputs.
- `playwright`: browser validation, UI interaction checks, capture/debug flows.
- `notion`: planning knowledge capture, meeting/research documentation.
- `linear`: issue tracking and implementation workflow status.
- `openaiDeveloperDocs`: official OpenAI API/Codex/Agents documentation lookup.

## 3. Resource-vs-Tool Policy
- Use MCP resources for read-only context state.
- Use tools for mutation/execution.
- Skill execution remains tool-driven (`run_skill`, `approve_skill_run`, `cancel_skill_run`).

## 4. Source Policy (Required)
- Use official documentation first for architecture/security decisions.
- OpenAI decisions: prefer `developers.openai.com` and `platform.openai.com`.
- MCP decisions: prefer `modelcontextprotocol.io`.
- If fallback browsing is required, restrict to official domains and cite concrete URLs.

## 5. OpenAI Docs MCP Verification
Run these checks when enabling or modifying OpenAI docs integration:
1. Connectivity check:
   - confirm `openaiDeveloperDocs` exists in `/Users/test1/.codex/config.toml`.
2. Sanity query:
   - run one documentation search and confirm at least one result is returned.
3. Fallback policy:
   - if MCP docs server is unavailable, log fallback reason in-thread and still cite official OpenAI URLs.

## 6. Thread Preflight (Required)
Before implementation, capture:
- date stamp (absolute date),
- objective and non-goals,
- risk class (`low|medium|high`),
- runtime controls: `approval_policy`, `sandbox_mode`, `network_access`,
- active MCP servers in scope for that thread.
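
One way to make the preflight capture checkable is to give it a typed shape. The interface and validator below are a hypothetical sketch: the field names mirror the list above, while the `YYYY-MM-DD` date format and the specific checks are assumptions, not requirements stated by this standard.

```typescript
// Hypothetical shape for the preflight record; field names mirror the preflight list.
type RiskClass = "low" | "medium" | "high";

interface ThreadPreflight {
  date: string;            // absolute date; YYYY-MM-DD is assumed here
  objective: string;
  nonGoals: string[];
  riskClass: RiskClass;
  approvalPolicy: string;  // value of approval_policy
  sandboxMode: string;     // value of sandbox_mode
  networkAccess: boolean;  // value of network_access
  mcpServers: string[];    // active MCP servers in scope for the thread
}

// Returns a list of problems; an empty list means the record passes these checks.
function validatePreflight(p: ThreadPreflight): string[] {
  const problems: string[] = [];
  if (!/^\d{4}-\d{2}-\d{2}$/.test(p.date)) {
    problems.push("date must be absolute (YYYY-MM-DD)");
  }
  if (p.objective.trim() === "") problems.push("objective is required");
  if (p.approvalPolicy.trim() === "") problems.push("approval_policy is required");
  if (p.sandboxMode.trim() === "") problems.push("sandbox_mode is required");
  return problems;
}
```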

## 7. Rollout Policy
- R1: standards and docs-server integration only.
- R2: enable read-only resources in opt-in environments.
- R3: apply operations skill in active threads.
- R4+: evaluate split only if split criteria persist.
29 changes: 29 additions & 0 deletions PROGRAM_GOVERNANCE.md
@@ -0,0 +1,29 @@
# Program Governance: Safe Executor Stabilization

## Scope
This repository is the source of truth for Safe Executor stabilization work.

## Required Thread Artifacts
Every implementation thread must include:
- `THREAD_STANDARD.md` as the operating baseline.
- `THREAD_REVIEW.md` updated at closeout with validation and residual risks.

## Tracking Labels
Use these labels for issues and milestones:
- `executor-hardening`
- `eval-gate`
- `security-p0`
- `rollout-optin`

## Closeout Requirements
A thread is considered complete only when:
- acceptance criteria are mapped to code + tests,
- security defaults are preserved when feature flags are off,
- residual risks and next gate are documented in `THREAD_REVIEW.md`.

## Internal Rollout Operations (Q1 2026)
During internal Safe Executor rollout, teams must also follow:
- `/Users/test1/DesktopCommanderMCP/operations/rollout/README.md`
- `/Users/test1/DesktopCommanderMCP/operations/rollout/INTEGRATION_PR_CHECKLIST.md`
- `/Users/test1/DesktopCommanderMCP/operations/rollout/PILOT_WORKFLOWS.md`
- `/Users/test1/DesktopCommanderMCP/operations/rollout/WEEKLY_OPERATIONS_CHECKLIST.md`
80 changes: 80 additions & 0 deletions THREAD_REVIEW.md
@@ -0,0 +1,80 @@
# Thread Review (2026-02-14)

## Primary Task
Install and configure Desktop Commander MCP, then implement a security-first skills upgrade plan (feature-flagged), including Safe Executor v1 with approval flow and guarded execution.

## Options Reviewed and Selected
- Plan-only maturity: lowest risk, but limited execution value.
- Safe Executor v1: selected as the best balance of safety and delivery value.
- Workflow DSL engine: deferred due to scope/risk.

## What Has Been Achieved

### 1. Installation and MCP setup
- Desktop Commander MCP integrated into Codex configuration (`desktop-commander` via `npx -y @wonderwhy-er/desktop-commander@latest`).

### 2. Security hardening
- Telemetry refactored to env-driven config in `/Users/test1/DesktopCommanderMCP/src/utils/capture.ts`.
- Tool-call logging hardened in `/Users/test1/DesktopCommanderMCP/src/utils/trackTools.ts` with `off | metadata | redacted` behavior.
- Fail-closed strict command validation added in `/Users/test1/DesktopCommanderMCP/src/command-manager.ts`, with the legacy fallback kept behind a mode setting.
- Server-side safety checks added in `/Users/test1/DesktopCommanderMCP/src/server.ts` for risky paths.

### 3. Skill registry and tooling
- Skill parser/registry/runner modules added under `/Users/test1/DesktopCommanderMCP/src/skills/`.
- Skill handlers added in `/Users/test1/DesktopCommanderMCP/src/handlers/skills-handlers.ts` and wired through `/Users/test1/DesktopCommanderMCP/src/handlers/index.ts`.
- Tool schemas and server registration added for:
  - `list_skills`
  - `get_skill`
  - `run_skill`
  - `get_skill_run`
  - `cancel_skill_run`
  - `approve_skill_run`
- Skill tools are hidden from tool listing when `skillsEnabled !== true`.

### 4. Safe Executor v1 behavior
- Runner now separates planner, executor, and verifier in `/Users/test1/DesktopCommanderMCP/src/skills/runner.ts`.
- Execution model supports guarded step types: `read`, `search`, `script`, `command_safe`.
- Confirm flow implemented:
  - `run_skill(mode=execute)` can transition to `waiting_approval`.
  - `approve_skill_run(runId)` transitions execution to completion/failure.
- Run responses now include `requiresApproval`, `nextAction`, and `executionSummary`.

### 5. Validation status
- Build passed.
- Added/ran tests for security, telemetry, runner behavior, tool visibility, and skill workflows:
  - `/Users/test1/DesktopCommanderMCP/test/test-security-upgrades.js`
  - `/Users/test1/DesktopCommanderMCP/test/test-telemetry-secrets.js`
  - `/Users/test1/DesktopCommanderMCP/test/test-skill-runner-unit.js`
  - `/Users/test1/DesktopCommanderMCP/test/test-skill-tools-visibility.js`
  - `/Users/test1/DesktopCommanderMCP/test/test-skills-workflow.js`
- Existing blocked-command security tests also passed.

## Residual Gap
- Runtime eval gate now exists with configurable thresholds:
  - `skillExecuteEvalGateEnabled`
  - `skillExecuteMinPassRate`
  - `skillExecuteMinSampleSize`
- Execute paths (`run_skill(mode=execute)`, `approve_skill_run`) now fail closed when gate conditions are not met.
- Remaining rollout work is operational (policy/enablement), not core runtime implementation.
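
A fail-closed gate over these three settings could be evaluated roughly as below. Only the config key names come from this review; the `EvalStats` shape, the reason codes, the reading of `skillExecuteMinPassRate` as a fraction in `[0, 1]`, and the choice to allow execution when the gate is disabled are all assumptions of this sketch.

```typescript
// Hypothetical sketch of an eval-gate check; only the config key names come from the review.
interface EvalGateConfig {
  skillExecuteEvalGateEnabled: boolean;
  skillExecuteMinPassRate: number;   // assumed to be a fraction in [0, 1]
  skillExecuteMinSampleSize: number;
}
interface EvalStats { passed: number; total: number }
type GateDecision =
  | { allowed: true }
  | { allowed: false; reasonCode: string };

function checkEvalGate(
  config: EvalGateConfig,
  stats: EvalStats | undefined,
): GateDecision {
  // Assumption: a disabled gate does not block execution.
  if (!config.skillExecuteEvalGateEnabled) return { allowed: true };
  // Fail closed: missing or malformed stats block execution.
  if (
    !stats || !isFinite(stats.passed) || !isFinite(stats.total) ||
    stats.total <= 0 || stats.passed < 0 || stats.passed > stats.total
  ) {
    return { allowed: false, reasonCode: "eval_stats_unavailable" };
  }
  if (stats.total < config.skillExecuteMinSampleSize) {
    return { allowed: false, reasonCode: "eval_sample_too_small" };
  }
  if (stats.passed / stats.total < config.skillExecuteMinPassRate) {
    return { allowed: false, reasonCode: "eval_pass_rate_too_low" };
  }
  return { allowed: true };
}
```

Checking sample size before pass rate avoids approving a run on, say, 3 passes out of 3 evals when the threshold assumes a larger sample.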

## Standardization Output
- Reusable thread standard added: `/Users/test1/DesktopCommanderMCP/THREAD_STANDARD.md`.
- Program governance checklist added: `/Users/test1/DesktopCommanderMCP/PROGRAM_GOVERNANCE.md`.

## 2026-02-23 Utilization Rollout Implementation
- Added internal rollout operations package under `/Users/test1/DesktopCommanderMCP/operations/rollout/`.
- Captured dated baseline artifacts in `/Users/test1/DesktopCommanderMCP/operations/rollout/2026-02-23/`.
- Added integration PR checklist and template in `/Users/test1/DesktopCommanderMCP/operations/rollout/INTEGRATION_PR_CHECKLIST.md`.
- Added pilot definitions in `/Users/test1/DesktopCommanderMCP/operations/rollout/PILOT_WORKFLOWS.md`.
- Added weekly cadence checks in `/Users/test1/DesktopCommanderMCP/operations/rollout/WEEKLY_OPERATIONS_CHECKLIST.md`.
- Linked rollout operations as required governance references in `/Users/test1/DesktopCommanderMCP/PROGRAM_GOVERNANCE.md`.
- Added thread preflight/closeout template in `/Users/test1/DesktopCommanderMCP/operations/rollout/THREAD_PREVIEW_TEMPLATE.md`.
- Captured pilot run evidence and summaries:
  - `/Users/test1/DesktopCommanderMCP/operations/rollout/2026-02-23/pilot_run_report.json`
  - `/Users/test1/DesktopCommanderMCP/operations/rollout/2026-02-23/pilot_run_summary.md`
- Captured test validation evidence:
  - `/Users/test1/DesktopCommanderMCP/operations/rollout/2026-02-23/npm_test_summary.md`
  - `/Users/test1/DesktopCommanderMCP/operations/rollout/2026-02-23/npm_test.log`
- Recorded eval-gate check and decision logs:
  - `/Users/test1/DesktopCommanderMCP/operations/rollout/EVAL_GATE_CHECKS_2026Q1.md`
  - `/Users/test1/DesktopCommanderMCP/operations/rollout/ROLLOUT_DECISION_LOG_2026Q1.md`
90 changes: 90 additions & 0 deletions THREAD_STANDARD.md
@@ -0,0 +1,90 @@
# Thread Standard v1 (2026-02-14)

## Purpose
Use this standard for implementation threads so work is reproducible, auditable, and safe by default.
This standard is operationalized in `/Users/test1/DesktopCommanderMCP/PROGRAM_GOVERNANCE.md`.

## 1. Thread Intake (required)
Capture these before implementation starts:
- Date stamp (absolute date).
- Primary objective and non-goals.
- Explicit acceptance criteria.
- Risk class: `low | medium | high`.
- Security posture and runtime controls: `approval_policy`, `sandbox_mode`, `network_access`.
- Active MCP servers in scope for the thread.
- Time references using absolute dates.

## 2. Instruction Layering (required)
Follow Codex instruction precedence and keep instructions local to scope:
- Global instructions in `~/.codex/AGENTS.md`.
- Repo instructions in `AGENTS.md` at repo root.
- Narrow overrides via `AGENTS.override.md` only for subtrees that need different rules.
- Verify active instruction chain when needed.

## 3. Security Baseline (required)
Default baseline for development and agentic execution:
- Prefer `sandbox_mode = "workspace-write"` with approvals.
- Prefer `approval_policy = "untrusted"` or `"on-request"`.
- Keep `network_access = false` unless a reviewed need exists.
- Do not use `danger-full-access` except in isolated, controlled environments.
- Require explicit approval before mutating operations in risky contexts.

## 4. Architecture Selection (required)
Choose the minimum orchestration needed:
- Start with one agent and clear tool boundaries.
- Add multi-agent routing only when tasks are clearly separable or instruction/tool complexity is too high.
- Keep human-in-the-loop checkpoints for consequential actions.

## 5. Tool and Skill Contract Standard (required)
For new tools/skills:
- Keep tool schemas strict (`additionalProperties: false`, strict validation).
- Enforce allowlists and scoped paths for execution primitives.
- Hide feature-flagged tools when disabled.
- Prefer deterministic scripts for repeated operations.
- Return structured, actionable errors with reason codes.
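
To make the strict-schema rule concrete, here is a hypothetical input schema with `additionalProperties: false`, plus a deliberately minimal check. The `skillId` property is an illustrative assumption, and the hand-rolled function covers only unknown and missing keys; a real server would rely on a full schema validator.

```typescript
// Hypothetical strict input schema for a skill tool; property names are illustrative.
const runSkillSchema = {
  type: "object",
  properties: {
    skillId: { type: "string" },
    mode: { type: "string", enum: ["plan", "execute"] },
  },
  required: ["skillId", "mode"],
  additionalProperties: false,
} as const;

// Minimal check covering only `required` and `additionalProperties: false`.
// It does not validate property types or enum values.
function findSchemaViolations(input: Record<string, unknown>): string[] {
  const allowed = Object.keys(runSkillSchema.properties);
  const errors: string[] = [];
  for (const key of Object.keys(input)) {
    if (allowed.indexOf(key) === -1) errors.push(`unknown property: ${key}`);
  }
  for (const key of runSkillSchema.required) {
    if (!(key in input)) errors.push(`missing property: ${key}`);
  }
  return errors;
}
```

Rejecting unknown keys outright, rather than ignoring them, keeps a caller from smuggling in parameters that a later handler might start honoring.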

## 6. Execution Lifecycle (required)
Use explicit run states for agentic operations:
- `queued -> planning -> waiting_approval -> executing -> verifying -> completed|failed|canceled`.
- `plan` mode must be deterministic and side-effect free.
- `execute` mode must enforce approval and safety guards.
- `verify` must run before `completed` can be set.
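
The run states above lend themselves to an explicit transition table. The sketch below encodes the linear chain from this section; letting `failed` and `canceled` be reached from any non-terminal state is an assumption, as is every name other than the state strings themselves.

```typescript
// Hypothetical sketch of the run-state machine; only the state names come from the standard.
type RunState =
  | "queued" | "planning" | "waiting_approval" | "executing"
  | "verifying" | "completed" | "failed" | "canceled";

// Assumed transition table: the linear chain, plus failed/canceled
// reachable from any non-terminal state.
const TRANSITIONS: Record<RunState, RunState[]> = {
  queued: ["planning", "failed", "canceled"],
  planning: ["waiting_approval", "failed", "canceled"],
  waiting_approval: ["executing", "failed", "canceled"],
  executing: ["verifying", "failed", "canceled"],
  verifying: ["completed", "failed", "canceled"],
  completed: [],
  failed: [],
  canceled: [],
};

function canTransition(from: RunState, to: RunState): boolean {
  return TRANSITIONS[from].indexOf(to) !== -1;
}

// Returns the new state, or throws on an illegal move.
function transition(from: RunState, to: RunState): RunState {
  if (!canTransition(from, to)) {
    throw new Error(`illegal transition: ${from} -> ${to}`);
  }
  return to;
}
```

Because `completed` is reachable only from `verifying`, the table itself enforces the rule that verification runs before a run can be marked complete.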

## 7. Evals and Rollout Gates (required)
Adopt eval-driven delivery:
- Add scoped unit/integration/security tests with each phase.
- Add golden scenarios for core workflows.
- Add adversarial and bypass tests for guardrails.
- Gate rollout by measured pass thresholds, not intuition.

## 8. Observability and Privacy (required)
- Telemetry/logging must be opt-in and environment-driven.
- Never store raw secrets or sensitive payloads in logs.
- Prefer metadata/redacted logging modes by default.
- Emit structured events for run lifecycle and safety blocks.
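
The metadata and redacted logging modes can be illustrated with a small entry builder. The mode names `off | metadata | redacted` match those reported for `src/utils/trackTools.ts` in the thread review; the function name, the entry shape, and the sensitive-key pattern are hypothetical and do not reflect the real implementation.

```typescript
// Hypothetical sketch of mode-aware tool-call logging; not the real trackTools.ts.
type LogMode = "off" | "metadata" | "redacted";

// Assumed pattern for key names that mark a value as sensitive.
const SENSITIVE_KEY = /(token|secret|password|api[_-]?key|authorization)/i;

function buildLogEntry(
  mode: LogMode,
  toolName: string,
  args: Record<string, unknown>,
): Record<string, unknown> | null {
  if (mode === "off") return null;
  if (mode === "metadata") {
    // Metadata only: tool name and sorted argument keys, never values.
    return { tool: toolName, argKeys: Object.keys(args).sort() };
  }
  // Redacted: keep the argument shape, mask values under sensitive-looking keys.
  const redacted: Record<string, unknown> = {};
  for (const key of Object.keys(args)) {
    redacted[key] = SENSITIVE_KEY.test(key) ? "[REDACTED]" : args[key];
  }
  return { tool: toolName, args: redacted };
}
```

Key-based masking is only a first line of defense: a secret stored under an innocuous key still passes through in `redacted` mode, which is one reason to prefer `metadata` as the default.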

## 8.1 Source Policy (required)
- Prefer official docs for architecture and security decisions.
- OpenAI product guidance: cite `developers.openai.com` / `platform.openai.com`.
- MCP protocol guidance: cite `modelcontextprotocol.io`.
- If fallback browsing is required, restrict to official domains and cite concrete URLs.

## 9. Definition of Done (required)
A thread is done only when all are true:
- Acceptance criteria are mapped to code/tests.
- Build passes and relevant tests pass.
- Security defaults are preserved when feature flags are off.
- Thread review document is updated with outcomes and residual risks.

## 10. Thread Closeout Template
Use this at thread end:
- Primary task.
- Options considered and selected path.
- What changed (files/tools/config).
- Validation run (build/tests/evals).
- What remains (if anything) with explicit next gate.

## 11. Operational Template (Q1 2026 rollout)
For internal Safe Executor rollout threads, use:
- `/Users/test1/DesktopCommanderMCP/operations/rollout/THREAD_PREVIEW_TEMPLATE.md`