feat: task-relevant code summaries with Turso vector search #3185
oldschoola wants to merge 6 commits into
Add a native (non-MCP) code summaries feature inspired by devalade/codemap:
agent-written file-level summaries persisted in Turso/libSQL with native
vector search, retrieved as minimal task-relevant context via hybrid
FTS5 + vector_top_k retrieval with reciprocal rank fusion and budget
packing.
Key design decisions (verified through adversarial workflowz design panel):
- Distinct feature module (codemap.* settings), NOT in memory.backend enum
- Composes with any memory backend including off (the default)
- Turso/libSQL-only storage with F32_BLOB vector columns + libsql_vector_idx
- FTS5 virtual table + triggers for lexical search (stable, not experimental)
- Hybrid retrieval: FTS5 + vector_top_k → reciprocal rank fusion (k=60)
- Budget packer with codemap's documented token formula: ceil(chars/4) + 20
- Singular PK + ROWID (required by libsql_vector_idx, not composite PK)
- Bun.hash for content staleness (per AGENTS.md convention)
- Lazy embedding on retrieval (not write) via decoupled MnemopiEmbedClient
- Automatic Turso DB provisioning with settings.set() persist-back
- Independent first-turn injection seam (not via memory backend hook)
- Pluggable LanguageAdapter interface (TsAdapter ships in v1 via LSP)
New module: packages/coding-agent/src/task-context/ (13 files, ~1400 lines)
- schema.ts: SQL DDL (summaries table, FTS5, vector index)
- config.ts: settings → typed config loader
- db.ts: libsql client factory + post-sync maintenance
- staleness.ts: Bun.hash content hash + staleness detection
- store.ts: CRUD + FTS + vector search data layer
- adapter.ts: pluggable language adapter + TsAdapter
- retrieve.ts: hybrid retrieval pipeline with RRF + budget packer
- embed.ts: decoupled embedding client (lazy, on retrieval)
- turso.ts: auto-provisioning + connection resolution
- tools.ts: 4 AgentTool classes with createIf gating
- prompt.ts: system-prompt injection helpers
- state.ts: per-session state via Symbol key
- index.ts: barrel re-exports + resolveCodemap/shutdownCodemap lifecycle
Integration edits (10 existing files):
- settings-schema.ts: 14 codemap.* settings
- settings-defs.ts: codemapActive condition
- builtin-names.ts: 4 tool names
- tools/index.ts: BUILTIN_TOOLS registration + isToolAllowed gating
- system-prompt.ts: codemapEnabled option + hasCodemap threading
- system-prompt.md: {{#if hasCodemap}} advertisement block
- agent-session.ts: #injectCodemapTaskContext + shutdownCodemap in dispose
- sdk.ts: resolveCodemap startup call + codemapEnabled option
- hindsight/content.ts: stripMemoryTags for <codemap> blocks
- package.json: @libsql/client dependency
Tests: 19 passing (staleness transitions, budget packer token math, RRF fusion)
Verification: bun check passes across all 16 packages (0 type errors, 0 lint errors)
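The fusion step named in the design list (reciprocal rank fusion, k=60) can be sketched as follows. This is a minimal illustration under assumptions, not the PR's `reciprocalRankFusion`: the function name `fuseRanks` and the id-only input shape are hypothetical.

```typescript
// Hypothetical shape: each retriever returns row ids in best-first order.
type RankedList = readonly number[];

/**
 * Reciprocal rank fusion: score(id) = sum over lists of 1 / (k + rank),
 * where rank is 1-based. Ids missing from a list contribute nothing.
 */
function fuseRanks(lists: readonly RankedList[], k = 60): number[] {
  const scores = new Map<number, number>();
  for (const list of lists) {
    list.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Sort by descending fused score; break ties by id so the order is deterministic.
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1] || a[0] - b[0])
    .map(([id]) => id);
}

const ftsHits = [7, 3, 9]; // lexical results, best first
const vectorHits = [3, 5, 7]; // vector results, best first
const fused = fuseRanks([ftsHits, vectorHits]);
console.log(fused); // ids present in both lists rise to the top
```

Because only ranks are used, the lexical and vector scores never need to be put on a common scale.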
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e86e901326
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| queryEmbedding && queryEmbedding.length > 0 | ||
| ? searchVector(client, projectLabel, queryEmbedding, seedLimit).catch(() => [] as RankedSummary[]) | ||
| : Promise.resolve([] as RankedSummary[]); |
Wire embeddings before vector retrieval
The user-facing callers added in this commit (GetTaskContextTool.execute and the first-turn injection) call getTaskContext without opts.queryEmbedding, and the new embedding helpers/backfill functions are not invoked anywhere, so this branch always resolves the vector side to an empty list. In codemap-enabled sessions this makes the advertised semantic/vector retrieval path unreachable; summaries are only found when the lexical FTS query happens to match the task terms exactly.
| const created = (await createResp.json()) as { Hostname: string }; | ||
| const syncUrl = `libsql://${created.Hostname}`; |
Read Turso hostname from the nested response
When auto-provisioning runs, Turso's Create Database API returns the hostname under database.Hostname (see https://docs.turso.tech/api-reference/databases/create), not as a top-level Hostname. This cast therefore makes created.Hostname undefined and persists libsql://undefined as codemap.turso.syncUrl, so users with TURSO_API_TOKEN/org configured get an invalid remote database configuration instead of the newly created DB.
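One way to avoid persisting `libsql://undefined` is to validate the response before building the URL. The sketch below assumes the nested `database.Hostname` shape described in this comment; the helper name `syncUrlFromCreateResponse` is hypothetical and not part of the PR.

```typescript
// Assumed response shape, per the review comment: { database: { Hostname: string } }.
interface CreateDatabaseResponse {
  database?: { Hostname?: unknown };
}

/** Build the sync URL, refusing to produce "libsql://undefined". */
function syncUrlFromCreateResponse(body: unknown): string {
  const hostname = (body as CreateDatabaseResponse | null)?.database?.Hostname;
  if (typeof hostname !== "string" || hostname.length === 0) {
    throw new Error("Turso create-database response did not include database.Hostname");
  }
  return `libsql://${hostname}`;
}
```

Throwing here lets the caller skip the `settings.set()` persist-back instead of saving a broken URL.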
| void (async () => { | ||
| try { | ||
| await resolveCodemap(session, settings); |
Await codemap setup before first-turn injection
Starting resolveCodemap fire-and-forget means the first user prompt can reach #injectCodemapTaskContext before setCodemapSessionState has run; that path sees no state and returns no injected summaries. This is especially likely when Turso provisioning or initial sync is involved, so codemap.autoInject does not reliably inject task-relevant summaries on the first turn as the feature promises.
roboomp left a comment
P3: this is a large new codemap/Turso feature, but the core vector/lazy-embedding and first-turn injection contracts are not wired end-to-end.
Blocking findings: retrieval is FTS-only, multilingual embeddings conflict with the 768d schema, codemap startup races first-turn injection, and the tool DB client leaks outside session state. One convention issue: new dynamic import.
I also could not check open duplicate PRs because gh is unavailable in this environment; git log origin/main --grep only showed unrelated shared task-context UI fixes. Thanks for the detailed design write-up.
| async execute(_id: string, params: GetTaskContextParams): Promise<AgentToolResult> { | ||
| const { client, config } = await getClient(this.session); | ||
| const projectLabel = resolveProjectLabel(this.session.cwd); | ||
| const opts: { maxFiles?: number; tokenBudget?: number } = {}; | ||
| if (params.max_files !== undefined) opts.maxFiles = params.max_files; | ||
| if (params.token_budget !== undefined) opts.tokenBudget = params.token_budget; | ||
| const result = await getTaskContext(client, config, params.task, projectLabel, this.session.cwd, opts); |
blocking: get_task_context never supplies queryEmbedding, and the same is true for the first-turn path in agent-session.ts:4972. getTaskContext() only calls searchVector() when opts.queryEmbedding is present, while embedText, embedBatch, getUnembeddedSummaries, and updateEmbedding are unused. Result: the advertised hybrid/vector retrieval and lazy embedding backfill never run; enabled codemap is FTS-only.
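A possible shape for the missing wiring is sketched below: embed the task first, and fall back to FTS-only when no embedding is available. `EmbedFn`, `RetrievalOptions`, and `withQueryEmbedding` are hypothetical stand-ins; the real `embedText` and `getTaskContext` signatures may differ.

```typescript
// Hypothetical signatures standing in for the PR's embedText and getTaskContext options.
type EmbedFn = (text: string) => Promise<number[] | null>;

interface RetrievalOptions {
  maxFiles?: number;
  tokenBudget?: number;
  queryEmbedding?: number[];
}

/** Embed the task if possible; on failure or null, leave queryEmbedding unset (FTS-only). */
async function withQueryEmbedding(
  task: string,
  embed: EmbedFn,
  base: RetrievalOptions = {},
): Promise<RetrievalOptions> {
  let embedding: number[] | null = null;
  try {
    embedding = await embed(task);
  } catch {
    embedding = null; // embedding model unavailable: degrade to lexical search
  }
  return embedding && embedding.length > 0 ? { ...base, queryEmbedding: embedding } : { ...base };
}
```

Both call sites (the tool and the first-turn path) could share a helper like this so they cannot drift apart again.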
| CREATE TABLE IF NOT EXISTS summaries ( | ||
| id INTEGER PRIMARY KEY AUTOINCREMENT, | ||
| project_label TEXT NOT NULL, | ||
| file_path TEXT NOT NULL, | ||
| summary_text TEXT NOT NULL, | ||
| content_hash TEXT NOT NULL DEFAULT '', | ||
| embedding F32_BLOB(768), |
blocking: this schema hard-codes embedding F32_BLOB(768), but codemap.embedding.variant = "multilingual" sets dimensions = 1024 and selects intfloat/multilingual-e5-large in config.ts. Any future embedding write for that supported setting will try to store a 1024d vector in a 768d column/index, so the documented multilingual mode cannot work with this table.
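A dimension-parameterized DDL builder would remove the mismatch. The sketch below is abbreviated to the columns quoted in this review and uses the index name that appears in the vector query elsewhere in this thread; the input validation is an assumption, and the function shown is an illustration rather than the PR's own code.

```typescript
/**
 * Sketch of a dimension-parameterized DDL builder. Only the columns quoted
 * in the review are included; the validation rule is an assumption.
 */
function buildSchemaSql(dimensions: number): string {
  if (!Number.isInteger(dimensions) || dimensions <= 0) {
    throw new RangeError(`embedding dimensions must be a positive integer, got ${dimensions}`);
  }
  return [
    "CREATE TABLE IF NOT EXISTS summaries (",
    "  id INTEGER PRIMARY KEY AUTOINCREMENT,",
    "  project_label TEXT NOT NULL,",
    "  file_path TEXT NOT NULL,",
    "  summary_text TEXT NOT NULL,",
    "  content_hash TEXT NOT NULL DEFAULT '',",
    `  embedding F32_BLOB(${dimensions})`,
    ");",
    "CREATE INDEX IF NOT EXISTS idx_summaries_embedding",
    "  ON summaries (libsql_vector_idx(embedding));",
  ].join("\n");
}
```

Note that `IF NOT EXISTS` leaves an existing 768d table untouched, so switching variants on an already-initialized database would still need a migration or a fresh file.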
| // Initialize codemap (code summaries) if enabled. Distinct from the memory | ||
| // backend — runs independently of memory.backend. Opens the Turso/libSQL DB, | ||
| // runs auto-provisioning if configured, and stores session state. Non-blocking | ||
| // so the session starts without waiting for DB init; the first-turn injection | ||
| // in #buildSystemPromptForAgentStart handles a not-yet-ready state gracefully. | ||
| void (async () => { |
blocking: first-turn auto-injection races this fire-and-forget initialization. #injectCodemapTaskContext() returns null when getCodemapSessionState(this) is still unset, and the first model call can build the prompt immediately after session creation. In that common path the advertised first-turn injection is skipped instead of waiting for codemap readiness.
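One option that keeps session creation non-blocking is to store the initialization promise and have the first-turn path wait on it with a bounded timeout. The `ReadyGate` class below is a hypothetical sketch of that idea, not code from the PR.

```typescript
// Hypothetical per-session readiness gate; names are illustrative only.
class ReadyGate<T> {
  private readonly ready: Promise<T | null>;

  constructor(init: () => Promise<T>) {
    // Start init immediately and never reject, so awaiting callers cannot crash the turn.
    this.ready = init().catch(() => null);
  }

  /** Wait for init, but give up after timeoutMs so the first turn is not blocked forever. */
  waitFor(timeoutMs: number): Promise<T | null> {
    return new Promise<T | null>(resolve => {
      const timer = setTimeout(() => resolve(null), timeoutMs);
      this.ready.then(value => {
        clearTimeout(timer);
        resolve(value);
      });
    });
  }
}
```

The injection path would then call something like `await gate.waitFor(2000)` and skip injection only when the result is `null`; the timeout value is a tuning choice, not something the PR specifies.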
| // --- Shared per-session DB client cache ------------------------------------- | ||
| let cachedClient: Client | null = null; |
blocking: this module-level client cache is not session-scoped and is not closed by shutdownCodemap(), which only closes the client stored on the AgentSession symbol. A tool call opens a second libSQL client here; ending the session leaves it alive, and simultaneous sessions with the same dbPath share mutable DB client state outside the per-session lifecycle the new state.ts is meant to enforce.
| export async function openCodemapDb(config: CodemapConfig): Promise<Client> { | ||
| // Dynamic import: @libsql/client loads a native NAPI binding (libsql) that | ||
| // must NOT load at CLI startup when codemap is disabled. Matches the | ||
| // loadFastembedOnce pattern in mnemopi/src/core/fastembed-runtime.ts:59-77 | ||
| // — optional native peers are lazy-loaded via `await import()`. | ||
| const { createClient } = await import("@libsql/client"); |
should-fix: repo conventions explicitly ban inline/dynamic imports (await import()); new imports must be top-level. If @libsql/client must stay cold until codemap is enabled, please add a small top-level loader module/approved lazy boundary instead of embedding the dynamic import here.
Fixes found through integration testing:
- searchVector: vector_top_k() returns only 'id', not 'distance'. Compute
distance separately via vector_distance_cos(s.embedding, vector32(?))
instead of referencing non-existent v.distance column
- buildFtsQuery: change FTS5 query from implicit AND to explicit OR
('term1'* OR 'term2'*). Task queries describe intent, not exact content —
AND matching returned empty for multi-word queries where no single summary
contained all terms
- Stopwords: add common query words (how, does, what, when, where, why, who,
can, use, using, work, works) that add noise to FTS queries
- Add comprehensive integration tests: schema init, CRUD, FTS5 search, vector
search, embedding backfill, full getTaskContext pipeline with staleness
50 tests pass, 0 fail. bun check passes across all 16 packages.
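The OR-joined prefix query described in the buildFtsQuery fix above could look roughly like this. It is a sketch under assumptions: the stopword set is an illustrative subset, and terms are wrapped in double quotes (FTS5 string syntax) rather than the single quotes shown in the commit message.

```typescript
// Illustrative stopword subset; the PR's actual list is longer.
const STOPWORDS = new Set([
  "the", "and", "how", "does", "what", "when", "where", "why", "who",
  "can", "use", "using", "work", "works",
]);

/** Build an FTS5 MATCH expression of OR-joined prefix terms, e.g. "auth"* OR "token"*. */
function buildFtsQuery(task: string): string {
  const terms = (task.toLowerCase().match(/[a-z0-9]+/g) ?? []).filter(
    t => t.length >= 3 && !STOPWORDS.has(t),
  );
  const unique = Array.from(new Set(terms));
  // Quote each term so it is parsed as an FTS5 string, then mark it as a prefix with *.
  return unique.map(t => `"${t}"*`).join(" OR ");
}

console.log(buildFtsQuery("How does token refresh work?"));
```

An empty return value means no usable keywords survived filtering, in which case the caller should skip the MATCH query instead of sending an empty expression.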
… real functions
Bugs found by adversarial review (3-skeptic workflowz panel):
- Vector dimension mismatch: schema hardcoded F32_BLOB(768) but multilingual variant produces 1024d vectors. Made schema parameterized via buildSchemaSql(dimensions) and pass config.embedding.dimensions to initSchema
- Race condition in tools.ts getClient: concurrent calls could double-open the DB client. Added in-flight promise guard so concurrent callers await the same open promise. Also expanded cache key to include syncUrl + authToken
- FTS docstring mismatch: said AND, code does OR. Fixed docstring to match
- retrieve.test.ts tested a copy of functions, not the real code. Exported extractKeywords, splitTokens, reciprocalRankFusion, tokenCost, packBudget from retrieve.ts and updated test to import real implementations
59 tests pass, 0 fail. bun check passes across all 16 packages.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 04997c4e2a
| if (!injected) return baseWithCodemap; | ||
| const previousBaseSystemPrompt = this.#baseSystemPrompt; | ||
| const previousBaseSystemPrompt = baseWithCodemap; |
Preserve codemap injection when memory also injects
When codemap auto-inject returns a block and the selected memory backend (hindsight or mnemopi) also returns beforeAgentStartPrompt content, previousBaseSystemPrompt now includes the extra codemap block but refreshBaseSystemPrompt() rebuilds only the raw base prompt. The length comparison below therefore always treats the prompt as changed and returns this.#baseSystemPrompt, dropping both the codemap summaries and the memory recall for that first turn.
| - `omp://`: harness docs; AVOID unless the user asks about the harness itself. | ||
| {{#if hasCodemap}} | ||
| ## Code Summaries (codemap) | ||
| File-level code summaries are available for this repo. Before reading unfamiliar files, call `get_task_context` with your task to retrieve relevant summaries (packed within a token budget). After reading a non-trivial file or making load-bearing changes, call `set_file_summary` to record a short note (purpose, key symbols, gotchas, invariants). Summaries are anchored to file content via `Bun.hash` — if a file changes, its summary is flagged `stale` and should be refreshed. |
Gate codemap guidance on active tools
When tools.discoveryMode is all, the codemap tools are marked discoverable and the initial-tool filter removes non-essential discoverable built-ins unless they were explicitly requested. This block is gated only on codemap.enabled, so it can tell the model to call get_task_context and set_file_summary even though those tool schemas are absent from the active tool list; gate the guidance on the active tool names or force these tools active whenever this guidance is rendered.
Token usage tests (13 new, 72 total passing):
- Verify codemap token formula (ceil(chars/4)+20) across edge cases
- Budget packer respects token budget, always includes >=1 file
- Truncation when results exceed budget or maxFiles
- Token efficiency: 20+ typical summaries fit within 8000 budget
- Empty result has zero token cost
- Single file result has exact token cost matching formula
Benchmark results (local libSQL file mode, 768d vectors):
- Schema init: 0.9ms
- FTS5 search: 0.2-0.7ms (flat at 1000 summaries)
- Vector search (vector_top_k): 6.6-14.4ms (scales linearly)
- Full getTaskContext pipeline: 0.7-17.4ms (under 20ms at 500 summaries)
- All well below codemap's P95 < 200ms design target
Bug fixes from adversarial review:
- searchVector: vector_top_k returns only 'id', compute distance via vector_distance_cos() instead of non-existent v.distance column
- Vector dimension mismatch: schema now parameterized via buildSchemaSql(dimensions)
- Race condition in tools.ts: in-flight promise guard prevents double-open
- FTS5 query changed from AND to OR for task-intent matching
- Stopwords expanded with common query words
- retrieve.ts pure functions exported for direct testing
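The token formula and the packer behaviour listed in these tests can be illustrated with a short sketch. The names `tokenCost` and `packBudget` match the exports mentioned in this thread, but the bodies below are assumptions (a simple greedy packer), not the PR's implementation.

```typescript
interface Summary {
  filePath: string;
  text: string;
}

/** Token estimate from the PR description: ceil(chars / 4) + 20 per summary. */
function tokenCost(text: string): number {
  return Math.ceil(text.length / 4) + 20;
}

/**
 * Greedy packer sketch: keep ranked order, always keep the first file,
 * and stop at the first summary that would exceed the budget or maxFiles.
 */
function packBudget(ranked: readonly Summary[], tokenBudget: number, maxFiles: number) {
  const packed: Summary[] = [];
  let used = 0;
  for (const summary of ranked) {
    if (packed.length >= maxFiles) break;
    const cost = tokenCost(summary.text);
    if (packed.length > 0 && used + cost > tokenBudget) break;
    packed.push(summary);
    used += cost;
  }
  return { packed, used, truncated: packed.length < ranked.length };
}
```

For example, a 9-character summary costs ceil(9/4) + 20 = 23 tokens, and an empty result set costs 0 because the fixed 20-token overhead is charged per packed summary.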
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: fc84601ea2
| if (clientPromise) { | ||
| return clientPromise; |
Key the in-flight codemap client by configuration
When two sessions with different codemap.dbPath/Turso settings call a codemap tool while the first openCodemapDb is still in flight, this returns the first session's promise without comparing cacheKey. The second session then writes or reads its project label through the wrong DB client for that tool call, which can leak or corrupt summaries across concurrently running projects; the in-flight promise needs to be keyed the same way as cachedClient.
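Keying the in-flight promise the same way as the cache could be done with a map from configuration key to open promise. The sketch below is hypothetical (`ClientKeyFields`, `cacheKey`, and `getOrOpen` are not names from the PR) and uses a generic `open` callback in place of `openCodemapDb`.

```typescript
// Hypothetical config subset; the key fields mirror those named in the thread.
interface ClientKeyFields {
  dbPath: string;
  syncUrl?: string;
  authToken?: string;
}

const inflight = new Map<string, Promise<unknown>>();

function cacheKey(config: ClientKeyFields): string {
  return JSON.stringify([config.dbPath, config.syncUrl ?? "", config.authToken ?? ""]);
}

/** Return the pending or completed open for this exact configuration, or start one. */
function getOrOpen<T>(config: ClientKeyFields, open: () => Promise<T>): Promise<T> {
  const key = cacheKey(config);
  const existing = inflight.get(key) as Promise<T> | undefined;
  if (existing) return existing;
  const pending = open().catch(err => {
    inflight.delete(key); // allow a retry after a failed open
    throw err;
  });
  inflight.set(key, pending);
  return pending;
}
```

Storing the promise rather than the resolved client means the same map covers both the in-flight and the cached case, so the two can never disagree about which configuration they belong to.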
| function resolveProjectLabel(cwd: string): string { | ||
| return path.basename(cwd); |
Derive codemap scope from the repo root
In sessions started from a subdirectory of the same repository, path.basename(cwd) changes the project_label (for example repo root oh-my-pi vs packages/coding-agent), so summaries written at one cwd are invisible when the agent later runs in another cwd inside the same repo. The comment says this mirrors Hindsight, but Hindsight resolves the primary git root before taking the basename; codemap should use the same repo-root scope and store paths relative to that scope if summaries are meant to follow the repo across sessions.
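A repo-root-based scope might be derived as sketched below. This is a simplification that walks up to the nearest directory containing `.git`; it does not reproduce Hindsight's primary-git-root resolution, and `findRepoRoot` and `resolveProjectScope` are hypothetical names.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

/** Walk up from cwd to the nearest directory containing .git; fall back to cwd. */
function findRepoRoot(cwd: string, exists: (p: string) => boolean = fs.existsSync): string {
  let dir = path.resolve(cwd);
  for (;;) {
    if (exists(path.join(dir, ".git"))) return dir;
    const parent = path.dirname(dir);
    if (parent === dir) return path.resolve(cwd); // reached the filesystem root
    dir = parent;
  }
}

/** Project label and root derived from the repository, not from the current subdirectory. */
function resolveProjectScope(cwd: string, exists?: (p: string) => boolean) {
  const root = findRepoRoot(cwd, exists);
  return { projectLabel: path.basename(root), root };
}
```

Stored file paths would then be made relative to `root` so a summary written from one subdirectory resolves to the same row from another.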
| FROM vector_top_k('idx_summaries_embedding', vector32(?), ?) v | ||
| JOIN summaries s ON s.rowid = v.id | ||
| WHERE s.project_label = ? |
Avoid limiting vector candidates before project filtering
With a shared codemap DB containing embeddings for multiple projects, vector_top_k(..., limit) returns the nearest limit rows globally before this project_label filter runs. If another project has limit closer vectors, every candidate is filtered out and the current project gets no vector seeds even though it has relevant embedded summaries just below the global cutoff; overfetch or use a project-scoped index/filter so the limit applies within the requested project.
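The overfetch option can be sketched independently of SQL: request more global neighbours than needed, filter to the project, and widen the request until enough rows survive or the index is exhausted. `GlobalSearch` below is a hypothetical stand-in for a `vector_top_k` query, and the 4x starting factor and `maxK` cap are arbitrary assumptions.

```typescript
interface VectorHit {
  id: number;
  projectLabel: string;
  distance: number;
}

// Hypothetical stand-in for a global vector_top_k query returning the k nearest rows.
type GlobalSearch = (k: number) => Promise<VectorHit[]>;

/** Overfetch globally, filter to one project, and widen k until limit hits are found or rows run out. */
async function searchWithinProject(
  search: GlobalSearch,
  projectLabel: string,
  limit: number,
  maxK = 1024,
): Promise<VectorHit[]> {
  let k = limit * 4;
  for (;;) {
    const rows = await search(k);
    const mine = rows.filter(row => row.projectLabel === projectLabel);
    const exhausted = rows.length < k; // the index returned everything it has
    if (mine.length >= limit || exhausted || k >= maxK) return mine.slice(0, limit);
    k = Math.min(k * 2, maxK);
  }
}
```

A project-scoped index or partitioned table would avoid the repeated queries, at the cost of a schema change.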
Add unit tests for untested codemap modules (config, prompt, state, adapter, tools createIf gating, toStoredPath traversal guard) and an integration test for the first-turn injection seam.
- config.test.ts (17 tests): defaults, override precedence, dbPath fallback, variant→dimensions/model mapping, floor/clamp guards
- prompt.test.ts (9 tests): empty result, stale/missing tags, truncation meta, multi-file ordering
- state.test.ts (9 tests): get/set roundtrip, markFirstTurnInjected, undefined-session guard
- adapter.test.ts (16 tests): extension routing, no-adapter null, LSP SymbolKind mapping, sync stubs, error handling
- tools.test.ts (8 tests): createIf gating (all 4 tools), path traversal rejection, in-bounds path acceptance with real DB
- injection.test.ts (9 tests): guard chain, once-per-session, memory.backend='off' composition, error isolation, block content
Fix path traversal vulnerability in toStoredPath: paths escaping cwd (../../etc/passwd, absolute paths) are now rejected. toStoredPath runs before getClient in all tool execute() methods (fail-fast on invalid input).
Fix codemap.turso.autoProvision default: true → false (opt-in) to match design spec. Auto-provisioning fires network calls to Turso's API, creates cloud databases, and persists credentials, so it must be opt-in.
Extract injectCodemapTaskContext from AgentSession private method into a testable standalone function in the task-context module.
Remove unused fmtOps from benchmark.ts.
Total: 140 tests pass (68 new), 0 fail. bun check passes across all 16 packages.
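The traversal guard described in this commit can be sketched with `node:path`. The body below is an assumption about how such a check might look, not the PR's `toStoredPath`; in particular, this version accepts absolute paths that still resolve inside cwd, and normalising to forward slashes is a choice made here for illustration.

```typescript
import * as path from "node:path";

/** Resolve filePath against cwd and return a cwd-relative path, or throw if it escapes cwd. */
function toStoredPath(cwd: string, filePath: string): string {
  const root = path.resolve(cwd);
  const absolute = path.resolve(root, filePath);
  const relative = path.relative(root, absolute);
  const escapes =
    relative === "" || // the project directory itself is not a file
    relative === ".." ||
    relative.startsWith(`..${path.sep}`) ||
    path.isAbsolute(relative); // e.g. a different drive on Windows
  if (escapes) throw new Error(`path is outside the project directory: ${filePath}`);
  return relative.split(path.sep).join("/");
}
```

Comparing the relative path rather than string prefixes avoids the classic mistake of treating `/repo-evil/x` as being inside `/repo`.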
GetTaskContextTool.execute and injectCodemapTaskContext never passed queryEmbedding to getTaskContext, so the vector search branch was always skipped — making codemap FTS-only despite advertising hybrid retrieval. Both call sites now embed the task query via embedText (lazy — returns null when the embedding model is unavailable, falling back to FTS-only gracefully). Addresses blocking review finding from @roboomp and @codex on PR can1357#3185.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 77b17c1bf4
| const client = createClient({ | ||
| url: `file:${config.dbPath}`, | ||
| ...(hasTursoSync ? { syncUrl: config.turso.syncUrl, authToken: config.turso.authToken } : {}), |
Create the default codemap directory before opening
When codemap.dbPath is left empty, loadCodemapConfig points at <memories>/codemap/codemap.db, but neither that loader nor openCodemapDb creates the new codemap parent directory before this client is opened. On a fresh install with codemap enabled, schema initialization will try to create a database under a missing parent and the feature is disabled before any summary can be stored; create path.dirname(config.dbPath) before constructing the libSQL client.
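Creating the parent directory first is a one-line fix. A minimal sketch, with `prepareDbLocation` as a hypothetical helper name:

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

/** Ensure the parent directory of a file-backed database exists; returns the file: URL to open. */
function prepareDbLocation(dbPath: string): string {
  // recursive: true creates missing ancestors and is a no-op if the directory already exists.
  fs.mkdirSync(path.dirname(dbPath), { recursive: true });
  return `file:${dbPath}`;
}
```

The returned string can be passed as the `url` option shown in the snippet above.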
| export function extractKeywords(task: string): string[] { | ||
| // Tokenize on non-alphanumeric, lowercase, keep >= 3 chars, drop stopwords | ||
| const tokens = task.toLowerCase().match(/[a-z0-9]+/g) ?? []; |
Split symbol names before lowercasing
For code-symbol tasks such as buildSystemPrompt, this lowercasing happens before the only camel-case splitter runs, so splitTokens(extractKeywords(task)) later sees just buildsystemprompt and emits that fused token. The resulting FTS query misses summaries or paths tokenized as build, system, and prompt, which makes get_task_context fail on common symbol-name prompts unless vector search happens to rescue it.
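Splitting on case boundaries before lowercasing could look like the sketch below. It is an illustration only: stopword filtering is omitted, and the fused token is dropped rather than kept alongside its parts, which the real `extractKeywords` may want to handle differently.

```typescript
/** Split on camelCase / PascalCase boundaries and non-alphanumerics BEFORE lowercasing. */
function extractKeywords(task: string): string[] {
  const spaced = task
    .replace(/([a-z0-9])([A-Z])/g, "$1 $2") // buildSystem -> build System
    .replace(/([A-Z]+)([A-Z][a-z])/g, "$1 $2"); // HTTPServer -> HTTP Server
  const tokens = spaced.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  // Keep tokens of 3+ characters and drop duplicates, preserving first-seen order.
  return Array.from(new Set(tokens.filter(token => token.length >= 3)));
}

console.log(extractKeywords("buildSystemPrompt"));
```

With this ordering, a task that names `buildSystemPrompt` produces terms that can match summaries tokenized as separate words.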
Summary
Adds a native (non-MCP) code summaries feature inspired by devalade/codemap: agent-written file-level summaries persisted in Turso/libSQL with native vector search, retrieved as minimal task-relevant context via hybrid FTS5 + vector retrieval with reciprocal rank fusion and budget packing.
Key features
- set_file_summary, get_file_summary, get_task_context, delete_file_summary tools
- F32_BLOB vector columns + libsql_vector_idx DiskANN index + vector_top_k() ANN queries
- Budget packer (ceil(chars/4) + 20 token formula)
- Automatic Turso DB provisioning when codemap.turso.autoProvision is enabled and TURSO_API_TOKEN is available; persists credentials via settings.set()
- Staleness detection via Bun.hash content hash comparison: stale summaries are flagged on file changes
- Lazy embedding on retrieval via a decoupled MnemopiEmbedClient instance, which keeps set_file_summary fast
- Path traversal guard: toStoredPath rejects file paths that resolve outside the project cwd
Design decisions (verified through adversarial workflowz design panel)
- Distinct feature module (codemap.* settings), NOT in memory.backend enum. Composes with any memory backend including "off" (the default)
- Independent first-turn injection seam in #buildSystemPromptForAgentStart, gated only on codemap.enabled; fixes the dead-seam issue where offBackend has no beforeAgentStartPrompt hook
- Singular PK + ROWID, required by libsql_vector_idx (composite PK without ROWID is not supported)
- FTS5 virtual table + triggers (not the USING fts index): stable, proven in codebase (history-storage.ts, mnemopi/schema.ts), works in @libsql/client without experimental flags
- Pluggable LanguageAdapter interface: TsAdapter ships in v1 (LSP-based); Go/Python/Rust adapters are future work
- Parameterized schema: buildSchemaSql(dimensions) adapts to the embedding model (768d for en, 1024d for multilingual)
New module:
packages/coding-agent/src/task-context/ (13 files)
- schema.ts
- config.ts
- db.ts
- staleness.ts
- store.ts
- adapter.ts
- retrieve.ts
- embed.ts
- turso.ts
- tools.ts
- prompt.ts
- state.ts
- index.ts
Integration edits (10 existing files)
settings-schema.ts, settings-defs.ts, builtin-names.ts, tools/index.ts, system-prompt.ts, system-prompt.md, agent-session.ts, sdk.ts, hindsight/content.ts, package.json
Settings
Testing
Test results: 140 tests pass, 0 fail across 10 files
- staleness.test.ts
- retrieve.test.ts
- integration.test.ts
- token-usage.test.ts
- config.test.ts
- prompt.test.ts
- state.test.ts
- adapter.test.ts
- tools.test.ts
- injection.test.ts
Token usage verification
The token formula ceil(summary_text.length / 4) + 20 is verified across edge cases:
- Truncation when results exceed tokenBudget or maxFiles
Bug fixes found through adversarial review (3-skeptic workflowz panel)
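- searchVector referenced v.distance but vector_top_k only returns id. Fix: compute distance via vector_distance_cos() in SELECT
- Schema hardcoded F32_BLOB(768) but multilingual variant produces 1024d vectors. Fix: schema parameterized via buildSchemaSql(dimensions)
- Race condition in tools.ts: concurrent calls could double-open DB client. Fix: in-flight promise guard
- queryEmbedding was never passed to getTaskContext. Fix: wired embedText into GetTaskContextTool.execute and injectCodemapTaskContext
- toStoredPath accepted ../../etc/passwd without boundary check. Fix: boundary check added; toStoredPath runs before getClient in all tool execute() methods
- codemap.turso.autoProvision defaulted to true (opt-out) instead of false (opt-in) per design spec. Fix: default changed to false, since auto-provisioning fires network calls + persists credentials
- FTS5 query used implicit AND, which returned empty results for multi-word task queries. Fix: explicit OR joining
- retrieve.test.ts tested a copy of functions, not real code. Fix: pure functions exported from retrieve.ts and imported by the test
- Unused fmtOps function in benchmark.ts. Fix: removed
Benchmarks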
Local libSQL file mode, 768d vectors (bge-base-en-v1.5), AMD Ryzen 5 7600X:
Read latency (the hot path — retrieval)
All read operations are well under the codemap design target of P95 < 200ms.
Write performance
Write throughput is acceptable for interactive agent use (one summary per file read).
Key findings
Verification
- bun check passes across all 16 packages (0 type errors, 0 lint errors)
- bun test: 140 tests pass, 0 fail across 10 files
Dependency
@libsql/client@^0.17.4 (lazy-loaded via await import() only when codemap is enabled, matching the fastembed-runtime.ts optional-peer pattern)
Design document
Full design with adversarial review history:
TASK_CONTEXT_DESIGN.md