spec: local eval runner & vendor-neutral adapter layer by sunilgattupalle · Pull Request #5 · harness/harness-evals

sunilgattupalle · 2026-06-16T19:15:43Z

Summary

Adds specs/2026-06-16-local-eval-runner-design.md — full design spec for the local eval runner, source adapters, targets, importers, config runner, and CLI
Establishes the layered vendor-neutral architecture: core stays frozen, vendor adapters (Langfuse, Harness, etc.) live behind optional extras
Primary use case: edit prompt → harness-evals run my-eval.yaml → diff scores against baseline
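
The last step of that workflow, diffing scores against a baseline, might look roughly like the sketch below. The names `gate_against_baseline` and `BaselineRegressionError` come from the design decisions in this PR. Everything else is an assumption made for illustration: the dict-of-floats score format, the `tolerance` parameter, and the `diff_scores` and `main` helpers. It is not the spec's actual signature.

```python
class BaselineRegressionError(Exception):
    """Raised when at least one metric scores below the baseline."""


def diff_scores(current: dict[str, float], baseline: dict[str, float]) -> dict[str, float]:
    """Return current - baseline for every metric present in both runs (hypothetical helper)."""
    return {name: current[name] - baseline[name] for name in baseline if name in current}


def gate_against_baseline(
    current: dict[str, float],
    baseline: dict[str, float],
    tolerance: float = 0.0,  # assumed knob: how far a score may drop before it counts
) -> dict[str, float]:
    """Return the per-metric deltas, or raise if any metric dropped by more than `tolerance`."""
    deltas = diff_scores(current, baseline)
    regressions = sorted(name for name, delta in deltas.items() if delta < -tolerance)
    if regressions:
        raise BaselineRegressionError(f"regressed metrics: {regressions}")
    return deltas


def main(current: dict[str, float], baseline: dict[str, float]) -> int:
    """Map the gate result to a process exit code, as a CLI entry point might."""
    try:
        gate_against_baseline(current, baseline)
    except BaselineRegressionError as exc:
        print(f"FAIL: {exc}")
        return 1  # non-zero exit on regression
    return 0
```

Raising a dedicated exception keeps the gate usable from Python code and tests, while the CLI wrapper only has to translate it into an exit code.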

Key design decisions

Four adapter family ABCs (BaseDatasetSource, BasePromptSource, BaseEvalCaseSource, BaseEvalConfigSource) + BaseTarget — all vendor-neutral
Seven plugin registries including _TARGETS, _METRICS, _BASELINE_STORES so third parties can extend every layer without forking
ResourceRef + dual-syntax resolve() (URI shorthand or typed block)
MissingAdapterError at config-load time (not a cryptic ImportError at execution)
HttpTarget v1: bearer/api_key/basic auth; OAuth/mTLS deferred
datasets.py → datasets/ package migration must be atomic with back-compat re-exports
Golden.input serialisation contract: json.dumps for non-string inputs in PromptTarget
gate_against_baseline() raises BaselineRegressionError; CLI exits non-zero
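
To make the `ResourceRef` and `MissingAdapterError` decisions concrete, here is a minimal sketch of dual-syntax resolution that fails at config-load time. Only the names `ResourceRef`, `resolve`, and `MissingAdapterError` come from this PR. The field names, the `type`/`path` keys of the typed block, the `parse_ref` helper, and passing the registry as an argument are assumptions, not the spec's definitions.

```python
from dataclasses import dataclass, field
from typing import Any, Union


class MissingAdapterError(Exception):
    """Raised while loading config when no adapter is registered for a scheme."""


@dataclass
class ResourceRef:
    scheme: str                       # e.g. "file" or "langfuse"
    path: str                         # adapter-specific locator
    options: dict[str, Any] = field(default_factory=dict)


def parse_ref(spec: Union[str, dict[str, Any]]) -> ResourceRef:
    """Accept either URI shorthand ("file://x.jsonl") or a typed block (a dict)."""
    if isinstance(spec, str):
        scheme, sep, path = spec.partition("://")
        if not sep:
            raise ValueError(f"not a URI shorthand: {spec!r}")
        return ResourceRef(scheme, path)
    # Typed block: "type" and "path" are assumed key names; the rest become options.
    options = {k: v for k, v in spec.items() if k not in ("type", "path")}
    return ResourceRef(spec["type"], spec.get("path", ""), options)


def resolve(spec: Union[str, dict[str, Any]], registry: dict[str, type]) -> Any:
    """Look up the adapter class for a reference and instantiate it."""
    ref = parse_ref(spec)
    adapter_cls = registry.get(ref.scheme)
    if adapter_cls is None:
        # Fail here, during config load, rather than with an ImportError mid-run.
        raise MissingAdapterError(
            f"no adapter registered for {ref.scheme!r}; "
            "the matching optional extra may not be installed"
        )
    return adapter_cls(ref)
```

Both syntaxes normalise to the same `ResourceRef`, so adapters only ever deal with one shape.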

Reviewed by

Initial architecture review (Sonnet)
Independent full review (Opus) — five gaps identified and addressed in second commit

Test plan

Review specs/2026-06-16-local-eval-runner-design.md
Verify all adapter families, registries, and design decisions look correct before implementation begins

🤖 Generated with Claude Code

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
AI-Session-Id: 5f382a59-4928-47f9-bc96-a1159c13c50e
AI-Tool: claude-code
AI-Model: unknown

- Add _TARGETS, _METRICS, _BASELINE_STORES registries to plugins.py
- Add register_target, register_metric, register_baseline_store decorators
- Add target/metric/baseline columns to adapter registry table
- Define Golden.input serialisation contract in PromptTarget (json.dumps for non-str)
- Define gate_against_baseline() contract (BaselineRegressionError, CLI exit code)
- Specify datasets.py → datasets/ migration must be atomic with back-compat re-exports
- Clarify "zero-dependency" as zero external account; LLM metrics need [llm] extra

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
AI-Session-Id: 5f382a59-4928-47f9-bc96-a1159c13c50e
AI-Tool: claude-code
AI-Model: unknown

CLAassistant · 2026-06-16T19:15:51Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

sunilgattupalle added 2 commits June 16, 2026 12:04

docs(spec): add local eval runner & vendor-neutral adapter layer design

22616f0

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
AI-Session-Id: 5f382a59-4928-47f9-bc96-a1159c13c50e
AI-Tool: claude-code
AI-Model: unknown

sunilgattupalle wants to merge 2 commits into main from feat/local-eval-runner-spec
