feat(audit): integrate hash chain + Ed25519 signing (closes #212)#237
feat(audit): integrate hash chain + Ed25519 signing (closes #212)#237naveen-kurra wants to merge 4 commits into
Conversation
Opt-in tamper-evidence for the audit stream. When FORGE_AUDIT_SIGNING_KEY_B64 is set, every emitted event carries a sig (Ed25519 over canonical JSON with sig blanked) and a kid. The runtime advertises the corresponding public key at GET /.well-known/forge-audit-keys in RFC 8037 JWKS shape. Verification lives in two places: - coreruntime.VerifyAuditLog walks NDJSON and checks structural integrity + signatures when a JWKS is supplied. - `forge audit verify --pubkey <jwks>` is the offline CLI on top. When no signing key is configured the wire shape is byte-identical to the pre-#213 stream (Kid + Sig use omitempty). Docs at docs/security/audit-signing.md.
Review from initializ-mk on PR #220 flagged a self-deadlock in AuditLogger.Emit: the top-level a.mu.Lock() was held across the sink-write loop, so a failing sink triggered a.logSinkErrorOnce which tried to reacquire a.mu — sync.Mutex is not reentrant, so any sink write error with an ops logger configured would freeze audit emission. Emit now snapshots signer/sinks/opsLog under a short critical section, then does canonicalize/sign/marshal/write outside the lock. ed25519.Sign is stateless so this is safe. Also fixes the review's "silent event drop" note by logging signing/marshal failures via the ops logger before dropping the event — an operator running a signed pipeline needs to see when signing breaks. Regression tests: - 32 concurrent Emits with a always-failing sink + ops logger; test fails on 5s timeout if the deadlock returns. - signing failure produces an operator-visible Error line. Minor review fixes: - TestLoadEd25519KeyFromEnv_RSARejected now uses a real RSA PKCS#8 key instead of truncated garbage, so the type-assertion branch is actually exercised. - TestVerifyAuditLog_UnknownKidFails drops the dead otherPub line. - Documented that the "canonical JSON" is Go encoding/json output, not JCS; non-Go verifiers must replicate Go's marshaler quirks (JCS adoption tracked separately). - Dropped forward-references to #212 PrevHash in audit_verify.go / audit_signing.go comments (the two features are being reviewed in parallel; the reference was aspirational not concrete).
Merges the R5 (hash chain) and R6 (signing) tracks into one coherent tamper-evidence pipeline per Manoj's cross-PR review. Previously #220 and #222 rewrote the same files with incompatible signatures. ## Wire shape Every event now carries `prev_hash` (always, no omitempty). Signed deployments additionally carry `kid` + `sig`. The signature covers `prev_hash` — tampering with the chain link breaks BOTH the chain walk AND the signature verification. ## Emit sequencing Under a single mutex: 1. Stamp prev_hash from a.lastHash (or genesis on first event). 2. If signer wired: stamp kid, sign canonical bytes (Sig blanked). 3. Marshal. 4. Update a.lastHash = sha256(raw line bytes, no trailing newline). 5. Write to sinks; collect errors into a local slice. After releasing the mutex, log any sink errors via logSinkErrorOnce. Deadlock fix: chain order requires holding the lock across sink writes (concurrent Emit must not race a.lastHash and must not reorder on disk). logSinkErrorOnce reacquires a.mu — non-reentrant — so we collect errors under lock and log after Unlock(). ## Precision-hole fix Chain hashing runs over the raw line bytes (writer-authored), not over a re-marshaled event. Closes the reviewer's flagged latent bug where json.Marshal(json.Unmarshal(x)) is not a fixed point when Fields carries integers > 2^53 (they round-trip through float64). The verifier likewise hashes raw line bytes as read from the stream. ## Unified verifier One VerifyAuditLog(r, opts VerifyOptions) walks the stream, checking chain + (when opts.Pubkeys is non-empty) signatures. Soft warnings for head-of-stream truncation (first event not genesis) and for signed streams verified without --pubkey. --skip-chain flag on the `forge audit verify` CLI covers the SIEM tail-ingestion case. ## Test coverage - audit_chain_sig_integration_test.go — proves chain + sig compose; proves the signature covers prev_hash (tampering with just the chain link breaks the sig); proves unsigned streams still chain; head-truncation surfaces as soft warning. - audit_hash_chain_test.go — the r5 tests (genesis progression, clean walk, tampering, deletion, 200-goroutine concurrency, malformed-line handling, prev_hash-always-written pin) adapted to the unified API. - audit_verify_test.go — the r6 tests + a new Sig-catches-tamper-with-SkipChain and a chain-catches-tamper -without-sig to cross-check the two checks catch each other's failure modes. - audit_emit_deadlock_test.go — the earlier deadlock regression test continues to pass with the new "hold lock across writes, log after unlock" pattern. ## PR reconciliation This branch obsoletes #220 and #222 as independent PRs. Plan: land #220 first (its self-contained review comments are addressed on -v2), then this branch supersedes #222 as the integration PR.
Per initializ-mk's follow-up on #220. Adopting RFC 8785 (JCS) as the signature-preimage canonicalization closes two things at once: 1. Portability. Non-Go verifiers converge on the same preimage via any RFC 8785 impl — no need to replicate Go encoding/json's field order, key sort, or HTML-safe escaping quirks. 2. Latent precision bug. Verifiers re-marshal parsed events; Go's json.Marshal(json.Unmarshal(x)) is NOT a fixed point when Fields carries integers > 2^53 (they decode to float64 and re-marshal rounded). JCS normalizes both sides through the same ES6-double rule, so an untampered stream with a big int now verifies. See TestJCS_LargeIntegerFieldsRoundTrip. ## Wire shape Signed events now carry `sigp: "jcs-1"` alongside `kid` + `sig`. The scheme identifier is covered by the signature (canonicalize runs after Sigp is stamped, with only Sig blanked), so a tamperer can't rewrite it to force a weaker verification path — the verifier explicitly rejects unknown sigp values. See TestJCS_UnsupportedSigpRejected. Unsigned events have no sigp (all three: sigp/kid/sig are omitempty). ## Numbers-as-strings caveat JCS numbers are IEEE-754 double per ES6 §6.1.6. Field values that MUST preserve 64-bit-exact precision (nanosecond epochs, 64-bit IDs) MUST be carried as JSON strings in Fields — this is a producer-side discipline, documented in canonicalBytesForSigning and in docs/security/audit-signing.md. Not enforced at library level. ## Cross-language reference docs/security/audit-signing.md now shows a Python verifier snippet using the `jcs` package + `cryptography` — parse, drop sig, JCS, Ed25519.Verify. No Go-marshaler emulation. ## Dependency New forge-core dep: github.com/gowebpki/jcs (v1.0.1, MIT, pure Go, no transitive deps beyond stdlib). ~150 LOC reference impl matches Forge's "prefer narrow proven crypto libraries" posture — number formatting is the fiddly part and the reference impl is spec-tested.
|
Folded the JCS proposal (@initializ-mk's follow-up on #220) into this PR — commit e37851f. It composes cleanly with the raw-line-bytes chain hashing that's already in the branch: chain hashes over producer-authored bytes, signatures over the RFC 8785 canonical form of the parsed value. What landed
New tests
DependencyNew forge-core dep: Full sweep + gofmt + lint clean. Docs at |
|
Reviewed for correctness and the #220 integration — this is a clean reconciliation that resolves the entire #222 review (deadlock, precision, sig-covers- One fix to make before merge:
Non-blocking notes (fine to defer):
Once |
Closes #212. Supersedes #222 (closed).
Reconciles the R5 hash chain into a single tamper-evidence pipeline on top of #220 (now merged). Every event carries
prev_hash(always) +kid+sig(both omitempty). The signature coversprev_hash, so tampering with the chain link breaks BOTH the chain walk AND the signature check.What changed vs. the original #222 (addresses @initializ-mk's review)
a.muacross sink writes (concurrent Emits must not racea.lastHash; disk order must equal chain order). So I can't release the lock before writes like feat(audit): Ed25519 per-event signing + JWKS endpoint #220 did. Instead: collect sink errors under lock, calllogSinkErrorOnceafter an explicitUnlock().sync.Mutexnon-reentrancy sidestepped. Regression testTestEmit_NoDeadlockOnSinkErrorWithOpsLoghangs pre-fix, passes now.\n), not a re-marshaled event. Closes the>2^53landmine wherejson.Marshal(json.Unmarshal(x))isn't a fixed point.prev_hash. Chain-mint runs before canonicalize+sign, so the signature's coverage extends to the chain link. Proved inTestIntegration_SigCoversPrevHash— tampering with justprev_hashbreaks the sig even with--skip-chain.VerifyAuditLog(r, opts VerifyOptions)walks chain + (when pubkeys supplied) signatures.--skip-chainon the CLI for SIEM tail-ingestion. Head-truncation is a soft warning per your intentional-limitation note.forge audit verify <file> [--pubkey <jwks>] [--skip-chain].Emit sequencing
Under one mutex:
prev_hash = a.lastHash(or genesis on first event).kid, sign canonical bytes (withsigblanked).a.lastHash = sha256(raw line bytes).After
Unlock(): log any sink errors vialogSinkErrorOnce.Test coverage
audit_chain_sig_integration_test.go— chain + sig compose; sig covers prev_hash; unsigned streams still chain; head-truncation soft-warns.audit_hash_chain_test.go(from feat(audit): hash-chained audit stream for tamper detection #222) — genesis progression, clean walk, tampering, deletion, 200-goroutine concurrency, malformed line, prev_hash-always-written pin.audit_verify_test.go— feat(audit): Ed25519 per-event signing + JWKS endpoint #220's tests + cross-checks that chain catches tampering when sig-check is skipped and vice versa.audit_emit_deadlock_test.go— the earlier deadlock regression continues to pass with the chain-lock-plus-post-unlock-logging pattern.Test plan
go test ./forge-core/... ./forge-cli/...— full sweep greengofmt -w+golangci-lint runclean