feat(lib): capture client-attested build provenance #454
Open
max-parke-scale wants to merge 4 commits into
Add `agentex.lib.utils.build_provenance`, the single producer of source identity for agent builds (git coordinates plus a deterministic content hash of the build context).

`prepare_cloud_build_context` now writes `build-info.json` into the staged context (populating runtime `registration_metadata` with no server change) and exposes provenance on `CloudBuildContext` so the upload can send `source_*` fields. Archive member order is now deterministic via a sorted enumeration shared with the hash. The hash is computed only when there is no clean commit to identify the build (dirty tree or non-git context).

First of three surfaces for AGX1-418 (Phase 1, client-attested); the SGP build-record columns and the sgpctl/Gitea uploaders follow.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
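To illustrate the "sorted enumeration shared with the hash" idea from this commit, here is a minimal sketch of an archive whose member order does not depend on filesystem listing order. The function names `iter_context_files`, `build_archive`, and `member_names` are used for illustration only; the real signatures in `agentex.lib` are not shown in this PR page, and any `.dockerignore` filtering is deliberately left out. The sketch makes only the member order deterministic, not the archive bytes (tar headers still carry mtimes).

```python
import io
import tarfile
from pathlib import Path
from typing import Iterator, List


def iter_context_files(root: Path) -> Iterator[Path]:
    """Yield regular files and symlinks under root, sorted by relative POSIX path.

    Hypothetical stand-in for the shared enumeration; directories are skipped
    because their contents are yielded individually.
    """
    candidates = sorted(root.rglob("*"), key=lambda p: p.relative_to(root).as_posix())
    for path in candidates:
        if path.is_symlink() or path.is_file():
            yield path


def build_archive(root: Path) -> bytes:
    """Tar the context, adding members in the sorted enumeration order."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for path in iter_context_files(root):
            # recursive=False: each entry is added exactly once, in our order.
            tar.add(path, arcname=path.relative_to(root).as_posix(), recursive=False)
    return buf.getvalue()


def member_names(archive: bytes) -> List[str]:
    """Return archive member names in the order they were written."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r") as tar:
        return tar.getnames()
```

Because the hash and the archive would walk the same iterator, a file can only be in one if it is in the other, which is the property the commit message is after.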
Address Greptile review on the build-provenance capture util:

- Always compute `working_tree_hash` (drop the "skip on clean commit" path). A tree that `git status` reports as clean can still contain files that are .gitignore'd but not .dockerignore'd, which the commit can't reproduce; an always-present content hash identifies the exact shipped bytes and closes that gap.
- Guard the hash (`_safe_working_tree_hash`) so a permission error or filesystem race degrades to None instead of aborting the build. The module contract is that capture never raises into a build.
- Record dirtiness as a first-class `dirty` flag (surfaced as `source_dirty` / `dirty`) rather than overloading hash-presence, matching Go's vcs.modified and Nix's dirtyRev. The flag is None outside a git work tree.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
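The two behaviours described in this commit (a hash guard that degrades to None, and a tri-state `dirty` flag) can be sketched roughly as follows. This is an illustration under assumptions, not the module's code: `safe_hash` and `git_dirty` are hypothetical names, the 10-second timeout is arbitrary, and the real `_safe_working_tree_hash` may catch a narrower set of exceptions.

```python
import logging
import subprocess
from pathlib import Path
from typing import Callable, Optional

logger = logging.getLogger(__name__)


def safe_hash(compute: Callable[[Path], str], root: Path) -> Optional[str]:
    """Run a hash function; on any failure log a warning and return None."""
    try:
        return compute(root)
    except Exception as exc:  # contract: capture never raises into a build
        logger.warning("working tree hash failed for %s: %s", root, exc)
        return None


def git_dirty(repo: Path) -> Optional[bool]:
    """True if there are uncommitted changes, False if clean, None outside git."""
    try:
        result = subprocess.run(
            ["git", "status", "--porcelain"],
            cwd=repo,
            capture_output=True,
            text=True,
            timeout=10,  # assumed bound; not taken from the PR
            check=False,
        )
    except (OSError, subprocess.SubprocessError):
        return None  # git missing, unreadable cwd, or timeout
    if result.returncode != 0:
        return None  # not inside a git work tree
    # Porcelain output is empty exactly when the tree is clean.
    return bool(result.stdout.strip())
```

Keeping `dirty` separate from the hash means a consumer can distinguish "clean commit, hash present" from "dirty tree, hash present", which hash-presence alone could not express once the hash is always computed.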
Addressed both Greptile findings in cf9994d:
Also, per design discussion: dirtiness is now a first-class 🧑💻🤖 — posted via Claude Code |
Greptile (T-Rex repro) showed `build-info.json` was written to the archive root, which the templates' Dockerfiles don't COPY and the runtime `locate_build_info_path()` doesn't read, so it never reached the image and the `registration_metadata` sink stayed empty.

Beyond the placement bug, the sink is redundant: `AgentexCloudDeploy.build_id` is an FK to `AgentexCloudBuild`, so a deployment's source provenance derives from the build record (the `source_*` columns this work adds, Surface C) over that join, the same Build->Deploy edge that lineage already traverses. There is no need to denormalize provenance onto `registration_metadata`/`DeploymentHistory` (which has had no producer since its read path landed 2025-09, so its git fields have never been populated).

#454 now ships only the shared capture util (`agentex.lib.utils.build_provenance`) plus a deterministic build-archive ordering. Provenance is delivered via the build-record sink; the runtime sink can be revived (correctly placed) if a real consumer for deployment-history provenance ever appears.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ld-provenance-capture
Adds `agentex.lib.utils.build_provenance`, the shared capture util for client-attested build provenance: git coordinates (repo/commit/ref/subpath), a deterministic `working_tree_hash` over the build inputs (not the tarball), a `dirty` flag (Go `vcs.modified` / Nix `dirtyRev` shape), and `normalize_remote`. Capture is best-effort and never raises into a build. Also makes the build archive's member order deterministic via a sorted enumeration shared with the hash.

First of three surfaces for AGX1-418 (Phase 1, client-attested). Provenance is delivered via the build-record sink: `source_*` columns on `POST /v5/builds` (Surface C, scaleapi) consumed by the sgpctl + CI uploaders (Surface B, scaleapi/sgp). This PR lands the util + archive determinism where `agentex.lib` lives; the uploaders/columns follow.

Scope notes

`build-info.json` / runtime sink. An earlier revision wrote `build-info.json` into the build context for the `register_agent()` → `registration_metadata` path. Greptile (T-Rex) correctly flagged it as dead-on-arrival (written to the archive root, which the templates' Dockerfiles don't COPY and `locate_build_info_path()` doesn't read). It's also redundant: `AgentexCloudDeploy.build_id` is an FK to `AgentexCloudBuild`, so a deployment's source provenance derives from the build record over that join, the same Build→Deploy edge that lineage already traverses. Dropped; can be revived (correctly placed) if a real consumer for deployment-history provenance ever appears.

Identity model

`working_tree_hash` is always computed (content identity); `commit`/`ref`/`repo` anchor it to source when in a git work tree; `dirty` records uncommitted changes (`None` outside git).

Tests

- 20 provenance unit tests (clean/dirty/untracked/detached-HEAD/no-remote/non-git/monorepo-subpath, hash determinism + one-byte/added/exec-bit/symlink sensitivity, and a never-raises-on-hash-failure guard).
- `ruff`/`pyright` clean; full `lib` suite green.

🧑💻🤖 posted via Claude Code
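For readers who want a concrete picture of the identity model, the following is a rough sketch of a content hash with the sensitivities the tests list (one changed byte, an added file, the exec bit, a symlink target). Everything specific here is an assumption: SHA-256, the `F`/`X`/`L` kind tags, and the length-prefixed field layout are illustrative choices, and the PR's actual `working_tree_hash` encoding is not shown on this page.

```python
import hashlib
import os
import stat
from pathlib import Path


def working_tree_hash(root: Path) -> str:
    """Hash file paths, kinds, and contents under root in sorted order."""
    digest = hashlib.sha256()
    entries = sorted(
        (p for p in root.rglob("*") if p.is_symlink() or p.is_file()),
        key=lambda p: p.relative_to(root).as_posix(),
    )
    for path in entries:
        rel = path.relative_to(root).as_posix().encode()
        if path.is_symlink():
            # Hash the link target string, not whatever it points at.
            kind, payload = b"L", os.readlink(path).encode()
        else:
            executable = bool(path.stat().st_mode & stat.S_IXUSR)
            kind = b"X" if executable else b"F"
            payload = path.read_bytes()
        # Length-prefix every field so adjacent fields cannot blur together.
        for field in (kind, rel, payload):
            digest.update(len(field).to_bytes(8, "big"))
            digest.update(field)
    return digest.hexdigest()
```

Hashing the inputs rather than the tarball keeps the identity independent of archive metadata such as mtimes, which matches the "over the build inputs (not the tarball)" wording above.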
Greptile Summary

Adds `agentex.lib.utils.build_provenance`, the canonical source-identity util for agent builds. It captures git coordinates (repo/commit/ref/subpath/dirty) plus a deterministic `working_tree_hash` over the build inputs, with full best-effort degradation so no provenance failure can abort a build. It also makes `BuildContextManager.zipped()` use the same sorted `iter_context_files` enumeration as the hash, ensuring the archive member order is deterministic.

- `build_provenance.py`: new module with `capture_build_provenance`, `working_tree_hash`, `iter_context_files`, `normalize_remote`, and `_safe_working_tree_hash`; every git probe wraps its own failure path; the hash is wrapped in a separate `try`/`except` so filesystem errors are logged and swallowed rather than propagated.
- `agent_manifest.py`: `BuildContextManager.zipped()` now delegates file enumeration to `iter_context_files`, aligning archive contents (including symlinks) with the hash definition.
- `test_build_provenance.py`: 20 tests covering clean/dirty/untracked/detached-HEAD/no-remote/non-git/monorepo scenarios, hash sensitivity properties, and the never-raises guard.

Confidence Score: 5/5

- Safe to merge: the new util is additive and best-effort, and the archive change is a determinism improvement only.
- All capture paths degrade gracefully to nulls; the hash is wrapped in its own try/except; git probes each handle their own failure. The only open item is a stale docstring on a method with no current callers.
- No files require special attention; the `build_info()` docstring in `build_provenance.py` is worth a follow-up cleanup but does not affect runtime behavior.

Important Files Changed

Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["capture_build_provenance(repo_path, context_root)"] --> B["_safe_working_tree_hash(hash_root)"]
    B --> C{hash raises?}
    C -->|No| D["working_tree_hash(root)"]
    C -->|Yes – logs warning| E["tree_hash = None"]
    D --> F["iter_context_files(root)\nsorted rglob, files + symlinks"]
    F -.->|shared enumeration| G["BuildContextManager.zipped()\ntar archive – deterministic order"]
    A --> H["_git rev-parse --show-toplevel"]
    H --> I{in git repo?}
    I -->|No| J["BuildProvenance\nworking_tree_hash only\ndirty=None"]
    I -->|Yes| K["_git rev-parse HEAD\n_git symbolic-ref / describe-tags\n_git remote get-url origin\n_git log -1 author\n_git status --porcelain"]
    K --> L["normalize_remote(url)\nstrip scheme / credentials / .git\nlowercase host"]
    L --> M["dirty = status output is not None"]
    M --> N["BuildProvenance\nfull provenance"]
    N --> O["source_fields()\nomits None + author PII\nfor POST /v5/builds"]

Comments Outside Diff (1)
General comment
Reviews (4): Last reviewed commit: "Merge remote-tracking branch 'origin/nex..."
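The `normalize_remote` step in the flowchart (strip scheme / credentials / .git, lowercase host) could look something like the sketch below. It is an illustration only: the real function's handling of ports, scp-style remotes, and local paths is not visible on this page, so dropping the port and returning None for local paths are assumptions made here.

```python
import re
from typing import Optional
from urllib.parse import urlsplit

# scp-style remotes such as git@github.com:org/repo.git (no scheme present).
_SCP_LIKE = re.compile(r"^(?:[^@/]+@)?([^:/]+):(.+)$")


def normalize_remote(url: Optional[str]) -> Optional[str]:
    """Reduce a git remote URL to 'host/path' with no scheme, credentials, or .git."""
    if not url:
        return None
    url = url.strip()
    scp = _SCP_LIKE.match(url) if "://" not in url else None
    if scp:
        host, path = scp.group(1), scp.group(2)
    else:
        parts = urlsplit(url)
        if not parts.hostname:
            return None  # e.g. a local filesystem path; assumed out of scope
        # .hostname already excludes userinfo and port.
        host, path = parts.hostname, parts.path
    path = path.strip("/")
    if path.endswith(".git"):
        path = path[: -len(".git")]
    # Only the host is lowercased; path case is preserved.
    return f"{host.lower()}/{path}"
```

With this shape, an HTTPS clone URL carrying a token and an SSH clone URL of the same repository collapse to the same string, so the stored `repo` value neither leaks credentials nor varies with the transport used.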