feat(media-use): Agent Media OS #1649
Draft
miguel-heygen wants to merge 18 commits into
Add the media-use skill foundation:
- manifest.mjs: JSONL read/write/find for .media/manifest.jsonl
- index-gen.mjs: regenerate agent-readable index.md from manifest
- cache.mjs: content-addressed global cache at ~/.media/ with SHA-256, sentinel files, copy-on-use imports, and explicit promote
- SKILL.md stub with type routing table
- 19 passing tests covering round-trip, cache, promote, index generation
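The JSONL read/write/find round trip described for manifest.mjs could look roughly like the sketch below. The function names (`serializeRecord`, `parseManifest`, `findRecords`) and the record fields (`id`, `type`, `path`) are assumptions for illustration, not the module's actual exports; file I/O is left out so the logic stays visible.

```javascript
// Hypothetical helpers; the real manifest.mjs API may differ.

// Serialize one record as a single JSONL line.
// JSON.stringify escapes newlines, so each record stays on one line.
function serializeRecord(record) {
  return JSON.stringify(record) + "\n";
}

// Parse JSONL text into an array of records, skipping blank lines.
function parseManifest(text) {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line));
}

// Return the records whose fields all equal the values in `query`.
function findRecords(records, query) {
  return records.filter((rec) =>
    Object.entries(query).every(([key, value]) => rec[key] === value)
  );
}

// Round trip: two records written, then read back and filtered by type.
const manifestText =
  serializeRecord({ id: "sfx_001", type: "sfx", path: ".media/audio/sfx/sfx_001.mp3" }) +
  serializeRecord({ id: "bgm_001", type: "bgm", path: ".media/audio/bgm/bgm_001.mp3" });

const sfxIds = findRecords(parseManifest(manifestText), { type: "sfx" }).map((r) => r.id);
console.log(sfxIds);
```

Appending one line per record keeps writes cheap, which fits a manifest that is extended on every resolve.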
Add the core resolve pipeline:
- resolve.mjs: entry point with cheapest-first cascade (project manifest → global cache → provider search → generate fallback → freeze → register → regenerate index)
- providers.mjs: pluggable provider registry with stub implementations (real providers plug in without changing the cascade)
- freeze.mjs: download URL or copy local file to .media/
- 9 passing tests covering cache hits, provider interface, CLI flags, and the one-line output contract
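A cheapest-first cascade of this kind can be sketched as an ordered list of lookup steps where the first hit wins. Everything here is illustrative: the step names, the `{ name, run }` shape, and the in-memory maps are assumptions, and the sketch is synchronous for brevity even though a real pipeline that touches disk and network would presumably be async.

```javascript
// Run the steps in order (cheapest first) and return the first hit,
// tagged with the name of the step that produced it. Returns null on a full miss.
function resolveCascade(steps, query) {
  for (const step of steps) {
    const hit = step.run(query);
    if (hit != null) return { source: step.name, ...hit };
  }
  return null;
}

// Hypothetical stand-ins for the project manifest and the global cache.
const projectManifest = new Map([["whoosh", { path: ".media/audio/sfx/sfx_001.mp3" }]]);
const globalCache = new Map([["rocket", { path: "cache/rocket.png" }]]);

const steps = [
  { name: "project-manifest", run: (q) => projectManifest.get(q) ?? null },
  { name: "global-cache", run: (q) => globalCache.get(q) ?? null },
  { name: "provider-search", run: () => null }, // stub provider: always misses
];

console.log(resolveCascade(steps, "whoosh"));
```

Because the cascade only iterates over `steps`, swapping the stub for a real provider changes the list and not the loop, which matches the "plug in without changing the cascade" goal.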
- cache.mjs: replace duplicated readGlobalManifest/appendGlobalRecord with readManifest(globalMediaDir())/appendRecord(globalMediaDir(), ...)
- cache.mjs: extract validateCacheHit to deduplicate sentinel check
- resolve.mjs: replace local typeSubdir with import from manifest.mjs
- resolve.mjs: replace hand-rolled extFromCachedPath/extFromUrl with path.extname (stdlib)
- manifest.mjs: export typeSubdir, remove no-op voice ternary in nextId

net: -40 lines
);
const bytes = Buffer.from(await res.arrayBuffer());
mkdirSync(dirname(destPath), { recursive: true });
writeFileSync(destPath, bytes);
Add backward-compatibility layer for pre-media-use projects:
- adopt.mjs: scan assets/ directory, infer type from path/extension, register existing files in manifest without moving them
- resolve cascade now checks assets/ for unregistered matches before hitting global cache or providers (step 1c in the cascade)
- --adopt flag bulk-imports all assets/ files in one pass
- SKILL.md documents the existing-project workflow
- 3 new tests (adopt, skip duplicates, resolve finds existing)

Compositions keep their existing src="assets/..." paths unchanged. The manifest and index.md become the unified view of ALL project media.
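Inferring a type from path and extension, then registering only files that are not yet in the manifest, might be sketched as follows. The extension sets, the directory hints (`/sfx/`, `/bgm/`, `/icons/`), and the record shape are assumptions made for this example; adopt.mjs may use different rules.

```javascript
// Assumed extension groups; the real classification rules may differ.
const AUDIO_EXTS = new Set([".mp3", ".wav", ".ogg"]);
const IMAGE_EXTS = new Set([".png", ".jpg", ".jpeg", ".webp", ".svg"]);
const VIDEO_EXTS = new Set([".mp4", ".webm", ".mov"]);

// Lowercased extension of the last path segment, or "" if there is none.
function extOf(filePath) {
  const base = filePath.split("/").pop();
  const dot = base.lastIndexOf(".");
  return dot > 0 ? base.slice(dot).toLowerCase() : "";
}

// Infer a media type from the extension, refined by directory hints in the path.
function inferType(filePath) {
  const ext = extOf(filePath);
  const lower = filePath.toLowerCase();
  if (AUDIO_EXTS.has(ext)) {
    if (lower.includes("/sfx/")) return "sfx";
    if (lower.includes("/bgm/")) return "bgm";
    return "audio";
  }
  if (IMAGE_EXTS.has(ext)) return lower.includes("/icons/") ? "icon" : "image";
  if (VIDEO_EXTS.has(ext)) return "video";
  return null; // unknown file kinds are not adopted
}

// Build manifest records for files not already registered; paths are kept as-is,
// so nothing is moved on disk.
function adopt(files, registeredPaths) {
  const records = [];
  for (const file of files) {
    const type = inferType(file);
    if (type === null || registeredPaths.has(file)) continue;
    records.push({ type, path: file });
  }
  return records;
}

const files = ["assets/sfx/whoosh.mp3", "assets/hero.png", "assets/notes.txt"];
console.log(adopt(files, new Set(["assets/hero.png"])));
```

Keying the duplicate check on the unchanged path is what lets compositions keep their existing `src` values.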
… video type
- TTS: ElevenLabs as primary, Kokoro as local fallback (drop HeyGen voice)
- Images: Asset Scout + HeyGen library for search, fal.ai Flux for generation
- Video: added as v1.1 type with HeyGen video search + fal.ai video gen
- Updated cascade docs to reflect all providers
- probe.mjs: extract duration, width, height, codec via ffprobe
- adopt.mjs: probe every file on adopt, recording real metadata instead of blanks
- SKILL.md: document all CLI tools media-use orchestrates (ffprobe, ffmpeg, fal, yt-dlp, elevenlabs, heygen, ImageMagick, hyperframes)
- Validated on real blocks: nyc-paris-flight (1920x1080 image, 6s audio), macos-tahoe-liquid-glass (17 assets with 512x512 icons)
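One way to get these fields is to run `ffprobe -v quiet -print_format json -show_format -show_streams <file>` and reduce the JSON it prints. The sketch below covers only the reduction step; `summarizeProbe` and the shape of its return value are assumptions, not probe.mjs's actual interface. Note that ffprobe reports still images as streams with `codec_type` "video", which is why dimensions are read from the first such stream.

```javascript
// Reduce parsed ffprobe JSON output to the four fields of interest.
// Missing values become null instead of throwing.
function summarizeProbe(probeJson) {
  const streams = probeJson.streams ?? [];
  const visual = streams.find((s) => s.codec_type === "video");
  const primary = visual ?? streams[0];
  // ffprobe emits duration as a decimal string, e.g. "6.000000".
  const duration = Number.parseFloat(probeJson.format?.duration);
  return {
    duration: Number.isFinite(duration) ? duration : null,
    width: visual?.width ?? null,
    height: visual?.height ?? null,
    codec: primary?.codec_name ?? null,
  };
}

// Hand-written sample resembling ffprobe output for an audio file.
const audioProbe = {
  streams: [{ codec_type: "audio", codec_name: "mp3" }],
  format: { duration: "6.000000" },
};

console.log(summarizeProbe(audioProbe));
```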
eval.mjs runs --adopt against 7 real registry blocks and produces an HTML report comparing baseline (flat file list) vs. media-use (typed, metadata-rich manifest + index). Validates adopt, ffprobe metadata, cache hits, and miss handling. Report is gitignored (generated artifact). Results: 25 assets adopted, 24/25 with ffprobe metadata, all cache hits work.
- sfx-provider.mjs: searches bundled 19-file SFX library by key, substring, and word overlap matching
- Wire into provider registry (first real provider, replaces stub)
- Updated eval with composition→asset coverage cross-referencing

End-to-end verified: resolve:sfx "whoosh" → copies real .mp3 to .media/audio/sfx/sfx_001.mp3 → manifest records provenance → index shows 0.57s duration → re-resolve hits cache instantly
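The three matching tiers (key, substring, word overlap) could be ranked with a simple score, as in the sketch below. The numeric weights, the tokenization, and the sample keys are assumptions chosen for illustration; the bundled library's real keys and the provider's actual scoring are not shown in this PR text.

```javascript
// Score a library key against a query: exact key match beats substring match,
// which beats partial word overlap. Weights are illustrative.
function scoreMatch(key, query) {
  const k = key.toLowerCase();
  const q = query.toLowerCase().trim();
  if (q === "") return 0;
  if (k === q) return 3;
  if (k.includes(q) || q.includes(k)) return 2;
  const keyWords = new Set(k.split(/[^a-z0-9]+/).filter(Boolean));
  const queryWords = q.split(/[^a-z0-9]+/).filter(Boolean);
  const shared = queryWords.filter((w) => keyWords.has(w)).length;
  // Fraction of query words found in the key, always below the substring tier.
  return queryWords.length > 0 ? shared / queryWords.length : 0;
}

// Return the best-scoring key, or null when nothing matches at all.
function findSfx(keys, query) {
  let best = null;
  let bestScore = 0;
  for (const key of keys) {
    const score = scoreMatch(key, query);
    if (score > bestScore) {
      best = key;
      bestScore = score;
    }
  }
  return best;
}

// Hypothetical library keys.
const sfxKeys = ["whoosh", "ui-click", "door-slam"];
console.log(findSfx(sfxKeys, "door closing slam"));
```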
Fallow audit report
Found 5 findings.
Duplication (2)
Health (3)
Generated by fallow.
Asset tab redesign:
- Categorize assets by type (Audio/Images/Video/Fonts) with filter chips
- Audio rows with play button and real-time frequency spectrum visualizer
- Images as large thumbnail cards or compact rows based on count
- "In use" badge on assets referenced in the active composition
- Used assets sorted to top within each category
- Panel design tokens matching the Property Panel
- Extracted helpers to assetHelpers.ts and AssetContextMenu.tsx

Beat analysis fix:
- Only run analysis when a beats file already exists on disk
- No auto-seed of beats files on first detection
- Prevents surprise green lines after dragging unrelated assets

Also: video type in manifest, SFX path classification fix, empty freeze guard
52f7f9e to 53d0918
- image-provider.mjs: calls GET /v3/assets/search with type=image|icon
- Picks top-scored result, freezes to .media/images/
- Resolves HeyGen credential from env/file (same pattern as audio engine)
- Icons default to lower min_score (0.2) since icon matches score lower
- End-to-end verified: resolve:image "sunset landscape" and resolve:icon "rocket" both return frozen files with provenance
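Picking the top-scored result under a per-type threshold can be sketched as below. Only the icon default of 0.2 comes from the commit message; the image default of 0.5 and the `{ id, score }` result shape are assumptions for illustration.

```javascript
// Per-type minimum scores. The icon value is from the commit message;
// the image value is an assumed placeholder.
const MIN_SCORE = { image: 0.5, icon: 0.2 };

// Return the highest-scoring result at or above the threshold, or null.
function pickTopResult(results, type, minScore = MIN_SCORE[type]) {
  let top = null;
  for (const result of results) {
    if (result.score >= minScore && (top === null || result.score > top.score)) {
      top = result;
    }
  }
  return top;
}

// Hypothetical search results with modest scores.
const searchResults = [
  { id: "a", score: 0.3 },
  { id: "b", score: 0.45 },
];

console.log(pickTopResult(searchResults, "icon"));
console.log(pickTopResult(searchResults, "image"));
```

With these sample scores the same result list yields a match for an icon query and a miss for an image query, which is the effect a lower icon threshold is meant to have.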
- bgm-provider.mjs: calls retrieveBgm() from the hyperframes-media audio engine, which searches HeyGen's music catalog and downloads the top match
- End-to-end verified: resolve:bgm "calm cinematic underscore" → 98s track frozen to .media/audio/bgm/bgm_001.mp3 with full provenance
- Local generation (Lyria/MusicGen) deferred as generate() stub

All v1 providers now wired: BGM + SFX + Image + Icon
bb821da to 135609a
…w wiring
- SKILL.md: full agent-facing docs with types, examples, flags, adopt, inventory reading, and cross-project reuse
- X-HeyGen-Client-Origin: media-use header on all asset search API calls
- Router skill (hyperframes/SKILL.md): added media-use to the routing table

v1 scope complete: BGM + SFX + Image + Icon providers, all end-to-end verified.
…irect API
Shell out to `heygen asset search list` instead of calling the REST API directly. The CLI handles auth, OAuth refresh, and origin attribution (X-HeyGen-Client-Origin header), so no credential logic is duplicated.
…ution
Closes the 3 remaining PRD gaps for v1:
1. Studio reads .media/manifest.jsonl: shows duration and description from manifest records on audio rows (e.g. "25s · BGM")
2. Text search input: type-ahead filters assets by filename and manifest description, matching the PRD "search across filenames + descriptions" requirement
3. BGM X-Source attribution: heygenJSON() now sends X-HeyGen-Client-Origin header (defaults to "hyperframes", media-use overrides to "media-use" via HEYGEN_CLIENT_ORIGIN env var)
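The search behavior in item 2 amounts to a case-insensitive substring filter over two fields. A minimal sketch, assuming each asset is an object with a `filename` and an optional `description` taken from its manifest record (both field names are hypothetical, as is the sample data):

```javascript
// Keep assets whose filename or manifest description contains the query.
// An empty query returns the full list unchanged.
function filterAssets(assets, query) {
  const q = query.trim().toLowerCase();
  if (q === "") return assets;
  return assets.filter(
    (asset) =>
      asset.filename.toLowerCase().includes(q) ||
      (asset.description ?? "").toLowerCase().includes(q)
  );
}

// Hypothetical asset list; the second entry has no manifest record.
const assetList = [
  { filename: "bgm_001.mp3", description: "calm cinematic underscore" },
  { filename: "hero.png" },
];

console.log(filterAssets(assetList, "cinematic").map((a) => a.filename));
```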
- BGM provider rewritten to use `heygen --x-source media-use audio sounds list` instead of importing the audio engine's REST client directly
- Image/icon providers pass `--x-source media-use` on every CLI call
- Reverted the env var approach (HEYGEN_CLIENT_ORIGIN) from heygen.mjs
- heygen-cli built from source with new --x-source global flag that sends X-HeyGen-Client-Source header on every API request

All 4 providers (BGM + SFX + Image + Icon) verified end-to-end.
SFX now searches the HeyGen sound_effects catalog first via `heygen --x-source media-use audio sounds list --type sound_effects`, with the bundled 19-file library as fallback when no auth is present.

All 4 providers now use heygen-cli with --x-source media-use:
- BGM: heygen audio sounds list --type music
- SFX: heygen audio sounds list --type sound_effects (+ bundled fallback)
- Image: heygen asset search list --type image
- Icon: heygen asset search list --type icon
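The four invocations listed above share one shape, so the argument vector can be built from a small lookup table. This sketch uses only the subcommands and flags quoted in the commit message; `heygenArgs` and the `COMMANDS` table are hypothetical names, and how the search query itself is passed to the CLI is not shown in this PR text, so it is deliberately left out.

```javascript
// Subcommand and --type value per media type, as listed in the commit message.
const COMMANDS = {
  bgm: { sub: ["audio", "sounds", "list"], type: "music" },
  sfx: { sub: ["audio", "sounds", "list"], type: "sound_effects" },
  image: { sub: ["asset", "search", "list"], type: "image" },
  icon: { sub: ["asset", "search", "list"], type: "icon" },
};

// Build the argument array for the heygen CLI (without the query argument,
// whose form is not specified in the source).
function heygenArgs(mediaType) {
  const cmd = COMMANDS[mediaType];
  if (!cmd) throw new Error(`unsupported media type: ${mediaType}`);
  return ["--x-source", "media-use", ...cmd.sub, "--type", cmd.type];
}

console.log(["heygen", ...heygenArgs("sfx")].join(" "));
```

Passing the result as an array to a process-spawning call, instead of concatenating a shell string, would avoid quoting problems once a free-text query is appended.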
…ad code
- Extract heygenSearch() shared helper: all 4 providers now use one 13-line function instead of 3 copy-pasted versions (-30 lines)
- Kill registerProvider (zero callers outside removed test)
- Kill stubProvider function (inline as STUB constant)
- Kill generate() no-op on bgmProvider
- findExistingAsset no longer runs ffprobe on every file (walks dir without stat/probe, only matches on filename)
Summary
media-use v1: the media resolution layer for HyperFrames. One verb (resolve) turns a media need into a frozen local file + one-line result. Four types: BGM, SFX, images, icons, all via the HeyGen catalog.

Core infrastructure
- Resolve cascade: assets/ scan → global cache → HeyGen catalog search → freeze → register
- Manifest (.media/manifest.jsonl) tracks every asset with provenance. Agent-readable index.md regenerated on every write
- Global cache at ~/.media/ (SHA-256, sentinel files, copy-on-use, explicit promote)
- --adopt bulk-imports a project's assets/ directory with real ffprobe metadata (duration, dimensions, codec)
- X-HeyGen-Client-Origin: media-use on all HeyGen API calls

Providers (all end-to-end verified)
Studio Asset tab redesign
Beat analysis fix
Agent skill
- SKILL.md with resolve syntax, type table, examples, flags, adopt workflow, inventory reading
- media-use added to the hyperframes router skill routing table

Eval harness
Test plan