feat(media-use): Agent Media OS #1649

Draft
miguel-heygen wants to merge 18 commits into main from feat/media-use-v1
miguel-heygen (Collaborator) commented Jun 22, 2026

Summary

media-use v1 — the media resolution layer for HyperFrames. One verb (resolve) turns a media need into a frozen local file + one-line result. Four types: BGM, SFX, images, icons — all via the HeyGen catalog.

Core infrastructure

  • Resolve cascade — project manifest → existing assets/ scan → global cache → HeyGen catalog search → freeze → register
  • Manifest/Index — JSONL ledger (.media/manifest.jsonl) tracks every asset with provenance. Agent-readable index.md regenerated on every write
  • Global cache — content-addressed ~/.media/ (SHA-256, sentinel files, copy-on-use, explicit promote)
  • Existing asset adoption: `--adopt` bulk-imports a project's assets/ directory with real ffprobe metadata (duration, dimensions, codec)
  • X-Source tracking: the `X-HeyGen-Client-Origin: media-use` header is sent on all HeyGen API calls
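
The cascade in the first bullet can be pictured as an ordered list of lookups tried cheapest-first. The following is a minimal sketch under stated assumptions, not the code in resolve.mjs: `resolveMedia`, the layer names, and the record fields are hypothetical, and the lookups are synchronous only to keep the example short.

```javascript
// Hypothetical sketch of a cheapest-first resolve cascade (synchronous for brevity).
// Each layer exposes lookup(need) and returns a partial record or null.
function resolveMedia(need, layers, register) {
  for (const layer of layers) {
    const hit = layer.lookup(need);
    if (!hit) continue;
    // A project-manifest hit is already registered; return it untouched.
    if (layer.name === "manifest") return hit;
    // Anything found further down is stamped with its origin and registered.
    const record = { ...hit, type: need.type, query: need.query, source: layer.name };
    register(record);
    return record;
  }
  return null; // every layer missed
}

// Example wiring with in-memory stand-ins for the real layers.
const manifest = [];
const layers = [
  { name: "manifest", lookup: (n) => manifest.find((r) => r.type === n.type && r.query === n.query) || null },
  { name: "global-cache", lookup: () => null },
  { name: "catalog", lookup: () => ({ path: ".media/audio/sfx/sfx_001.mp3" }) },
];
const first = resolveMedia({ type: "sfx", query: "whoosh" }, layers, (r) => manifest.push(r));
const second = resolveMedia({ type: "sfx", query: "whoosh" }, layers, (r) => manifest.push(r));
```

In this sketch the first call falls through to the catalog stand-in and registers the result, while the second call is answered by the manifest layer without registering anything new.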

Providers (all end-to-end verified)

  • BGM — HeyGen audio catalog (10k+ tracks), searches by mood/description, downloads top match
  • SFX — bundled 19-file library with fuzzy name matching, falls back to HeyGen catalog
  • Image — HeyGen asset search API (75k+ vectors), semantic search by description
  • Icon — HeyGen asset search API (type=icon), transparent vector assets

Studio Asset tab redesign

  • Categorized sections (Audio / Images / Video / Fonts) with filter chips
  • Audio rows with play button and real-time frequency spectrum visualizer (Web Audio API AnalyserNode → CSS divs at 60fps)
  • Images as large thumbnail cards (≤4) or compact rows
  • "In use" badge on assets referenced in the active composition, sorted to top
  • Panel design tokens matching the Property Panel (panel-input, panel-text-1..5, panel-accent)
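
For the spectrum visualizer, the data path from an AnalyserNode to a row of bars could look roughly like the sketch below. `AnalyserNode.getByteFrequencyData` fills a `Uint8Array` with values from 0 to 255; the `spectrumToBars` helper and its averaging strategy are assumptions for illustration, not the component's actual code.

```javascript
// Hypothetical helper: collapse AnalyserNode frequency bins into N bar heights (0..1).
function spectrumToBars(freqData, barCount) {
  const count = Math.max(1, Math.min(barCount, freqData.length));
  const binsPerBar = Math.floor(freqData.length / count);
  const bars = [];
  for (let b = 0; b < count; b++) {
    let sum = 0;
    for (let i = 0; i < binsPerBar; i++) {
      sum += freqData[b * binsPerBar + i];
    }
    // Average the bins in this bar and normalize the byte range to 0..1.
    bars.push(sum / binsPerBar / 255);
  }
  return bars;
}

// Example input shaped like the output of getByteFrequencyData.
const sampleBins = Uint8Array.of(0, 0, 255, 255, 255, 255, 0, 0);
const sampleBars = spectrumToBars(sampleBins, 4);
```

In a browser, a component would typically call `analyser.getByteFrequencyData(data)` inside a `requestAnimationFrame` loop and assign each returned value to a div's height.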

Beat analysis fix

  • Only run analysis when a beats file already exists on disk (user explicitly used the beats editor)
  • No auto-seed of beats files on first detection
  • Prevents surprise green lines on the timeline after dragging unrelated assets
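
The guard described above amounts to making analysis opt-in through the presence of a beats file. A minimal sketch, assuming a hypothetical `planBeatAnalysis` helper and a made-up `.beats.json` naming scheme (the real file naming in useMusicBeatAnalysis.ts is not shown in this PR):

```javascript
// Hypothetical guard: only analyze when a beats file already exists; never seed one.
function planBeatAnalysis(audioPath, fileExists) {
  // Assumed naming convention: song.mp3 -> song.beats.json
  const beatsPath = audioPath.replace(/\.[^./]+$/, "") + ".beats.json";
  if (!fileExists(beatsPath)) {
    return { run: false, beatsPath: null };
  }
  return { run: true, beatsPath };
}

// Example: only song.mp3 has a beats file on disk.
const onDisk = new Set(["assets/song.beats.json"]);
const withBeats = planBeatAnalysis("assets/song.mp3", (p) => onDisk.has(p));
const withoutBeats = planBeatAnalysis("assets/whoosh.mp3", (p) => onDisk.has(p));
```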

Agent skill

  • Full SKILL.md with resolve syntax, type table, examples, flags, adopt workflow, inventory reading
  • Wired into the hyperframes router skill routing table

Eval harness

  • Tests against 7 real registry blocks (nyc-paris-flight, macos-tahoe-liquid-glass, etc.)
  • 25 assets adopted with real ffprobe metadata, 96% composition→asset coverage

Test plan

  • 19 manifest/index/cache unit tests
  • 12 resolve engine tests (cache hits, provider interface, CLI flags, adopt)
  • End-to-end: resolve:bgm "calm cinematic underscore" → 98s track frozen with provenance
  • End-to-end: resolve:sfx "whoosh" → bundled library hit, cache on re-resolve
  • End-to-end: resolve:image "sunset landscape" → HeyGen asset search, frozen to .media/
  • End-to-end: resolve:icon "rocket" → icon search, transparent SVG frozen
  • Eval: 7 registry blocks, 25 assets, 96% coverage
  • Manual: Studio Asset tab — filter chips, play audio with visualizer, drag-to-timeline, "in use" badge

Add the media-use skill foundation:
- manifest.mjs: JSONL read/write/find for .media/manifest.jsonl
- index-gen.mjs: regenerate agent-readable index.md from manifest
- cache.mjs: content-addressed global cache at ~/.media/ with SHA-256,
  sentinel files, copy-on-use imports, and explicit promote
- SKILL.md stub with type routing table
- 19 passing tests covering round-trip, cache, promote, index generation
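
As an illustration of the JSONL ledger idea (one JSON record per line, append-only), a round-trip could be sketched as follows. The function names and record fields here are hypothetical and are not taken from manifest.mjs.

```javascript
// Hypothetical JSONL helpers: one JSON object per line, blank lines ignored.
function parseManifest(text) {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line));
}

// Appending is a pure string operation: serialize the record and add a newline.
function appendRecord(text, record) {
  return text + JSON.stringify(record) + "\n";
}

// Find the first record matching a type and query, or null.
function findRecord(records, type, query) {
  return records.find((r) => r.type === type && r.query === query) || null;
}

let ledger = "";
ledger = appendRecord(ledger, { id: "sfx_001", type: "sfx", query: "whoosh" });
ledger = appendRecord(ledger, { id: "bgm_001", type: "bgm", query: "calm cinematic underscore" });
const ledgerRecords = parseManifest(ledger);
```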
Add the core resolve pipeline:
- resolve.mjs: entry point with cheapest-first cascade
  (project manifest → global cache → provider search → generate fallback
  → freeze → register → regenerate index)
- providers.mjs: pluggable provider registry with stub implementations
  (real providers plug in without changing the cascade)
- freeze.mjs: download URL or copy local file to .media/
- 9 passing tests covering cache hits, provider interface, CLI flags,
  and the one-line output contract
- cache.mjs: replace duplicated readGlobalManifest/appendGlobalRecord
  with readManifest(globalMediaDir())/appendRecord(globalMediaDir(), ...)
- cache.mjs: extract validateCacheHit to deduplicate sentinel check
- resolve.mjs: replace local typeSubdir with import from manifest.mjs
- resolve.mjs: replace hand-rolled extFromCachedPath/extFromUrl with
  path.extname (stdlib)
- manifest.mjs: export typeSubdir, remove no-op voice ternary in nextId

net: -40 lines
Comment thread skills/media-use/scripts/lib/manifest.mjs Fixed
Comment thread skills/media-use/scripts/lib/manifest.mjs Fixed
);
const bytes = Buffer.from(await res.arrayBuffer());
mkdirSync(dirname(destPath), { recursive: true });
writeFileSync(destPath, bytes);
Add backward-compatibility layer for pre-media-use projects:
- adopt.mjs: scan assets/ directory, infer type from path/extension,
  register existing files in manifest without moving them
- resolve cascade now checks assets/ for unregistered matches before
  hitting global cache or providers (step 1c in the cascade)
- --adopt flag bulk-imports all assets/ files in one pass
- SKILL.md documents the existing-project workflow
- 3 new tests (adopt, skip duplicates, resolve finds existing)

Compositions keep their existing src="assets/..." paths unchanged.
The manifest and index.md become the unified view of ALL project media.
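
Inferring a type from path and extension, and registering files in place while skipping duplicates, might look like the sketch below. The extension table, directory hints, and record shape are assumptions made for illustration; adopt.mjs may classify files differently.

```javascript
// Hypothetical extension table; the real mapping is not shown in this PR.
const EXTENSION_TYPES = {
  ".mp3": "audio", ".wav": "audio",
  ".png": "image", ".jpg": "image", ".svg": "image",
  ".mp4": "video", ".woff2": "font",
};

function inferType(relPath) {
  const lower = relPath.toLowerCase();
  const dot = lower.lastIndexOf(".");
  const base = dot === -1 ? null : EXTENSION_TYPES[lower.slice(dot)] || null;
  const dirs = lower.split("/").slice(0, -1);
  // Directory names refine the coarse extension-based type (assumed hints).
  if (base === "audio" && dirs.includes("sfx")) return "sfx";
  if (base === "audio" && (dirs.includes("bgm") || dirs.includes("music"))) return "bgm";
  if (base === "image" && dirs.includes("icons")) return "icon";
  return base;
}

// Register files where they are: records point at the existing path, nothing is moved.
function adopt(paths, manifest) {
  const known = new Set(manifest.map((r) => r.path));
  const added = [];
  for (const path of paths) {
    const type = inferType(path);
    if (!type || known.has(path)) continue; // unknown type or already registered
    added.push({ path, type, source: "adopted" });
    known.add(path);
  }
  return added;
}
```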
… video type

- TTS: ElevenLabs as primary, Kokoro as local fallback (drop HeyGen voice)
- Images: Asset Scout + HeyGen library for search, fal.ai Flux for generation
- Video: added as v1.1 type with HeyGen video search + fal.ai video gen
- Updated cascade docs to reflect all providers
- probe.mjs: extract duration, width, height, codec via ffprobe
- adopt.mjs: probe every file on adopt — real metadata instead of blanks
- SKILL.md: document all CLI tools media-use orchestrates (ffprobe, ffmpeg,
  fal, yt-dlp, elevenlabs, heygen, ImageMagick, hyperframes)
- Validated on real blocks: nyc-paris-flight (1920x1080 image, 6s audio),
  macos-tahoe-liquid-glass (17 assets with 512x512 icons)
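
To show what extracting duration, width, height, and codec involves, here is a sketch that summarizes the JSON ffprobe prints when run as `ffprobe -v error -print_format json -show_format -show_streams <file>`. The `summarizeProbe` name and the output shape are hypothetical; only the ffprobe field names (`streams`, `codec_type`, `codec_name`, `width`, `height`, `format.duration`) come from ffprobe itself.

```javascript
// Hypothetical summarizer for parsed ffprobe JSON output.
function summarizeProbe(probe) {
  const streams = probe.streams || [];
  // ffprobe reports still images as a single video stream.
  const video = streams.find((s) => s.codec_type === "video");
  const audio = streams.find((s) => s.codec_type === "audio");
  const main = video || audio || null;
  // Durations arrive as decimal strings, e.g. "6.000000".
  const raw = probe.format?.duration ?? main?.duration;
  const duration = raw == null ? NaN : Number(raw);
  return {
    duration: Number.isFinite(duration) ? duration : null,
    width: video?.width ?? null,
    height: video?.height ?? null,
    codec: main?.codec_name ?? null,
  };
}

const audioSummary = summarizeProbe({
  streams: [{ codec_type: "audio", codec_name: "mp3" }],
  format: { duration: "6.000000" },
});
const imageSummary = summarizeProbe({
  streams: [{ codec_type: "video", codec_name: "png", width: 1920, height: 1080 }],
  format: {},
});
```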
eval.mjs runs --adopt against 7 real registry blocks and produces an HTML
report comparing baseline (flat file list) vs. media-use (typed, metadata-rich
manifest + index). Validates adopt, ffprobe metadata, cache hits, and miss
handling. Report is gitignored (generated artifact).

Results: 25 assets adopted, 24/25 with ffprobe metadata, all cache hits work.
- sfx-provider.mjs: searches bundled 19-file SFX library by key,
  substring, and word overlap matching
- Wire into provider registry (first real provider, replaces stub)
- Updated eval with composition→asset coverage cross-referencing

End-to-end verified: resolve:sfx "whoosh" → copies real .mp3 to
.media/audio/sfx/sfx_001.mp3 → manifest records provenance →
index shows 0.57s duration → re-resolve hits cache instantly
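
The three matching tiers named above (key, substring, word overlap) could be sketched like this. The library keys and the tie-breaking order are invented for the example; sfx-provider.mjs may rank candidates differently.

```javascript
// Split a string into lowercase alphanumeric words.
function words(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Hypothetical three-tier matcher: exact key, then substring, then word overlap.
function matchSfx(query, keys) {
  const q = query.trim().toLowerCase();
  if (!q) return null;

  const exact = keys.find((k) => k.toLowerCase() === q);
  if (exact) return exact;

  const partial = keys.find((k) => k.toLowerCase().includes(q) || q.includes(k.toLowerCase()));
  if (partial) return partial;

  // Fall back to the key sharing the most words with the query.
  const queryWords = new Set(words(q));
  let best = null;
  let bestOverlap = 0;
  for (const key of keys) {
    const overlap = words(key).filter((w) => queryWords.has(w)).length;
    if (overlap > bestOverlap) {
      best = key;
      bestOverlap = overlap;
    }
  }
  return best;
}

// Made-up keys standing in for the bundled library.
const sfxKeys = ["whoosh-fast", "door-slam", "ui-click", "click"];
```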

github-actions (bot) commented Jun 23, 2026

Fallow audit report

Found 5 findings.

Duplication (2)

| Severity | Rule | Location | Description |
| --- | --- | --- | --- |
| minor | fallow/code-duplication | packages/studio/src/components/sidebar/AssetsTab.tsx:95 | Code clone group 1 (47 lines, 2 instances) |
| minor | fallow/code-duplication | packages/studio/src/components/sidebar/AssetsTab.tsx:262 | Code clone group 1 (47 lines, 2 instances) |

Health (3)

| Severity | Rule | Location | Description |
| --- | --- | --- | --- |
| minor | fallow/high-crap-score | packages/studio/src/components/sidebar/AssetsTab.tsx:26 | 'AudioRow' has CRAP score 49.5 (threshold: 30.0, cyclomatic 13) |
| major | fallow/high-crap-score | packages/studio/src/components/sidebar/AssetsTab.tsx:234 | 'ImageCard' has CRAP score 88.0 (threshold: 30.0, cyclomatic 18) |
| minor | fallow/high-crap-score | packages/studio/src/hooks/useMusicBeatAnalysis.ts:101 | '<arrow>' has CRAP score 43.1 (threshold: 30.0, cyclomatic 12) |

Generated by fallow.

Asset tab redesign:
- Categorize assets by type (Audio/Images/Video/Fonts) with filter chips
- Audio rows with play button and real-time frequency spectrum visualizer
- Images as large thumbnail cards or compact rows based on count
- "In use" badge on assets referenced in the active composition
- Used assets sorted to top within each category
- Panel design tokens matching the Property Panel
- Extracted helpers to assetHelpers.ts and AssetContextMenu.tsx
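
The "used assets first" ordering and the count-based image layout could be expressed as two small helpers. This is a sketch: the helper names are hypothetical, and reading the threshold as "cards when a category has at most 4 images" is an interpretation of the description, not confirmed by the code.

```javascript
// Hypothetical helper: stable sort that floats in-use assets to the top of a category.
function sortUsedFirst(assets, usedPaths) {
  return [...assets].sort(
    (a, b) => Number(usedPaths.has(b.path)) - Number(usedPaths.has(a.path))
  );
}

// Hypothetical helper: large cards for small sets, compact rows otherwise (assumed threshold of 4).
function imageLayout(count) {
  return count <= 4 ? "cards" : "rows";
}

const ordered = sortUsedFirst(
  [{ path: "a.png" }, { path: "b.png" }, { path: "c.png" }],
  new Set(["c.png"])
);
```

Because `Array.prototype.sort` is stable, assets that share the same in-use status keep their original relative order.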

Beat analysis fix:
- Only run analysis when a beats file already exists on disk
- No auto-seed of beats files on first detection
- Prevents surprise green lines after dragging unrelated assets

Also: video type in manifest, SFX path classification fix, empty freeze guard
- image-provider.mjs: calls GET /v3/assets/search with type=image|icon
- Picks top-scored result, freezes to .media/images/
- Resolves HeyGen credential from env/file (same pattern as audio engine)
- Icons default to lower min_score (0.2) since icon matches score lower
- End-to-end verified: resolve:image "sunset landscape" and resolve:icon "rocket"
  both return frozen files with provenance
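
Picking the top-scored result above a minimum score can be sketched as below. The 0.2 icon threshold comes from the commit message; the `pickTopResult` name, the result shape (`url`, `score`), and the sample scores are assumptions, and the image threshold is left to the caller because the PR does not state it.

```javascript
// From the commit message: icons use a lower minimum score.
const ICON_MIN_SCORE = 0.2;

// Hypothetical selector: highest score at or above minScore, or null if none qualify.
function pickTopResult(results, minScore) {
  let top = null;
  for (const result of results) {
    if (typeof result.score !== "number" || result.score < minScore) continue;
    if (!top || result.score > top.score) top = result;
  }
  return top;
}

// Made-up search results for illustration.
const iconResults = [
  { url: "https://example.invalid/a.svg", score: 0.25 },
  { url: "https://example.invalid/b.svg", score: 0.4 },
  { url: "https://example.invalid/c.svg", score: 0.1 },
];
const bestIcon = pickTopResult(iconResults, ICON_MIN_SCORE);
```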
Comment thread skills/media-use/scripts/lib/image-provider.mjs Fixed
- bgm-provider.mjs: calls retrieveBgm() from the hyperframes-media audio
  engine — searches HeyGen's music catalog, downloads top match
- End-to-end verified: resolve:bgm "calm cinematic underscore" → 98s track
  frozen to .media/audio/bgm/bgm_001.mp3 with full provenance
- Local generation (Lyria/MusicGen) deferred as generate() stub

All v1 providers now wired: BGM + SFX + Image + Icon
…w wiring

- SKILL.md: full agent-facing docs with types, examples, flags, adopt,
  inventory reading, and cross-project reuse
- X-HeyGen-Client-Origin: media-use header on all asset search API calls
- Router skill (hyperframes/SKILL.md): added media-use to the routing table

v1 scope complete: BGM + SFX + Image + Icon providers, all end-to-end verified.
…irect API

Shell out to `heygen asset search list` instead of calling the REST API
directly. The CLI handles auth, OAuth refresh, and origin attribution
(X-HeyGen-Client-Origin header) — no duplicated credential logic.
…ution

Closes the 3 remaining PRD gaps for v1:

1. Studio reads .media/manifest.jsonl — shows duration and description
   from manifest records on audio rows (e.g. "25s · BGM")
2. Text search input — type-ahead filters assets by filename and
   manifest description, matching the PRD "search across filenames +
   descriptions" requirement
3. BGM X-Source attribution — heygenJSON() now sends
   X-HeyGen-Client-Origin header (defaults to "hyperframes", media-use
   overrides to "media-use" via HEYGEN_CLIENT_ORIGIN env var)
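
The search described in item 2 boils down to a case-insensitive match against two fields. A minimal sketch, assuming a hypothetical `filterAssets` helper, assets with `name` and `path` fields, and a map from asset path to manifest description:

```javascript
// Hypothetical filter: match the typed text against filename or manifest description.
function filterAssets(assets, descriptions, text) {
  const needle = text.trim().toLowerCase();
  if (!needle) return assets; // empty search shows everything
  return assets.filter((asset) => {
    const description = descriptions.get(asset.path) || "";
    return (
      asset.name.toLowerCase().includes(needle) ||
      description.toLowerCase().includes(needle)
    );
  });
}

// Made-up assets and descriptions for illustration.
const studioAssets = [
  { name: "bgm_001.mp3", path: ".media/audio/bgm/bgm_001.mp3" },
  { name: "sfx_001.mp3", path: ".media/audio/sfx/sfx_001.mp3" },
];
const studioDescriptions = new Map([
  [".media/audio/bgm/bgm_001.mp3", "calm cinematic underscore"],
]);
```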
- BGM provider rewritten to use `heygen --x-source media-use audio sounds list`
  instead of importing the audio engine's REST client directly
- Image/icon providers pass `--x-source media-use` on every CLI call
- Reverted the env var approach (HEYGEN_CLIENT_ORIGIN) from heygen.mjs
- heygen-cli built from source with new --x-source global flag that sends
  X-HeyGen-Client-Source header on every API request

All 4 providers (BGM + SFX + Image + Icon) verified end-to-end.
SFX now searches the HeyGen sound_effects catalog first via
`heygen --x-source media-use audio sounds list --type sound_effects`,
with the bundled 19-file library as fallback when no auth is present.

All 4 providers now use heygen-cli with --x-source media-use:
- BGM: heygen audio sounds list --type music
- SFX: heygen audio sounds list --type sound_effects (+ bundled fallback)
- Image: heygen asset search list --type image
- Icon: heygen asset search list --type icon
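
The four commands listed above share one shape, so the argument list can be built from a small table. This sketch only assembles arguments that appear in the commit messages; `heygenArgs` is a hypothetical name, and how the search query itself is passed to the CLI is not shown in this PR, so it is deliberately left out.

```javascript
// Subcommands per media type, as listed in the commit message.
const HEYGEN_COMMANDS = {
  bgm: ["audio", "sounds", "list", "--type", "music"],
  sfx: ["audio", "sounds", "list", "--type", "sound_effects"],
  image: ["asset", "search", "list", "--type", "image"],
  icon: ["asset", "search", "list", "--type", "icon"],
};

// Hypothetical builder: prefix every call with the --x-source attribution flag.
function heygenArgs(type) {
  const command = HEYGEN_COMMANDS[type];
  if (!command) {
    throw new Error(`No heygen command for media type: ${type}`);
  }
  return ["--x-source", "media-use", ...command];
}
```

An argument array like this can be handed to Node's `child_process.execFile("heygen", args, callback)`, which avoids shell quoting issues because no shell is involved.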
…ad code

- Extract heygenSearch() shared helper — all 4 providers now use one
  13-line function instead of 3 copy-pasted versions (-30 lines)
- Kill registerProvider (zero callers outside removed test)
- Kill stubProvider function (inline as STUB constant)
- Kill generate() no-op on bgmProvider
- findExistingAsset no longer runs ffprobe on every file (walks dir
  without stat/probe, only matches on filename)

miguel-heygen changed the title from "feat(media-use): Agent Media OS — resolve cascade + manifest/cache infrastructure" to "feat(media-use): Agent Media OS" on Jun 23, 2026