feat(lerobot): Add daft.datasets.lerobot for working with LeRobot v3 datasets #7090
Draft
srilman wants to merge 11 commits into
Rust Dependency Diff
Head: ✅ OK: Within budget.
MP4 shards pack multiple episodes back to back, so a shard's internal frame numbering does not match the parquet's episode-local frame_index (it only lines up for the first episode in each shard). Seek by absolute timestamp instead: the episode's `from_timestamp` within the shard plus the frame's episode-local `timestamp`, accepting the closest decoded frame within half a frame period.

Also:
- populate `video_keys` from info.json features (was a TODO)
- have read() reuse read_episodes() + load_episode_frames() instead of duplicating the episode/frame join
- sync docs/api/datasets.md with the current public API (read / read_episodes / load_episode_frames / read_tasks)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
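The timestamp matching described in this commit can be sketched roughly as follows. This is a minimal illustration in plain Python, not the code from the PR: the helper name `match_frame` and its parameters are hypothetical, and it assumes the decoded frames' presentation timestamps are already available as a sorted list of seconds.

```python
from bisect import bisect_left
from typing import Optional, Sequence


def match_frame(
    shard_pts: Sequence[float],
    from_timestamp: float,
    frame_timestamp: float,
    fps: float,
) -> Optional[int]:
    """Return the index of the shard frame closest to the target time.

    Hypothetical helper for illustration only.
    shard_pts: sorted timestamps (seconds) of the decoded frames in the shard.
    from_timestamp: where the episode starts inside the shard (seconds).
    frame_timestamp: the frame's episode-local timestamp (seconds).
    """
    # Absolute position of the wanted frame inside the shard.
    target = from_timestamp + frame_timestamp
    # Accept a decoded frame only if it is within half a frame period.
    tolerance = 0.5 / fps

    # The closest frame is either just before or just after the insertion point.
    i = bisect_left(shard_pts, target)
    candidates = [j for j in (i - 1, i) if 0 <= j < len(shard_pts)]
    if not candidates:
        return None

    best = min(candidates, key=lambda j: abs(shard_pts[j] - target))
    return best if abs(shard_pts[best] - target) <= tolerance else None


# A 30 fps shard holding two 3-frame episodes back to back.
pts = [k / 30 for k in range(6)]
# Episode 2 starts at 3/30 s; its frame with episode-local frame_index 1
# sits at shard position 4, which is why frame_index alone cannot be used.
shard_index = match_frame(pts, from_timestamp=3 / 30, frame_timestamp=1 / 30, fps=30)
```

Returning `None` when no frame falls inside the tolerance lets the caller decide whether a missing frame is an error or a null value.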
End-to-end example using daft.datasets.lerobot on the EgoDex test dataset: batched H-RDT inference as a @daft.cls UDF (predict_poses.py), EgoDex-paper keypoint-error metrics (compute_metrics.py), and overlay visualizations projecting predicted vs ground-truth hand poses onto the video frames (visualize_predictions.py). Includes a vendored copy of the reader so the scripts also run against released daft wheels.
The module's public surface changed (episodes -> read_episodes, read_info/read_stats folded into include_meta/include_stats kwargs, new read() entry point, video decode moved from load_episode_frames flags to read(load_video_frames=...)), but the tests still imported the old names, failing at collection.

- rename call sites to read_episodes / load_episode_frames(ep, uri)
- replace the read_info/read_stats test with coverage for the include_meta / include_stats column toggles
- add a read() frame-level test and a v2-dataset rejection test
- port the two video decode tests to read(load_video_frames=...), exercising the new timestamp-based frame matching

8 tests, all passing locally with DAFT_RUNNER=native.
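As a rough sketch of what a v2-dataset rejection check and the `video_keys` lookup from info.json might look like: the helper names `check_v3` and `video_keys` below are hypothetical, and the field names (`codebase_version`, and features with `"dtype": "video"`) are assumptions about the LeRobot info.json layout rather than anything confirmed by this PR.

```python
from typing import Any, Dict, List


def check_v3(info: Dict[str, Any]) -> None:
    """Raise if the dataset is not LeRobot v3 (assumed field: codebase_version)."""
    version = str(info.get("codebase_version", ""))
    if not version.startswith("v3"):
        raise ValueError(f"Only LeRobot v3 datasets are supported, got {version!r}")


def video_keys(info: Dict[str, Any]) -> List[str]:
    """Return feature names whose dtype is 'video' (assumed info.json layout)."""
    features = info.get("features", {})
    return [name for name, spec in features.items() if spec.get("dtype") == "video"]


# Illustrative info.json contents; the feature names are made up for this example.
info = {
    "codebase_version": "v3.0",
    "features": {
        "observation.images.top": {"dtype": "video"},
        "action": {"dtype": "float32"},
    },
}
check_v3(info)            # passes silently for a v3 dataset
keys = video_keys(info)   # only the video-typed feature is returned
```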
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7090      +/-   ##
==========================================
+ Coverage   75.22%   76.03%   +0.80%
==========================================
  Files        1148     1165      +17
  Lines      161452   165614    +4162
==========================================
+ Hits       121456   125926    +4470
+ Misses      39996    39688     -308
Keep this PR scoped to the daft.datasets.lerobot reader itself. The end-to-end H-RDT pose prediction example (prediction, metrics, visualization scripts) moves to the daft-examples repository.
Changes Made
New module to work with LeRobot v3 datasets in Daft. In particular
TODOs:
Related Works
The design is loosely based on the GitHub discussion in #6313.