feat(lerobot): Add `daft.datasets.lerobot` for working with LeRobot v3 datasets by srilman · Pull Request #7090 · Eventual-Inc/Daft

srilman · 2026-06-08T18:00:27Z

Changes Made

New module to work with LeRobot v3 datasets in Daft. In particular:

- Loading episode-level data from Parquet
- Loading frame-level metadata (image and sensor data for every iteration in an episode) and pairing it with the episodes
- Loading images as video frames

TODOs:

- Currently only supports LeRobot v3. Other versions should be supported too, but each has its own quirks and needs its own reading logic.
- Currently only supports loading images from an associated MP4. Some datasets may store the images as JPEGs instead; that case is not handled yet.

Related Works

The design is loosely based on the GitHub discussion in #6313.

github-actions · 2026-06-08T18:00:47Z

Rust Dependency Diff

Head: aa9ffca13fb61e3eaafa95cf0a908f1f27a4a1ce vs Base: bf3fb1c63618ae58939a62d8efaf3cb966f695c8.

✅ OK: Within budget.

New Crates: 27
Removed Crates: 16

Added

block-buffer: 0.12.0
const-oid: 0.10.2
crypto-common: 0.2.2
digest: 0.11.3
dlv-list: 0.5.2
hybrid-array: 0.4.12
md-5: 0.11.0
mea: 0.6.4
opendal: 0.57.0
opendal-core: 0.57.0
opendal-service-cos: 0.57.0
opendal-service-fs: 0.57.0
opendal-service-github: 0.57.0
opendal-service-obs: 0.57.0
opendal-service-oss: 0.57.0
ordered-multimap: 0.7.3
quick-xml: 0.39.4
reqsign-aliyun-oss: 3.0.0
reqsign-core: 3.0.0
reqsign-file-read-tokio: 3.0.0
reqsign-huaweicloud-obs: 3.0.0
reqsign-tencent-cos: 3.0.0
reqwest: 0.13.4
rust-ini: 0.21.3
typenum: 1.20.1
wasm-streams: 0.5.0
xattr: 1.6.1

Removed

backon: 1.6.0
gloo-timers: 0.3.0
opendal: 0.55.0
quick-xml: 0.38.4
reqsign: 0.16.5
typenum: 1.19.0
windows-sys: 0.60.2
windows-targets: 0.53.5
windows_aarch64_gnullvm: 0.53.1
windows_aarch64_msvc: 0.53.1
windows_i686_gnu: 0.53.1
windows_i686_gnullvm: 0.53.1
windows_i686_msvc: 0.53.1
windows_x86_64_gnu: 0.53.1
windows_x86_64_gnullvm: 0.53.1
windows_x86_64_msvc: 0.53.1

sgarimel

lgtm

MP4 shards pack multiple episodes back to back, so a shard's internal frame numbering does not match the parquet's episode-local frame_index (it only lines up for the first episode in each shard). Seek by absolute timestamp instead: the episode's `from_timestamp` within the shard plus the frame's episode-local `timestamp`, accepting the closest decoded frame within half a frame period.

Also:
- populate `video_keys` from info.json features (was a TODO)
- have read() reuse read_episodes() + load_episode_frames() instead of duplicating the episode/frame join
- sync docs/api/datasets.md with the current public API (read / read_episodes / load_episode_frames / read_tasks)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
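The timestamp matching described in this commit message can be illustrated with a minimal sketch. This is not the code from the PR: the function name `match_frame`, the `decoded_ts` list of decoded presentation timestamps, and the `fps` parameter are hypothetical names chosen for illustration. Only `from_timestamp`, the episode-local `timestamp`, and the half-frame-period tolerance come from the message itself.

```python
from bisect import bisect_left
from typing import Optional, Sequence


def match_frame(
    decoded_ts: Sequence[float],  # sorted timestamps (seconds) of decoded frames in the shard
    from_timestamp: float,        # where the episode starts inside the shard
    frame_timestamp: float,       # episode-local timestamp of the wanted frame
    fps: float,                   # assumed constant frame rate of the shard
) -> Optional[int]:
    """Return the shard-level index of the closest decoded frame, or None."""
    target = from_timestamp + frame_timestamp  # absolute time within the shard
    tolerance = 0.5 / fps                      # half a frame period

    # The closest frame is one of the two neighbours of the insertion point.
    i = bisect_left(decoded_ts, target)
    candidates = [j for j in (i - 1, i) if 0 <= j < len(decoded_ts)]
    if not candidates:
        return None

    best = min(candidates, key=lambda j: abs(decoded_ts[j] - target))
    return best if abs(decoded_ts[best] - target) <= tolerance else None
```

For a 30 fps shard whose second episode starts at 3.0 s, the frame with episode-local `frame_index` 10 sits at shard-level index 100, which is the mismatch that indexing by `frame_index` alone would miss. A target that falls outside the decoded range yields `None` instead of a wrong frame.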

End-to-end example using daft.datasets.lerobot on the EgoDex test dataset: batched H-RDT inference as a @daft.cls UDF (predict_poses.py), EgoDex-paper keypoint-error metrics (compute_metrics.py), and overlay visualizations projecting predicted vs ground-truth hand poses onto the video frames (visualize_predictions.py). Includes a vendored copy of the reader so the scripts also run against released daft wheels.

The module's public surface changed (episodes -> read_episodes, read_info/read_stats folded into include_meta/include_stats kwargs, new read() entry point, video decode moved from load_episode_frames flags to read(load_video_frames=...)), but the tests still imported the old names, failing at collection.

- rename call sites to read_episodes / load_episode_frames(ep, uri)
- replace the read_info/read_stats test with coverage for the include_meta / include_stats column toggles
- add a read() frame-level test and a v2-dataset rejection test
- port the two video decode tests to read(load_video_frames=...), exercising the new timestamp-based frame matching

8 tests, all passing locally with DAFT_RUNNER=native.

codecov · 2026-06-11T22:08:03Z

Codecov Report

❌ Patch coverage is 67.85714% with 54 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.03%. Comparing base (13c1d40) to head (fae1584).
⚠️ Report is 62 commits behind head on main.

Files with missing lines	Patch %	Lines
daft/datasets/lerobot.py	63.63%	48 Missing ⚠️
daft/file/video.py	85.71%	4 Missing ⚠️
daft/functions/video.py	60.00%	2 Missing ⚠️
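As a cross-check, the patch percentage can be reproduced from the rows above. The sketch below infers each file's changed-line count from its missing count and percentage; those totals are inferred here, not reported by Codecov. The helper name `total_lines` is hypothetical.

```python
def total_lines(missing: int, coverage_pct: float) -> int:
    """Infer the number of changed lines from a missing count and a coverage percentage."""
    return round(missing / (1 - coverage_pct / 100))


# (missing lines, patch %) for each file listed in the report
per_file = {
    "daft/datasets/lerobot.py": (48, 63.63),
    "daft/file/video.py": (4, 85.71),
    "daft/functions/video.py": (2, 60.00),
}

listed = sum(total_lines(m, p) for m, p in per_file.values())  # 132 + 28 + 5
patch_total = total_lines(54, 67.85714)                        # all changed lines
patch_pct = 100 * (patch_total - 54) / patch_total

print(listed, patch_total, round(patch_pct, 5))
```

The three listed files account for 165 of the 168 changed lines. The remaining 3 lines are fully covered, which is consistent with the 100% row for `daft/datasets/__init__.py` in the report, though that attribution is an inference.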

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #7090      +/-   ##
==========================================
+ Coverage   75.22%   76.03%   +0.80%     
==========================================
  Files        1148     1165      +17     
  Lines      161452   165614    +4162     
==========================================
+ Hits       121456   125926    +4470     
+ Misses      39996    39688     -308

Files with missing lines	Coverage Δ
daft/datasets/__init__.py	`100.00% <100.00%> (ø)`
daft/functions/video.py	`79.16% <60.00%> (-5.05%)`	⬇️
daft/file/video.py	`82.30% <85.71%> (+1.62%)`	⬆️
daft/datasets/lerobot.py	`63.63% <63.63%> (ø)`

... and 201 files with indirect coverage changes

srilman added 4 commits May 29, 2026 13:03

test it out

795d8a2

save some shit for now

f8b838c

save one more time

70ac692

save again

1fe73e0

github-actions Bot added the feat label Jun 8, 2026

srilman requested a review from sgarimel June 8, 2026 18:51

sgarimel approved these changes Jun 8, 2026

Shreyas Garimella and others added 3 commits June 11, 2026 13:57

chore(examples): move H-RDT example out of this PR

820b0f8

Keep this PR scoped to the daft.datasets.lerobot reader itself. The end-to-end H-RDT pose prediction example (prediction, metrics, visualization scripts) moves to the daft-examples repository.

sgarimel mentioned this pull request Jun 11, 2026

Add lerobot_pose example (H-RDT pose prediction) Eventual-Inc/daft-examples#40

Draft

sgarimel added 3 commits June 11, 2026 16:48

Fix style and update lerobot docs to current API

df2861c

Format daft/functions/video.py

0f84f25

lerobot with formatting changes

fae1584

srilman wants to merge 11 commits into main from slade/lerobot
