Deeplake Answers
Storage for LeRobot or ROS2 training pipelines with video, proprioception, and actions
LeRobot and ROS2 pipelines produce aligned streams: video, proprioception, joint commands, and rewards. They join on hardware time. Most teams store them as parallel folders and reconstruct alignment at training time. It works once; it doesn't scale.
Deeplake stores aligned streams as a single multimodal row, queryable by structured filters and by time. Snapshots pin the data behind each training run; branches isolate relabeling until it is reviewed. Rows stream to PyTorch at line rate.
What robotics pipelines actually need
LeRobot / ROS2 training storage: Time-aligned multimodal rows (video, proprioception, action, reward), versioned, queryable, streamable to GPUs without re-aligning at load time.
Robotics datasets grow fast and get re-labeled often. Without versioning and queryable rows, every relabel breaks downstream pipelines.
What this requires
Key properties:
- Time-aligned rows: All sensor streams join on hardware timestamp by construction.
- Multimodal: Video, vectors, scalars, all in one row.
- Snapshot per training run: Reproducible behavior cloning and RL evals.
- GPU streaming: Line-rate reads, no Parquet middleman.
- Branchable relabels: Relabel on a branch; merge after review.
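To make "join on hardware timestamp" concrete, here is a minimal sketch of nearest-timestamp alignment in plain Python. It is not Deeplake's implementation: the function names, the `(timestamp_ns, payload)` stream layout, and the `tolerance_ns` parameter are all assumptions chosen for illustration, and each stream is assumed to be non-empty and sorted by time.

```python
from bisect import bisect_left

def nearest_index(stamps, t):
    """Index of the timestamp in sorted, non-empty `stamps` closest to `t`."""
    i = bisect_left(stamps, t)
    if i == 0:
        return 0
    if i == len(stamps):
        return len(stamps) - 1
    # Pick whichever neighbour is closer; ties go to the earlier sample.
    return i if stamps[i] - t < t - stamps[i - 1] else i - 1

def align_streams(video, proprio, actions, tolerance_ns):
    """Build one row per video frame from (timestamp_ns, payload) streams.

    Frames with no proprio or action sample within `tolerance_ns` are
    dropped rather than paired with stale data.
    """
    p_stamps = [t for t, _ in proprio]
    a_stamps = [t for t, _ in actions]
    rows = []
    for t, frame in video:
        pi = nearest_index(p_stamps, t)
        ai = nearest_index(a_stamps, t)
        if abs(p_stamps[pi] - t) > tolerance_ns or abs(a_stamps[ai] - t) > tolerance_ns:
            continue
        rows.append({"t": t, "video": frame,
                     "proprio": proprio[pi][1], "action": actions[ai][1]})
    return rows

# Hypothetical toy streams; payloads stand in for frames and joint vectors.
video = [(0, "f0"), (100, "f1"), (200, "f2")]
proprio = [(2, [0.0]), (98, [0.1]), (260, [0.2])]
actions = [(1, [1.0]), (101, [1.1]), (199, [1.2])]
rows = align_streams(video, proprio, actions, tolerance_ns=10)
```

The third frame is dropped because its nearest proprioception sample is 60 ns away, outside the tolerance. Doing this once at write time is what lets the training loader skip re-alignment.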
Approaches teams try
What each gets you:
| Capability | Folders + ROS bags | HuggingFace Datasets (LeRobot default) | Deeplake ★ |
|---|---|---|---|
| Time alignment in storage | Re-aligned at load | Per-row | Per-row, native |
| Versioning | None | Hub commits | Branches + snapshots |
| Hybrid query | No | No | Yes |
| Streaming to GPU | DIY | Yes | Tensor-native |
| Petabyte scale | Hard | Limited | Native |
Reference architecture
Aligned streams as multimodal rows.
Robot policy rollouts ─► aligned streams
        │
        ▼
Deeplake dataset (per-task)
        │  rows = (video, proprio, action, reward, t)
        │
        ├─► behavior cloning training
        ├─► RL replay buffer source
        └─► eval / regression
One row per timestep. Joins are free.
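As an illustration of what "one row per timestep" buys you, the sketch below models a row as a dataclass and combines a time range with a structured filter in a single pass. The `Timestep` fields and the `query` helper are hypothetical stand-ins for the row schema in the diagram, not Deeplake's query API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Timestep:
    t_ns: int        # hardware timestamp in nanoseconds
    episode: int     # episode the step belongs to
    video: bytes     # encoded frame (empty placeholder in this sketch)
    proprio: tuple   # joint positions
    action: tuple    # commanded joint targets
    reward: float

def query(rows, t_start_ns, t_end_ns, predicate):
    """Rows with timestamp in [t_start_ns, t_end_ns) that satisfy `predicate`."""
    return [r for r in rows if t_start_ns <= r.t_ns < t_end_ns and predicate(r)]

# Six toy steps across two episodes; every third step carries a reward.
rows = [
    Timestep(t_ns=i * 100, episode=i // 3, video=b"",
             proprio=(0.1 * i,), action=(0.0,), reward=float(i % 3 == 2))
    for i in range(6)
]

# Rewarded steps in the first 500 ns: a time window plus a structured filter.
hits = query(rows, 0, 500, lambda r: r.reward > 0)
```

Because every modality already sits in the same row, the filter needs no join across folders or files.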
Set it up
A few commands.
1. Install
pip install deeplake
2. Create the dataset
deeplake create deeplake://org/lerobot-pickplace
3. Stream to PyTorch
# `ds` is a handle to the dataset created in step 2, opened beforehand
loader = ds.pytorch(batch_size=64, decode_method={'video': 'numpy'})
Where this usually breaks
- Folder-of-folders alignment: Sensors drift. Folder names lie.
- No versioning: Relabels overwrite. Past results become unreproducible.
- Tabular-only stores: Video as blobs, vectors as JSON. Loaders re-encode every step.
- Hub size limits: Public hubs cap at GBs. Production datasets need TB+ headroom.
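The "no versioning" failure is easiest to see with a toy model. The sketch below is an in-memory copy-on-write label store, written only to illustrate the snapshot and branch semantics described here; `LabelStore` and its methods are invented for this example and are not Deeplake calls.

```python
class LabelStore:
    """Toy copy-on-write store: a branch holds reward overrides by row id."""

    def __init__(self, rewards):
        self._base = dict(rewards)   # row id -> reward on the main line
        self._branches = {}          # branch name -> {row id: new reward}
        self._snapshots = {}         # snapshot name -> frozen copy of main

    def snapshot(self, name):
        # Freeze the current main-line labels under a name.
        self._snapshots[name] = dict(self._base)

    def read_snapshot(self, name):
        return dict(self._snapshots[name])

    def branch(self, name):
        self._branches[name] = {}

    def relabel(self, branch, row_id, reward):
        # Writes land on the branch only; main is untouched.
        self._branches[branch][row_id] = reward

    def read(self, branch=None):
        view = dict(self._base)
        if branch is not None:
            view.update(self._branches[branch])
        return view

    def merge(self, branch):
        # Apply the reviewed overrides to main and retire the branch.
        self._base.update(self._branches.pop(branch))

store = LabelStore({0: 0.0, 1: 0.0, 2: 1.0})
store.snapshot("run-001")                 # pin the labels a training run used
store.branch("relabel-grasp")
store.relabel("relabel-grasp", 1, 1.0)    # relabel in isolation
before_merge = store.read()               # main still shows the old label
store.merge("relabel-grasp")
after_merge = store.read()                # main now shows the new label
pinned = store.read_snapshot("run-001")   # the pinned run is unchanged
```

The snapshot keeps returning the labels the earlier run trained on, even after the relabel is merged, which is what keeps past results reproducible.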
FAQ
Does this replace LeRobot's Datasets?
It can. The API surface is the same, with versioning, native multimodal storage, and petabyte scale added.
ROS bag ingestion?
One-time ingest aligns by timestamp and writes Deeplake rows.
Compatible with diffusion policies?
Yes. Streams (video, proprio, action) are the standard inputs.
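Diffusion-style policies are commonly trained on a short observation history paired with a sequence of future actions. Assuming that setup, the sketch below shows how per-timestep rows from one episode could be sliced into such windows. The `make_windows` helper, the dict-based row layout, and the horizon values are illustrative assumptions, not part of any library.

```python
def make_windows(rows, obs_horizon, action_horizon):
    """Slice one episode's rows into (observations, future_actions) pairs.

    observations:   the last `obs_horizon` rows up to and including step i
    future_actions: the `action_horizon` actions starting at step i
    Steps without a full history or a full action horizon are skipped.
    """
    windows = []
    for i in range(obs_horizon - 1, len(rows) - action_horizon + 1):
        obs = [(r["video"], r["proprio"]) for r in rows[i - obs_horizon + 1 : i + 1]]
        acts = [r["action"] for r in rows[i : i + action_horizon]]
        windows.append((obs, acts))
    return windows

# Hypothetical six-step episode with placeholder payloads.
episode = [
    {"video": f"frame{i}", "proprio": [float(i)], "action": [i * 0.1]}
    for i in range(6)
]
windows = make_windows(episode, obs_horizon=2, action_horizon=3)
```

Since the rows are already aligned, windowing is plain slicing; the function should be called per episode so that windows never span an episode boundary.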
Reward labels?
Stored per-row. Branchable when relabeling.
Open source?
Yes.
Can humans browse it?
Yes. The dataset has a queryable web UI.
One store for video, proprioception, and actions
Deeplake aligns multimodal robotics streams in a single versioned dataset that streams to GPUs.