Storage for LeRobot or ROS2 training pipelines with video, proprioception, and actions

Deeplake Team, Activeloop

LeRobot and ROS2 pipelines produce aligned streams: video, proprioception, joint commands, and rewards. They join on hardware time. Most teams store them as parallel folders and reconstruct alignment at training time. It works once; it doesn't scale.

Deeplake stores aligned streams as a single multimodal row, queryable by structured filters and by time. Snapshots pin the data used for a training run; branches handle relabeling. The dataset streams to PyTorch at line rate.

What robotics pipelines actually need

LeRobot / ROS2 training storage: Time-aligned multimodal rows (video, proprioception, action, reward), versioned, queryable, streamable to GPUs without re-aligning at load time.

Robotics datasets grow fast and get re-labeled often. Without versioning and queryable rows, every relabel breaks downstream pipelines.

What this requires

Key properties:

  • Time-aligned rows: All sensor streams join on hardware timestamp by construction.
  • Multimodal: Video, vectors, scalars, all in one row.
  • Snapshot per training run: Reproducible behavior cloning and RL evals.
  • GPU streaming: Line-rate reads, no Parquet middleman.
  • Branchable relabels: Relabel on a branch; merge after review.
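To make "join on hardware timestamp" concrete, here is a minimal sketch of a nearest-timestamp join in plain Python. It is not Deeplake's implementation; the function names, the tuple layout, and the `tolerance_ns` parameter are assumptions chosen for illustration.

```python
from bisect import bisect_left

def nearest_index(timestamps, t):
    """Return the index of the entry in sorted `timestamps` closest to `t`."""
    i = bisect_left(timestamps, t)
    if i == 0:
        return 0
    if i == len(timestamps):
        return len(timestamps) - 1
    # Pick the closer neighbour; ties go to the earlier sample.
    return i if timestamps[i] - t < t - timestamps[i - 1] else i - 1

def align_streams(frame_ts, proprio, actions, tolerance_ns):
    """Build one row per video frame by joining the other streams on time.

    `proprio` and `actions` are lists of (timestamp_ns, value), sorted by time.
    A frame is dropped if either stream has no sample within `tolerance_ns`.
    """
    if not proprio or not actions:
        return []
    proprio_ts = [t for t, _ in proprio]
    action_ts = [t for t, _ in actions]
    rows = []
    for frame_idx, t in enumerate(frame_ts):
        p = nearest_index(proprio_ts, t)
        a = nearest_index(action_ts, t)
        if abs(proprio_ts[p] - t) > tolerance_ns:
            continue
        if abs(action_ts[a] - t) > tolerance_ns:
            continue
        rows.append({
            "t": t,
            "frame": frame_idx,
            "proprio": proprio[p][1],
            "action": actions[a][1],
        })
    return rows
```

Doing this once at ingest, rather than in every data loader, is the point of storing aligned rows: the tolerance check runs one time and its result is what gets versioned.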

Approaches teams try

What each gets you:

| Approach | Folders + ROS bags | HuggingFace Datasets (LeRobot default) | Deeplake ★ |
| --- | --- | --- | --- |
| Time alignment in storage | Re-aligned at load | Per-row | Per-row, native |
| Versioning | None | Hub commits | Branches + snapshots |
| Hybrid query | No | No | Yes |
| Streaming to GPU | DIY | Yes | Tensor-native |
| Petabyte scale | Hard | Limited | Native |

Reference architecture

Aligned streams as multimodal rows.

Robot policy rollouts ─► aligned streams
        │
        ▼
  Deeplake dataset (per-task)
        │ rows = (video, proprio, action, reward, t)
        │
        ├─► behavior cloning training
        ├─► RL replay buffer source
        └─► eval / regression

One row per timestep, so training code never has to join streams at load time.
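The row layout in the diagram can be sketched as a plain Python record, together with the kind of structured-plus-time filter the article describes. This is an illustration only: the `Timestep` class, the `episode` field, and the `select` helper are hypothetical and are not part of Deeplake's API.

```python
from dataclasses import dataclass
from typing import Sequence

@dataclass(frozen=True)
class Timestep:
    t_ns: int                  # hardware timestamp, nanoseconds
    episode: int               # hypothetical field, added for filtering
    video_frame: bytes         # stand-in for a decoded frame tensor
    proprio: Sequence[float]   # joint positions, velocities, etc.
    action: Sequence[float]    # commanded action at this timestep
    reward: float

def select(rows, episode=None, t_start=None, t_end=None, min_reward=None):
    """Filter rows by episode, half-open time window [t_start, t_end), and reward."""
    out = []
    for r in rows:
        if episode is not None and r.episode != episode:
            continue
        if t_start is not None and r.t_ns < t_start:
            continue
        if t_end is not None and r.t_ns >= t_end:
            continue
        if min_reward is not None and r.reward < min_reward:
            continue
        out.append(r)
    return out
```

Because every modality lives in the same record, a filter on time or reward returns the matching video, proprioception, and action together, with no second lookup.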

Set it up

A few commands.

1. Install

bash
pip install deeplake

2. Create the dataset

bash
deeplake create deeplake://org/lerobot-pickplace

3. Stream to PyTorch

python
# `ds` is the dataset handle, opened earlier in the session.
loader = ds.pytorch(batch_size=64, decode_method={'video': 'numpy'})
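For readers who want to see what batching aligned rows amounts to, the sketch below groups row dictionaries into fixed-size batches using only the standard library. It is a stand-in for illustration, not the loader's actual behavior; the `batches` helper and the dict-of-lists batch layout are assumptions.

```python
def batches(rows, batch_size):
    """Yield dicts mapping each field name to a list of up to `batch_size` values.

    `rows` is a list of dicts that all share the same keys. The final batch
    may be shorter than `batch_size`.
    """
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        yield {key: [row[key] for row in chunk] for key in chunk[0]}
```

Each batch keeps the fields index-aligned: element `i` of `batch["action"]` belongs to the same timestep as element `i` of every other field.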

Where this usually breaks

  • Folder-of-folders alignment: Sensors drift. Folder names lie.
  • No versioning: Relabels overwrite. Past results become unreproducible.
  • Tabular-only stores: Video as blobs, vectors as JSON. Loaders re-encode every step.
  • Hub size limits: Public hubs cap at GBs. Production datasets need TB+ headroom.
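The versioning failure mode above is easier to see with a toy model. The sketch below keeps reward labels in named branches and frozen snapshots, so a relabel never changes what an earlier training run saw. It is a deliberately naive illustration (full copies, last-writer-wins merge); the `VersionedLabels` class and its methods are hypothetical and do not mirror Deeplake's API.

```python
class VersionedLabels:
    """Toy label store with branches and immutable snapshots."""

    def __init__(self, base):
        self._branches = {"main": dict(base)}
        self._snapshots = {}

    def branch(self, name, source="main"):
        # A new branch starts as a full copy of its source branch.
        self._branches[name] = dict(self._branches[source])

    def relabel(self, branch, row_id, reward):
        # Writes touch only the named branch.
        self._branches[branch][row_id] = reward

    def snapshot(self, tag, branch="main"):
        # Freeze the current state of a branch under a tag.
        self._snapshots[tag] = dict(self._branches[branch])

    def merge(self, source, target="main"):
        # Naive merge: labels from `source` overwrite those in `target`.
        self._branches[target].update(self._branches[source])

    def read(self, ref):
        # Snapshot tags take precedence over branch names.
        if ref in self._snapshots:
            return dict(self._snapshots[ref])
        return dict(self._branches[ref])
```

A typical flow is to snapshot `main` before training, relabel on a separate branch, and merge after review; reading the snapshot tag afterwards still returns the labels the run was trained on.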

FAQ

Does this replace LeRobot's Datasets?

It can. It offers the same API surface, plus versioning, multimodal-native storage, and petabyte scale.

ROS bag ingestion?

One-time ingest aligns by timestamp and writes Deeplake rows.

Compatible with diffusion policies?

Yes. The stored streams (video, proprioception, action) are the standard inputs to diffusion policies.

Reward labels?

Stored per-row. Branchable when relabeling.

Open source?

Yes.

Can humans browse it?

Yes. The dataset has a queryable web UI.

One store for video, proprioception, and actions

Deeplake aligns multimodal robotics streams in a single versioned dataset that streams to GPUs.
