Storage for LeRobot or ROS2 training pipelines with video, proprioception, and actions

TL;DR: LeRobot and ROS2 pipelines produce aligned streams: video, proprioception, joint commands, and rewards. The streams join on hardware timestamps. Most teams store them as parallel folders and reconstruct the alignment at training time. That works once; it doesn't scale.

Deeplake stores the aligned streams as a single multimodal row, queryable by structured filters and by time. Snapshots pin the exact data a training run used; branches isolate relabeling work until it is reviewed. The dataset streams to PyTorch at line rate.

What robotics pipelines actually need

LeRobot / ROS2 training storage: Time-aligned multimodal rows (video, proprioception, action, reward), versioned, queryable, streamable to GPUs without re-aligning at load time.

Robotics datasets grow fast and get re-labeled often. Without versioning and queryable rows, every relabel breaks downstream pipelines.

What this requires

Key properties:

Time-aligned rows: All sensor streams join on hardware timestamp by construction.
Multimodal: Video, vectors, scalars, all in one row.
Snapshot per training run: Reproducible behavior cloning and RL evals.
GPU streaming: Line-rate reads, no Parquet middleman.
Branchable relabels: Relabel on a branch; merge after review.

Approaches teams try

What each gets you:

Approach	Folders + ROS bags	HuggingFace Datasets (LeRobot default)	Deeplake ★
Time alignment in storage	Re-aligned at load	Per-row	Per-row, native
Versioning	None	Hub commits	Branches + snapshots
Hybrid query	No	No	Yes
Streaming to GPU	DIY	Yes	Tensor-native
Petabyte scale	Hard	Limited	Native

Reference architecture

Aligned streams as multimodal rows.

Robot policy rollouts ─► aligned streams
        │
        ▼
  Deeplake dataset (per-task)
        │ rows = (video, proprio, action, reward, t)
        │
        ├─► behavior cloning training
        ├─► RL replay buffer source
        └─► eval / regression

One row per timestep. Every stream in a row already shares that row's timestamp, so no join is needed at load time.
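To make the row layout concrete, here is a minimal plain-Python sketch of one row per timestep and a combined time and reward filter. It does not use Deeplake's API: `TimestepRow`, `filter_rows`, and the field names are hypothetical and only mirror the `(video, proprio, action, reward, t)` tuple in the diagram.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class TimestepRow:
    """Hypothetical row: every modality for one hardware timestamp."""
    t: float                   # hardware timestamp, seconds
    video: bytes               # encoded camera frame (placeholder bytes here)
    proprio: Sequence[float]   # measured joint state
    action: Sequence[float]    # commanded joint targets
    reward: float


def filter_rows(rows: List[TimestepRow], t_start: float, t_end: float,
                min_reward: float) -> List[TimestepRow]:
    """Keep rows inside [t_start, t_end) whose reward is at least min_reward."""
    return [r for r in rows if t_start <= r.t < t_end and r.reward >= min_reward]


rows = [
    TimestepRow(t=0.00, video=b"f0", proprio=[0.0, 0.1], action=[0.0, 0.2], reward=0.0),
    TimestepRow(t=0.05, video=b"f1", proprio=[0.1, 0.2], action=[0.1, 0.3], reward=0.0),
    TimestepRow(t=0.10, video=b"f2", proprio=[0.2, 0.3], action=[0.2, 0.4], reward=1.0),
]

# Structured filter (reward) and time filter applied to the same rows.
selected = filter_rows(rows, t_start=0.05, t_end=0.15, min_reward=1.0)
```

Because each row carries all modalities, the filter never has to look up a second folder or table to find the matching frame or action.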

Set it up

A few commands.

1. Install

bash

pip install deeplake

2. Create the dataset

bash

deeplake create deeplake://org/lerobot-pickplace

3. Stream to PyTorch

python

# ds is assumed to be the dataset from step 2, already opened as a Deeplake dataset handle.
loader = ds.pytorch(batch_size=64, decode_method={'video': 'numpy'})

Where this usually breaks

Folder-of-folders alignment: Sensors drift. Folder names lie.
No versioning: Relabels overwrite. Past results become unreproducible.
Tabular-only stores: Video as blobs, vectors as JSON. Loaders re-encode every step.
Hub size limits: Public hubs cap at GBs. Production datasets need TB+ headroom.

FAQ

Does this replace LeRobot's Datasets?

It can. It offers the same API surface, plus versioning, native multimodal rows, and petabyte scale.

ROS bag ingestion?

One-time ingest aligns by timestamp and writes Deeplake rows.
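As an illustration of what "aligns by timestamp" can mean, the sketch below pairs each video frame with the nearest joint-state sample and drops frames with no sample within a tolerance. This is a generic nearest-timestamp technique in plain Python, not the ingest tool's actual behavior; `align_streams`, `nearest_index`, and the 20 ms tolerance are assumptions chosen for the example.

```python
from bisect import bisect_left


def nearest_index(timestamps, t):
    """Index of the value closest to t in a sorted, non-empty list."""
    i = bisect_left(timestamps, t)
    if i == 0:
        return 0
    if i == len(timestamps):
        return len(timestamps) - 1
    # Pick whichever neighbour is closer; ties go to the earlier sample.
    return i if timestamps[i] - t < t - timestamps[i - 1] else i - 1


def align_streams(frames, joint_states, tolerance_s=0.02):
    """Build one row per frame from two (timestamp, value) lists sorted by time.

    Frames with no joint sample within tolerance_s are dropped, not guessed.
    """
    joint_ts = [ts for ts, _ in joint_states]
    if not joint_ts:
        return []
    rows = []
    for t, frame in frames:
        j = nearest_index(joint_ts, t)
        if abs(joint_ts[j] - t) <= tolerance_s:
            rows.append({"t": t, "video": frame, "proprio": joint_states[j][1]})
    return rows


frames = [(0.000, "frame0"), (0.100, "frame1"), (0.200, "frame2")]
joint_states = [(0.001, [0.0]), (0.052, [0.1]), (0.098, [0.2]), (0.260, [0.3])]

rows = align_streams(frames, joint_states)
```

Here the frame at 0.200 s is dropped because its nearest joint sample is 60 ms away. Doing this once at ingest is what lets training code skip alignment at load time.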

Compatible with diffusion policies?

Yes. Streams (video, proprio, action) are the standard inputs.

Reward labels?

Stored per-row. Branchable when relabeling.
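The idea of relabeling on a branch can be sketched in plain Python as a copy-on-write overlay: the main labels stay untouched while a branch collects overrides, and a merge produces the reviewed result. This is a conceptual illustration only; `RewardBranch` and its methods are hypothetical and are not Deeplake's branching API.

```python
class RewardBranch:
    """Hypothetical overlay of reward relabels on top of an unchanged base."""

    def __init__(self, base):
        self._base = base        # dict: row index -> reward on the main line
        self._overrides = {}     # relabels made on this branch only

    def relabel(self, row, reward):
        if row not in self._base:
            raise KeyError(row)
        self._overrides[row] = reward

    def get(self, row):
        """Read a reward as seen from the branch."""
        return self._overrides.get(row, self._base[row])

    def merge(self):
        """Return a new dict with overrides applied; the base is not mutated."""
        merged = dict(self._base)
        merged.update(self._overrides)
        return merged


main = {0: 0.0, 1: 0.0, 2: 1.0}

branch = RewardBranch(main)
branch.relabel(1, 1.0)       # a reviewer decides step 1 was actually a success

merged = branch.merge()      # applied only after review
```

Training runs pinned to the main labels keep seeing the old reward for row 1 until the merge result replaces them, which is what keeps earlier results reproducible.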

Open source?

Yes.

Can humans browse it?

Yes. The dataset has a queryable web UI.

One store for video, proprioception, and actions

Deeplake aligns multimodal robotics streams in a single versioned dataset that streams to GPUs.

Try Deeplake
