Deeplake Answers
I'm collecting robotics training data and need to store video, sensor data, and metadata together.
A robot episode isn't a row. It's an aligned bundle of time-synchronized streams: video from several cameras, LiDAR or depth, IMU, joint positions, force/torque, commands, rewards, and task labels. Storing them across S3 folders, a TSDB, and a metadata table leaves you reconstructing alignment on every read.
Use Deeplake as a tensor-native multimodal dataset. Each episode is one record with typed columns for every modality. The dataset is versioned, streamable, queryable by scalar filter or embedding, and backed by object storage you already own.
What a robotics episode looks like in storage
Aligned episode record: one training sample is a sequence of timestamps with synchronized tensors per modality (RGB frames, depth, LiDAR point clouds, joint states, IMU readings, gripper state, actions, rewards), plus metadata (task ID, operator, success flag, env conditions).
Every downstream workload (behavior cloning, imitation learning, reward modeling, offline RL, curation, safety review) needs the streams to arrive correctly aligned when they are read. Performing that alignment on every read is slow and error-prone, and it is a common source of "why is the model broken" mysteries.
What the dataset layer must support
These capabilities are non-negotiable at robotics scale:
- Per-modality typed columns: Video, depth, LiDAR, IMU, joint state, actions, rewards, each with its own dtype and shape, on one record.
- Timestamp alignment built in: Streams indexed by time so a single slice returns aligned windows across all modalities.
- Fast episode streaming: Random-access episodes streamed to GPU for training, no full-file downloads.
- Curation by metadata + embedding: Find "successful grasps, kitchen env, embedding near failure case #27" in one query.
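The "single slice returns aligned windows" capability can be sketched without any storage engine. The example below assumes the streams already share one timestamp axis; `slice_window` and the sample data are hypothetical names for illustration, not a Deeplake call.

```python
from bisect import bisect_left, bisect_right


def slice_window(timestamps, streams, t_start, t_end):
    """Return every stream restricted to t_start <= t <= t_end."""
    # Binary search on the shared, sorted time axis gives one index range
    # that is valid for all modalities at once.
    lo = bisect_left(timestamps, t_start)
    hi = bisect_right(timestamps, t_end)
    window = {name: samples[lo:hi] for name, samples in streams.items()}
    window["t"] = timestamps[lo:hi]
    return window


timestamps = [0.0, 0.1, 0.2, 0.3, 0.4]
streams = {
    "imu": [[0.0], [0.1], [0.2], [0.3], [0.4]],
    "action": ["a0", "a1", "a2", "a3", "a4"],
}
window = slice_window(timestamps, streams, 0.1, 0.3)
```

One index range serves every modality, which is what makes the slice cheap once alignment has been done at write time.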
Deeplake vs common robotics stacks
Honest tradeoffs for a robotics data platform:
| Capability | Folders + ROS bags + CSV | Parquet + S3 | Deeplake ★ |
|---|---|---|---|
| Aligned multimodal sample | Join at read time | URIs + joins | One record |
| Episode streaming to GPU | Copy then train | Small-file stall | Native |
| Versioning for label revisions | Folder suffixes | Time travel | Branches + diffs |
| Filter + semantic search | Custom code | External index | Hybrid in one query |
| Works with ROS / ROS 2 | Native | Convert first | ROS bag importer |
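The "hybrid in one query" row combines a scalar filter with an embedding ranking. As a rough sketch of that logic only, the pure-Python example below filters on metadata first and then ranks the survivors by cosine similarity. The function names, the record layout, and the two-dimensional embeddings are all assumptions for illustration; a real engine would use an index rather than a linear scan.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_search(episodes, query_embedding, top_k, **filters):
    """Keep episodes matching every metadata filter, then rank by similarity."""
    candidates = [
        e for e in episodes
        if all(e["meta"].get(key) == value for key, value in filters.items())
    ]
    ranked = sorted(
        candidates,
        key=lambda e: cosine(e["embedding"], query_embedding),
        reverse=True,
    )
    return ranked[:top_k]


episodes = [
    {"id": 1, "meta": {"env": "kitchen", "success": True}, "embedding": [1.0, 0.0]},
    {"id": 2, "meta": {"env": "kitchen", "success": True}, "embedding": [0.0, 1.0]},
    {"id": 3, "meta": {"env": "lab", "success": True}, "embedding": [1.0, 0.1]},
    {"id": 4, "meta": {"env": "kitchen", "success": False}, "embedding": [0.9, 0.1]},
]
# "Successful grasps, kitchen env, embedding near a reference failure case"
hits = hybrid_search(episodes, [0.9, 0.1], top_k=2, env="kitchen", success=True)
```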
Reference architecture for a robotics fleet
Data flows from robots to a single versioned dataset. Training, labeling, and analysis all read the same bytes.
Fleet robots ──► edge upload ──► Deeplake
(RGB, depth, LiDAR,              │
 IMU, joints, actions)           │
                                 ├─► Behavior cloning / imitation
                                 ├─► Offline RL
                                 ├─► Curation + labeling (branches)
                                 └─► Safety review (filters)
Edge uploaders push episodes as Deeplake records. Every consumer reads from the same dataset. Label revisions become branches, not new buckets.
Ingest your first episodes
Three steps from ROS bag to queryable dataset.
1. Install
    pip install deeplake deeplake-rosbag
2. Create an episode schema
    import deeplake

    # One typed column per modality, all on the same episode record.
    ds = deeplake.create('s3://robo/main', schema={
        'rgb': 'video',
        'depth': 'tensor',
        'lidar': 'points',
        'joints': 'tensor',
        'actions': 'tensor',
        'reward': 'float',
        'task': 'text',
    })
3. Ingest a ROS bag
    # Map the bag's topics into the dataset created above.
    deeplake.ingest.rosbag('run_0142.bag', into=ds)
Where fleet data stacks usually break
- Alignment at read time: Joining video frames to IMU by timestamp on every batch wastes GPU-hours. Align at write, once.
- ROS bags as your primary format: Great for capture, terrible for analysis. You can't filter, search, or stream bags efficiently.
- Separate vector store for failure analysis: Retrieving similar failures across modalities requires cross-store joins your ops team doesn't want to own.
- Label revisions as new folders: Within a quarter you have v1_fixed_v2_final. Git-style branches make this a non-problem.
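"Align at write, once" usually means resampling each sensor stream onto a reference clock (for example, camera frame times) before storing it. The sketch below shows one common approach, nearest-timestamp matching with a maximum tolerated gap. The function names, the tolerance value, and the sample timestamps are hypothetical; interpolation or hardware sync may suit a real pipeline better.

```python
from bisect import bisect_left


def nearest_index(timestamps, t):
    """Index of the sample in sorted `timestamps` closest to time t."""
    i = bisect_left(timestamps, t)
    if i == 0:
        return 0
    if i == len(timestamps):
        return len(timestamps) - 1
    # Pick the closer neighbor; an exact tie goes to the earlier sample.
    return i - 1 if t - timestamps[i - 1] <= timestamps[i] - t else i


def align_to_reference(ref_ts, src_ts, src_samples, max_gap):
    """Resample a source stream onto the reference clock.

    Reference times with no source sample within max_gap get None,
    so gaps stay visible instead of being silently filled.
    """
    aligned = []
    for t in ref_ts:
        j = nearest_index(src_ts, t)
        aligned.append(src_samples[j] if abs(src_ts[j] - t) <= max_gap else None)
    return aligned


camera_ts = [0.0, 0.1, 0.2]                 # reference clock (seconds)
imu_ts = [0.0, 0.04, 0.09, 0.13, 0.28]      # IMU samples with a dropout
imu = ["i0", "i1", "i2", "i3", "i4"]
aligned_imu = align_to_reference(camera_ts, imu_ts, imu, max_gap=0.05)
```

Running this once at ingest means training batches read pre-aligned columns instead of repeating the search on every batch.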
FAQ
Does Deeplake support ROS 1 and ROS 2?
Yes. Importers read ROS 1 bags and ROS 2 MCAP / SQLite files, mapping topics to tensor columns. You can also ingest from raw frame directories.
Can I store LiDAR point clouds?
Yes, as first-class tensor columns. Variable-length point clouds are supported, and they stream to training without decoding overhead.
How large do these datasets get?
Tens to hundreds of terabytes per program is common. Deeplake chunks and compresses on write, so read cost scales with the window you request, O(window), rather than with total dataset size, O(dataset).
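The O(window) claim follows from chunking: a reader only needs the chunks that overlap the requested range. The sketch below assumes fixed-size chunks counted in frames, which is a simplification chosen for illustration; the source does not describe Deeplake's actual chunk layout, and `chunks_for_window` is a hypothetical helper.

```python
def chunks_for_window(start, stop, chunk_size):
    """Chunk indices that overlap the half-open frame range [start, stop)."""
    if stop <= start:
        return range(0)
    first = start // chunk_size
    last = (stop - 1) // chunk_size
    return range(first, last + 1)


# A 200-frame window touches one chunk, regardless of how many chunks exist.
needed = list(chunks_for_window(start=2_500, stop=2_700, chunk_size=1_000))
# A window that straddles a chunk boundary touches two.
straddling = list(chunks_for_window(start=900, stop=1_100, chunk_size=1_000))
```

Note that the total dataset length never appears in the calculation; only the window bounds and the chunk size do.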
Does it work for sim data too?
Yes. Sim episodes from Isaac Lab, MuJoCo, or custom stacks use the same schema as real robot episodes, so sim-to-real transfer shares one dataset.
What about edge bandwidth?
Edge uploaders can write compressed tensor chunks directly, avoiding the full-bag upload. Most fleets batch uploads during idle windows.
Do I still need a timeseries DB?
Usually no. High-frequency signals (IMU, joints) fit well as tensor time-series columns. Keep a TSDB only if ops needs live monitoring dashboards.
One dataset for every modality your robot produces
Aligned, versioned, streamable. Deeplake handles video + sensor + metadata as one tensor dataset.