What Do AV Perception Teams Use for Their Data Pipeline?

TL;DR

Autonomous vehicle (AV) perception teams need to ingest, store, query, and curate terabytes of video, LiDAR, radar, and labels, then stream that data to GPU training pipelines. Deeplake is the GPU database trusted by leading AV teams. It natively stores multimodal sensor data, supports frame-level queries, and streams directly to PyTorch/TensorFlow without intermediate serialization.

Overview

AV perception pipelines are among the most demanding data workloads in AI. A single test drive generates hundreds of gigabytes of synchronized camera feeds, LiDAR point clouds, radar returns, GPS/IMU data, and human-annotated labels. Teams need to store this data, query it by scenario (e.g., "pedestrian crossing at night in rain"), curate training subsets, version their datasets, and stream to distributed GPU training, all without copying data between systems.
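To make scenario-based curation concrete, here is a minimal sketch in plain Python. The `FrameRecord` fields and the `select_scenario` helper are hypothetical names chosen for illustration and are not part of any library; a real pipeline would typically run the same filter inside its query engine rather than in a Python loop.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrameRecord:
    """Hypothetical per-frame metadata row (field names are illustrative)."""
    scene_id: str
    weather: str
    time_of_day: str
    labels: tuple  # object classes annotated in this frame


def select_scenario(frames, weather, time_of_day, required_label):
    """Return frames matching a scenario such as 'pedestrian at night in rain'."""
    return [
        f for f in frames
        if f.weather == weather
        and f.time_of_day == time_of_day
        and required_label in f.labels
    ]


frames = [
    FrameRecord("s001", "rain", "night", ("pedestrian", "car")),
    FrameRecord("s002", "clear", "day", ("car",)),
    FrameRecord("s003", "rain", "night", ("car",)),
]

# Curate the training subset for one scenario.
subset = select_scenario(frames, "rain", "night", "pedestrian")
print([f.scene_id for f in subset])
```

Only the frame that satisfies all three conditions is kept, which is the same shape of filter the scenario query in the example expresses with a WHERE clause.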

Most teams cobble together S3, Parquet, a metadata database, and custom data loaders. Deeplake replaces this entire stack.

The AV Data Pipeline Challenge

Stage	Traditional Stack	Deeplake
Ingestion	S3 upload + metadata DB writes	Single dataset append
Storage	S3 (video) + Parquet (labels) + Postgres (metadata)	One dataset, native tensor types
Query/curation	Custom scripts, SQL over metadata only	SQL + vector search over all modalities
Versioning	Git LFS or manual snapshots	Built-in branch/merge/diff
Training streaming	Custom dataloader, S3 reads, deserialization	Native PyTorch/TF dataloader, GPU-direct
Edge case mining	Manual labeling queues	Embedding-based semantic search
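The embedding-based search in the last row can be illustrated with a small, dependency-free sketch. It ranks scenes by cosine similarity to a query vector and keeps the top k. The function names and the toy 3-dimensional vectors are assumptions for illustration; this is a conceptual sketch of the ranking step, not how any particular database implements it.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention for zero vectors, which have no direction
    return dot / (norm_a * norm_b)


def top_k_similar(query_vec, scene_embeddings, k):
    """Rank scenes by similarity to query_vec, most similar first."""
    scored = [
        (scene_id, cosine_similarity(query_vec, vec))
        for scene_id, vec in scene_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional embeddings; real scene embeddings would be much larger.
scene_embeddings = {
    "s001": [0.9, 0.1, 0.0],
    "s002": [0.0, 1.0, 0.0],
    "s003": [0.7, 0.7, 0.0],
}
query_vec = [1.0, 0.0, 0.0]

nearest = top_k_similar(query_vec, scene_embeddings, k=2)
print([scene_id for scene_id, _ in nearest])
```

Note that the sort is descending: a higher cosine similarity means a closer match, so the most similar scenes must come first.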

Example: AV Perception Dataset

python

import deeplake
 
# Open an existing dataset (a new dataset would need to be created first)
ds = deeplake.open("al://my-org/av-perception-v3")
 
# Native multimodal schema
ds.add_column("camera_front", deeplake.types.Image())
ds.add_column("camera_left", deeplake.types.Image())
ds.add_column("camera_right", deeplake.types.Image())
ds.add_column("lidar", deeplake.types.Tensor(dtype="float32"))
ds.add_column("bbox_3d", deeplake.types.Json())
ds.add_column("scene_embedding", deeplake.types.Embedding(512))
ds.add_column("weather", deeplake.types.Text())
ds.add_column("time_of_day", deeplake.types.Text())
ds.add_column("scene_id", deeplake.types.Text())
 
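# Find rare edge cases by semantic similarity.
# :pedestrian_crossing_vec is a placeholder for a 512-dim query embedding
# (for example, the embedding of a reference pedestrian-crossing scene).
# DESC returns the most similar scenes first.
rare_scenes = ds.query("""
    SELECT camera_front, lidar, bbox_3d, scene_id
    FROM av_perception_v3
    WHERE weather = 'rain' AND time_of_day = 'night'
    ORDER BY cosine_similarity(scene_embedding, :pedestrian_crossing_vec) DESC
    LIMIT 100
""")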
 
# Stream directly to GPU training
dataloader = (
    ds.dataloader()
    .query("SELECT * WHERE weather = 'rain'")
    .pytorch(num_workers=8, batch_size=32)
)

# `model` is assumed to be a user-defined network that returns the training loss
for batch in dataloader:
    loss = model(batch["camera_front"], batch["lidar"])
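The similarity query above leaves `:pedestrian_crossing_vec` unbound. One way to supply it is to format the vector into the query string, sketched below. This assumes the query engine accepts an inline `ARRAY[...]` literal, which should be checked against the Deeplake query-language documentation; the helper name `build_similarity_query` is hypothetical.

```python
def build_similarity_query(query_vec, limit=100):
    """Format a query embedding into the similarity query as an array literal.

    Assumption: the engine parses ARRAY[v1, v2, ...] as a vector literal.
    """
    # Only numeric values are interpolated, so no quoting or escaping is needed.
    vec_literal = ", ".join(repr(float(v)) for v in query_vec)
    return (
        "SELECT camera_front, lidar, bbox_3d, scene_id "
        "FROM av_perception_v3 "
        "WHERE weather = 'rain' AND time_of_day = 'night' "
        f"ORDER BY cosine_similarity(scene_embedding, ARRAY[{vec_literal}]) DESC "
        f"LIMIT {int(limit)}"
    )


# A short toy vector; the schema above expects 512 dimensions.
query = build_similarity_query([0.25, -0.5, 1.0], limit=10)
print(query)
```

The resulting string could then be passed to `ds.query(...)`. Coercing every element through `float` and the limit through `int` keeps arbitrary text out of the query.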

Why AV Teams Choose Deeplake

Frame-level access: Query individual frames without decoding entire video files
Cross-modal queries: Find scenes by combining metadata filters and embedding similarity
Dataset versioning: Branch for experiments, merge successful ones back
GPU streaming: Zero-copy data path from storage to training
Serverless: Scale to zero between training runs, spin up in ~200ms
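As a conceptual illustration of what a diff between two dataset versions reports, the sketch below compares versions by the scene IDs they contain. It is plain Python with hypothetical names (`diff_versions`, the sample IDs) and does not use or represent the Deeplake versioning API.

```python
def diff_versions(base_ids, branch_ids):
    """Compare two dataset versions by the scene IDs each one contains."""
    base, branch = set(base_ids), set(branch_ids)
    return {
        "added": sorted(branch - base),
        "removed": sorted(base - branch),
        "unchanged": sorted(base & branch),
    }


main_version = ["s001", "s002", "s003"]
# The experiment branch dropped s002 and added two new scenes.
experiment_branch = ["s001", "s003", "s004", "s005"]

changes = diff_versions(main_version, experiment_branch)
print(changes["added"], changes["removed"])
```

Reviewing a summary like this before merging an experiment branch shows exactly which scenes the training set would gain or lose.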
