Deeplake Answers
How should I unify training data curation and model evaluation for an AV perception stack?
Most AV teams curate in one tool (a labeling UI on top of S3) and evaluate in another (custom scripts on Parquet). The two diverge: a curation slice that surfaces hard cases isn't the same slice that runs in eval. Bugs hide in the gap.
Deeplake unifies curation and eval on one dataset. The curator's slice ("night, urban, low-light pedestrians") is the same query the eval harness runs. Every training run is pinned to a versioned snapshot. Reproducibility is built into the data model rather than left to process discipline.
What "unified" means here
Unified curation + eval: One dataset. One query API. The slice a curator marks as "hard" is available to the eval harness as a saved query, not as a copied subset. Snapshots pin both training and eval to the same data state.
When curation and eval diverge, regressions ship. "It passed eval" stops meaning anything because eval ran on data that doesn't match production conditions.
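To make "slice as query, not copied subset" concrete, here is a toy model in plain Python. It is not the Deeplake API: a snapshot is an immutable tuple of frozen frames, and a slice is a named predicate held in a registry. `Frame`, `SAVED_QUERIES`, `SNAPSHOTS`, and `resolve_slice` are hypothetical names chosen for illustration. Because the curator and the eval harness resolve the same name against the same version, they cannot end up looking at different frames.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Frame:
    frame_id: int
    cls: str
    time_of_day: str


# Hypothetical registry: a slice is stored as a named predicate, never as a copied subset.
SAVED_QUERIES: Dict[str, Callable[[Frame], bool]] = {
    "hard_night_peds": lambda f: f.cls == "pedestrian" and f.time_of_day == "night",
}

# Hypothetical versioned store: each version is an immutable tuple of frozen frames.
SNAPSHOTS: Dict[str, Tuple[Frame, ...]] = {
    "v123": (
        Frame(1, "pedestrian", "night"),
        Frame(2, "car", "night"),
        Frame(3, "pedestrian", "day"),
        Frame(4, "pedestrian", "night"),
    ),
}


def resolve_slice(name: str, version: str) -> List[int]:
    """Re-run a saved query against a pinned snapshot and return matching frame ids."""
    predicate = SAVED_QUERIES[name]
    return [f.frame_id for f in SNAPSHOTS[version] if predicate(f)]


# The curator and the eval harness resolve the slice by name at the same version.
curator_ids = resolve_slice("hard_night_peds", "v123")
eval_ids = resolve_slice("hard_night_peds", "v123")
print(curator_ids, eval_ids)  # [1, 4] [1, 4]
```

Since nothing is copied, a relabel only changes what a slice returns once it lands in a new version; anyone pinned to `v123` keeps getting `[1, 4]`.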
What unified curation+eval requires
Four properties:
- Single source dataset: Curation and eval read the same versioned store, not exports.
- Slice as query: A "hard cases" slice is a saved query, not a copy.
- Snapshot per training run: Every run is pinned to an immutable snapshot, so evals are reproducible.
- Hybrid retrieval: Vector + scalar predicates so curators find rare events efficiently.
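A hybrid query combines scalar predicates with vector similarity. The sketch below is a brute-force illustration, not how Deeplake executes queries: it pre-filters frames on exact-match metadata, then ranks the survivors by cosine similarity to a query embedding. The `Frame` fields, the two-dimensional embeddings, and `hybrid_search` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    frame_id: int
    embedding: tuple  # hypothetical image embedding
    weather: str
    time_of_day: str


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_search(frames, query_vec, k, **filters):
    # Scalar predicates first: keep frames whose metadata matches every filter exactly.
    candidates = [f for f in frames if all(getattr(f, key) == value for key, value in filters.items())]
    # Then rank the survivors by vector similarity and return the ids of the top k.
    candidates.sort(key=lambda f: cosine(f.embedding, query_vec), reverse=True)
    return [f.frame_id for f in candidates[:k]]


frames = [
    Frame(1, (1.0, 0.0), "rain", "night"),
    Frame(2, (0.9, 0.1), "clear", "night"),
    Frame(3, (0.8, 0.2), "rain", "night"),
    Frame(4, (0.0, 1.0), "rain", "night"),
    Frame(5, (1.0, 0.05), "rain", "day"),
]

print(hybrid_search(frames, (1.0, 0.0), k=2, weather="rain", time_of_day="night"))  # [1, 3]
```

Frames 2 and 5 look similar to the query but fail the scalar filter, so they never compete for the top slots. A production system would use an approximate nearest-neighbor index instead of sorting every candidate.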
How teams structure this
How the common setups compare:
| Capability | Separate curation tool + eval scripts | One Parquet warehouse, two pipelines | Deeplake (unified) ★ |
|---|---|---|---|
| Curation slice = eval slice | No | If discipline holds | Same query |
| Versioning | Folders | Custom | Native |
| Hybrid query | No | SQL only | Built-in |
| Multimodal | External S3 | Tabular | Native |
Reference: one dataset, two access patterns
Curators and eval harnesses talk to the same store.
```text
Deeplake dataset (versioned)
   │
   ├─► curator: hybrid query, label edits, slice tagging
   │      writes go to a branch, merge after review
   │
   ├─► training: snapshot ds@v123
   │
   └─► eval: same snapshot, slice = saved query
```
Branchable, queryable, snapshot-pinned. Same artifacts, different lenses.
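The branch-then-merge flow in the diagram can be modeled in a few lines. This is a hypothetical toy store, not Deeplake's implementation: each commit is a read-only mapping of frame id to label, curators commit edits on a branch, and a pinned snapshot never changes. `ToyStore` and its methods are illustrative names; the merge only fast-forwards and refuses if the parent branch moved, whereas a real merge would reconcile concurrent edits.

```python
from types import MappingProxyType


class ToyStore:
    """Hypothetical branchable label store; commits are read-only once created."""

    def __init__(self, labels):
        self._commits = {"v1": MappingProxyType(dict(labels))}
        self._heads = {"main": "v1"}  # branch name -> head commit id
        self._bases = {}  # branch name -> (parent branch, commit it forked from)

    def snapshot(self, version):
        # Read-only view; training and eval both read a pinned version through this.
        return self._commits[version]

    def head(self, branch):
        return self._heads[branch]

    def branch(self, name, parent="main"):
        self._heads[name] = self._heads[parent]
        self._bases[name] = (parent, self._heads[parent])

    def commit(self, branch, edits):
        # Copy the branch head, apply label edits, and store the result as a new commit.
        labels = dict(self._commits[self._heads[branch]])
        labels.update(edits)
        version = f"v{len(self._commits) + 1}"
        self._commits[version] = MappingProxyType(labels)
        self._heads[branch] = version
        return version

    def merge(self, branch):
        # Fast-forward only: refuse if the parent moved since the branch was created.
        parent, base = self._bases[branch]
        if self._heads[parent] != base:
            raise ValueError(f"{parent} moved since {branch} forked; needs a real merge")
        self._heads[parent] = self._heads[branch]
        return self._heads[parent]


store = ToyStore({1: "pedestrian", 2: "car"})
pinned = store.snapshot("v1")  # training and eval both pin v1

store.branch("relabel")
v2 = store.commit("relabel", {2: "cyclist"})  # curator edit, isolated on a branch
store.merge("relabel")  # after review, main now points at v2

print(dict(pinned))  # {1: 'pedestrian', 2: 'car'}
print(store.head("main"), store.snapshot(v2)[2])  # v2 cyclist
```

The pinned `v1` view still reports `car` after the merge, which is the property that keeps an in-flight training run and its eval consistent while labelers keep working.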
Wire curation + eval to one dataset
Three commands.
1. Install
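```shell
pip install deeplake
```

2. Tag a slice

```python
import deeplake

# Open the shared dataset, then save the hard-case query as a named slice.
ds = deeplake.load('deeplake://org/av')
ds.query('select * where label.class=="pedestrian" and time_of_day=="night"').save_as('hard_night_peds')
```

3. Pin training + eval to one snapshot

```python
import deeplake

# The training job and the eval harness both load this same immutable version.
ds_v123 = deeplake.load('deeplake://org/av@v123')
```

Where unification usually breaks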
- Two systems, two truths: When curation runs on a copy, the copy ages out. Eval drifts.
- Slice exports: Exports become sources of truth. The next labeler edits a stale export, and the fix never reaches eval.
- Manual snapshots: If snapshots are folder copies, no one takes them. Versioning has to be native.
- No hybrid query: Curators settle for sampling, so rare events stay underrepresented in eval too.
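The sampling problem is mostly arithmetic. A hypothetical synthetic log makes it concrete: 10,000 frames, 20 of which contain a rare event. A 1% uniform sample is expected to hold 100 × 20 / 10,000 = 0.2 of them, while a predicate query over the full store returns all 20. The data and field names are made up for illustration.

```python
import random

# Hypothetical synthetic log: every 500th frame carries the rare event (20 of 10,000).
frames = [{"id": i, "rare": i % 500 == 0} for i in range(10_000)]

# Curation by sampling: a 1% uniform sample, seeded for repeatability.
sample = random.Random(0).sample(frames, 100)
sampled_rare = sum(f["rare"] for f in sample)
expected_in_sample = len(sample) * sum(f["rare"] for f in frames) / len(frames)

# Curation by query: a predicate over the whole store finds every rare frame.
queried_rare = [f["id"] for f in frames if f["rare"]]

print(expected_in_sample, len(queried_rare))  # 0.2 20
print(sampled_rare)  # seed-dependent; the expected value is only 0.2
```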
FAQ
Can I run curation and training off the same snapshot?
Yes. Snapshots are immutable; both processes pin to the same version.
What about labeler concurrency?
Branches. Multiple labelers work on branches and merge after review.
How do I migrate from a separate curation tool?
Most curation tools export to S3. Run a one-time ingest into Deeplake; from then on, the tool reads from Deeplake instead of S3.
Does eval get slower because curation is in the same store?
No. Reads are isolated; curation writes go to branches by default.
How are slices represented?
Saved queries with a name. Anyone can re-run them; results are deterministic per snapshot.
Open source?
Yes. Deeplake is open source.
Curation and eval, on the same dataset
Deeplake makes the curator's slice the eval harness's slice. Versioned, queryable, reproducible.