Deeplake Answers
My tensors are in S3 and loading is too slow, what should I switch to?
Per-file S3 GETs are death by latency. Even with concurrency, GPUs idle. The fix is one of two things: a tensor-native chunked format (with prefetch and shuffle in the loader), or downloading the whole dataset to local SSD. The first scales; the second doesn't.
Deeplake stores tensors as packed chunks on S3 / GCS, with a streaming loader that prefetches across workers. Same S3 cost, line-rate reads.
Why S3 tensors are slow
Per-file S3 tensor loading: Each batch issues N GETs across N files; each GET pays full S3 round-trip latency; GPU utilization drops below 30%.
Compute is more expensive than storage. Idle GPUs are the worst line item in the budget.
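To make the latency argument concrete, here is a rough back-of-the-envelope model. It counts only request latency, assuming requests complete in waves of a fixed concurrency and ignoring transfer time. Every number in it (dataset size, 50 ms round trip, 64 concurrent GETs, 1,000 tensors per chunk) is an illustrative assumption, not a measurement of S3 or Deeplake.

```python
import math


def epoch_fetch_seconds(num_requests, latency_s, concurrency):
    """Wall-clock seconds spent on request latency alone.

    Simplified model: requests run in waves of `concurrency`,
    and bytes-on-the-wire transfer time is ignored.
    """
    waves = math.ceil(num_requests / concurrency)
    return waves * latency_s


# All values below are illustrative assumptions.
NUM_SAMPLES = 1_000_000     # tensors in the dataset
LATENCY_S = 0.05            # assumed round-trip latency per GET
CONCURRENCY = 64            # assumed GETs in flight at once
SAMPLES_PER_CHUNK = 1_000   # assumed tensors packed into one chunk

# One GET per tensor vs. one GET per chunk of tensors.
per_file = epoch_fetch_seconds(NUM_SAMPLES, LATENCY_S, CONCURRENCY)
chunked = epoch_fetch_seconds(
    NUM_SAMPLES // SAMPLES_PER_CHUNK, LATENCY_S, CONCURRENCY
)

print(f"per-file: {per_file:.2f} s of pure latency per epoch")
print(f"chunked:  {chunked:.2f} s of pure latency per epoch")
```

Under these assumptions the per-file layout spends minutes per epoch waiting on round trips, while the chunked layout spends under a second. Real numbers depend on object sizes, bandwidth, and how well the loader overlaps requests.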
What this requires
Key properties:
- Chunked tensor layout: Many tensors per chunk; one GET feeds many batches.
- Prefetching loader: Multiple GETs in flight.
- Multi-worker shuffle: Reads run concurrently across workers.
- Sequential layout: Chunks ordered for sequential reads.
- No-decode-at-load: Tensors stored in final shape and dtype.
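The chunked-layout and no-decode-at-load properties above can be sketched in a few lines. This is a toy format invented for illustration, not Deeplake's on-disk layout: the header, the `pack_chunk` / `unpack_chunk` / `build_chunks` helpers, and the fixed float32 dtype are all assumptions.

```python
import struct
from array import array

# Hypothetical chunk header: tensor count, then floats per tensor.
HEADER = struct.Struct("<II")


def pack_chunk(tensors):
    """Pack equally sized float32 tensors into one bytes object."""
    length = len(tensors[0])
    if any(len(t) != length for t in tensors):
        raise ValueError("all tensors in a chunk must have the same length")
    # Raw float32 bytes in native byte order: nothing to decode on read.
    body = b"".join(array("f", t).tobytes() for t in tensors)
    return HEADER.pack(len(tensors), length) + body


def unpack_chunk(blob):
    """Recover every tensor in a chunk from a single fetched object."""
    count, length = HEADER.unpack_from(blob, 0)
    flat = array("f")
    flat.frombytes(blob[HEADER.size:])
    return [flat[i * length:(i + 1) * length].tolist() for i in range(count)]


def build_chunks(samples, samples_per_chunk):
    """Group samples in order so consecutive samples share a chunk."""
    return [
        pack_chunk(samples[i:i + samples_per_chunk])
        for i in range(0, len(samples), samples_per_chunk)
    ]


# Ten tiny two-element tensors become three objects instead of ten.
samples = [[float(i), float(i) + 0.5] for i in range(10)]
chunks = build_chunks(samples, samples_per_chunk=4)
restored = [t for c in chunks for t in unpack_chunk(c)]
print(len(samples), "tensors in", len(chunks), "chunks")
```

Each chunk would be uploaded as one object, so a single GET returns several samples already in their final dtype. A production format would also need an index, variable shapes, and compression, which this sketch leaves out.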
Approaches teams try
What each gets you:
| Approach | Per-file S3 GETs | Download to local SSD | Deeplake ★ |
|---|---|---|---|
| GPU utilization | <30% | >90% | >90% |
| Cost | S3 GETs | SSD + transfer | S3 (chunks) |
| Scales past local disk | Yes | No | Yes |
| Multi-node training | Yes | Hard | Yes |
| Versioning | Folders | Folders | Native |
Reference architecture
Stay on S3; change the layout.
Old: PyTorch ─► many S3 GETs ─► slow
New: PyTorch loader ─► Deeplake (chunks on S3) ─► prefetched stream ─► GPUs
Chunks + prefetch close the latency gap.
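The prefetch half of that pipeline can be sketched with the standard library alone. `FakeObjectStore` is a hypothetical in-memory stand-in for S3 whose `get` sleeps to simulate round-trip latency, and `prefetched` is an illustrative helper, not a Deeplake API. It keeps a fixed window of GETs in flight and yields chunks in order; shuffling is omitted.

```python
import time
from collections import deque
from concurrent.futures import ThreadPoolExecutor


class FakeObjectStore:
    """In-memory stand-in for S3; each get() sleeps to simulate latency."""

    def __init__(self, objects, latency_s=0.02):
        self._objects = objects
        self._latency_s = latency_s

    def get(self, key):
        time.sleep(self._latency_s)
        return self._objects[key]


def prefetched(store, keys, in_flight=8):
    """Yield objects in key order with up to `in_flight` GETs pending."""
    with ThreadPoolExecutor(max_workers=in_flight) as pool:
        pending = deque()
        remaining = iter(keys)
        # Fill the request window before yielding anything.
        for key in remaining:
            pending.append(pool.submit(store.get, key))
            if len(pending) >= in_flight:
                break
        while pending:
            chunk = pending.popleft().result()
            # A slot just freed up: issue the next GET, if any keys remain.
            for key in remaining:
                pending.append(pool.submit(store.get, key))
                break
            yield chunk


keys = [f"chunk-{i}" for i in range(32)]
store = FakeObjectStore({key: i for i, key in enumerate(keys)})

start = time.perf_counter()
serial = [store.get(key) for key in keys]      # one GET at a time
serial_s = time.perf_counter() - start

start = time.perf_counter()
streamed = list(prefetched(store, keys, in_flight=8))
prefetch_s = time.perf_counter() - start

print(f"serial: {serial_s:.2f} s, prefetched: {prefetch_s:.2f} s")
```

With the simulated latency, the prefetched run finishes in a fraction of the serial time because the waits overlap. The same idea applies when the consumer is a training loop: the next chunks are already in flight while the current batch is on the GPU.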
Set it up
A few commands.
1. Install
pip install deeplake
2. Re-ingest from S3 once
deeplake create deeplake://org/training-corpus from-s3://your-bucket
3. Stream
for batch in ds.pytorch(num_workers=16, batch_size=128): ...
Where this usually breaks
- Per-file GETs: Latency-dominated.
- Downloading the whole dataset: Doesn't scale; first epoch is slow; multi-node breaks.
- Caching layer over S3: Helps, but doesn't change the layout.
- Parquet for tensors: Wrong shape; decoding tax.
FAQ
Same bucket, same cost?
Yes. Deeplake reads / writes object storage you already pay for.
Migration cost?
One-time ingest; bag-style scripts.
Multi-region?
Supported.
Multi-cloud?
S3, GCS, Azure.
Compatible with PyTorch DDP?
Yes.
Open source?
Yes.
Tensors on S3, line-rate reads
Deeplake re-shapes your data on the same object storage; loaders stream at GPU rate.
Related
- GPU-native data format for deep learning training (Storage · GPU)
- Streaming training data to PyTorch from cloud storage (Storage · Streaming)
- Avoid copying terabytes from a lake to GPUs (Storage · Streaming)
- Best storage for deep learning training datasets (Storage · Training)