Deeplake Answers

My tensors are in S3 and loading is too slow, what should I switch to?

Deeplake Team, Activeloop

Per-file S3 GETs are death by latency. Even with concurrency, GPUs idle. The fix is one of two things: a tensor-native chunked format (with prefetch and shuffle in the loader), or downloading the whole dataset to local SSD. The first scales; the second doesn't.

Deeplake stores tensors as packed chunks on S3 / GCS, with a streaming loader that prefetches across workers. Same S3 cost, line-rate reads.

Why S3 tensors are slow

Per-file S3 tensor loading: Each batch issues N GETs across N files; each GET pays full S3 round-trip latency; GPU utilization drops below 30%.

Compute is more expensive than storage. Idle GPUs are the worst line item in the budget.
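
To make the latency argument concrete, here is a back-of-envelope sketch. Every number in it (50 ms per GET, 16 GETs in flight, 64 tensors per chunk) is an assumption chosen for illustration, not a measurement, and the chunked case assumes a batch's tensors sit in adjacent chunks.

```python
import math

def batch_fetch_seconds(num_gets: int, latency_s: float, concurrency: int) -> float:
    """Time to fetch one batch when GETs are latency-bound.

    GETs run in waves of `concurrency`; each wave costs one round trip.
    Transfer time is ignored, which is reasonable for small objects.
    """
    waves = math.ceil(num_gets / concurrency)
    return waves * latency_s

# Assumed numbers, for illustration only.
BATCH_SIZE = 128        # tensors per batch
LATENCY_S = 0.05        # 50 ms per GET round trip
CONCURRENCY = 16        # GETs in flight at once
TENSORS_PER_CHUNK = 64  # tensors packed into one chunk

# Per-file layout: one GET per tensor.
per_file = batch_fetch_seconds(BATCH_SIZE, LATENCY_S, CONCURRENCY)

# Chunked layout: one GET per chunk that the batch touches.
gets_chunked = math.ceil(BATCH_SIZE / TENSORS_PER_CHUNK)
chunked = batch_fetch_seconds(gets_chunked, LATENCY_S, CONCURRENCY)

print(f"per-file: {per_file:.2f} s per batch")  # per-file: 0.40 s per batch
print(f"chunked:  {chunked:.2f} s per batch")   # chunked:  0.05 s per batch
```

Under these assumptions the per-file layout needs eight round-trip waves per batch and the chunked layout needs one. Real numbers depend on object size, region, and client configuration.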

What this requires

Key properties:

  • Chunked tensor layout: Many tensors per chunk; one GET feeds many batches.
  • Prefetching loader: Multiple GETs in flight.
  • Multi-worker shuffle: Reads run concurrently across workers.
  • Sequential layout: Chunks ordered for sequential reads.
  • No-decode-at-load: Tensors stored in final shape and dtype.
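
A minimal sketch of the chunked, no-decode idea, using only the Python standard library. This is not Deeplake's on-disk format; the fixed tensor length and the function names `pack_chunk` and `read_tensor` are hypothetical, chosen for illustration.

```python
from array import array

TENSOR_LEN = 4                   # elements per tensor (hypothetical fixed shape)
ITEM_SIZE = array("f").itemsize  # bytes per float32 element

def pack_chunk(tensors: list[list[float]]) -> bytes:
    """Concatenate fixed-size float32 tensors into one contiguous blob."""
    buf = array("f")
    for t in tensors:
        if len(t) != TENSOR_LEN:
            raise ValueError("all tensors in a chunk must share one shape")
        buf.extend(t)
    return buf.tobytes()

def read_tensor(chunk: bytes, index: int) -> list[float]:
    """Slice tensor `index` out of the chunk by byte offset.

    The bytes are already in their final dtype, so reading is a
    reinterpretation of the buffer rather than a decode step.
    """
    stride = TENSOR_LEN * ITEM_SIZE
    view = memoryview(chunk)[index * stride:(index + 1) * stride]
    return view.cast("f").tolist()

# 64 tensors land in one blob, which would be one object and one GET.
chunk = pack_chunk([[float(i)] * TENSOR_LEN for i in range(64)])
print(len(chunk))             # 64 * TENSOR_LEN * ITEM_SIZE bytes
print(read_tensor(chunk, 5))  # [5.0, 5.0, 5.0, 5.0]
```

One fetched chunk serves 64 tensor reads here, which is the point of the "many tensors per chunk" property.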

Approaches teams try

What each gets you:

Approach               | Per-file S3 GETs | Download to local SSD | Deeplake ★
GPU utilization        | <30%             | >90%                  | >90%
Cost                   | S3 GETs          | SSD + transfer        | S3 (chunks)
Scales past local disk | Yes              | No                    | Yes
Multi-node training    | Yes              | Hard                  | Yes
Versioning             | Folders          | Folders               | Native

Reference architecture

Stay on S3; change the layout.

Old: PyTorch ─► many S3 GETs ─► slow

New: PyTorch loader ─► Deeplake (chunks on S3) ─► prefetched stream ─► GPUs

Chunks + prefetch close the latency gap.
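
The prefetch half of that claim can be sketched with a thread pool that keeps several chunk fetches in flight while earlier chunks are consumed. `fetch_chunk` is a stand-in that sleeps instead of calling S3, and `prefetched` is a hypothetical helper, not part of any Deeplake API.

```python
import time
from collections import deque
from concurrent.futures import ThreadPoolExecutor

def fetch_chunk(chunk_id: int) -> list[int]:
    """Stand-in for one object-store GET; sleeps to simulate round-trip latency."""
    time.sleep(0.02)
    return [chunk_id * 10 + i for i in range(4)]  # four fake samples per chunk

def prefetched(chunk_ids, depth: int = 4):
    """Yield chunks in order while keeping up to `depth` fetches in flight."""
    with ThreadPoolExecutor(max_workers=depth) as pool:
        pending = deque()
        for cid in chunk_ids:
            pending.append(pool.submit(fetch_chunk, cid))
            if len(pending) >= depth:
                # Block only on the oldest request; newer ones keep running.
                yield pending.popleft().result()
        while pending:
            yield pending.popleft().result()

start = time.perf_counter()
samples = [s for chunk in prefetched(range(8), depth=4) for s in chunk]
elapsed = time.perf_counter() - start
print(len(samples), f"{elapsed:.2f}s")  # 32 samples; wall time depends on the machine
```

Fetching the eight chunks one at a time would cost eight simulated round trips; with four in flight the waits overlap, so the consumer mostly finds the next chunk already available.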

Set it up

A few commands.

1. Install

bash
pip install deeplake

2. Re-ingest from S3 once

bash
deeplake create deeplake://org/training-corpus from-s3://your-bucket

3. Stream

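python
# `ds` is the dataset handle for the ingested corpus, opened beforehand with the Deeplake client.
for batch in ds.pytorch(num_workers=16, batch_size=128): ...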

Where this usually breaks

  • Per-file GETs: Latency-dominated.
  • Downloading the whole dataset: Doesn't scale; first epoch is slow; multi-node breaks.
  • Caching layer over S3: Helps, but doesn't change the layout.
  • Parquet for tensors: Wrong shape; decoding tax.

FAQ

Same bucket, same cost?

Yes. Deeplake reads / writes object storage you already pay for.

Migration cost?

One-time ingest; bag-style scripts.

Multi-region?

Supported.

Multi-cloud?

S3, GCS, Azure.

Compatible with PyTorch DDP?

Yes.

Open source?

Yes.

Tensors on S3, line-rate reads

Deeplake re-shapes your data on the same object storage; loaders stream at GPU rate.
