webdataset 0.2.100

Record sequential storage for deep learning.

Stars: 2254, Watchers: 2254, Forks: 179, Open Issues: 105

The webdataset/webdataset repo was created 5 years ago and the last code push was 4 days ago.
The project is very popular, with 2254 GitHub stars.

How to Install webdataset

You can install webdataset using pip

pip install webdataset

or add it to a project with poetry

poetry add webdataset

Package Details

Author
Thomas Breuel
License
MIT
Homepage
http://github.com/webdataset/webdataset
PyPI
https://pypi.org/project/webdataset/
GitHub Repo
https://github.com/webdataset/webdataset

Code Examples

Here are some webdataset code examples and snippets.
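The sketch below does not use the webdataset library itself. It uses only Python's standard `tarfile` module to illustrate the on-disk convention behind "record sequential storage": each sample is a run of consecutive tar members that share a basename (the sample key) and differ by extension, for example `000000.txt` and `000000.cls`. The library's own `webdataset.WebDataset` reader consumes shards laid out this way. The helper names `write_shard` and `read_samples`, and the split of the key at the first dot, are assumptions made for illustration.

```python
import io
import os
import tarfile
import tempfile
from typing import Dict, Iterator, List, Optional, Tuple

Sample = Tuple[str, Dict[str, bytes]]  # (key, {extension: payload})


def write_shard(path: str, samples: List[Sample]) -> None:
    """Hypothetical helper: store each field as a tar member named '<key>.<ext>'."""
    with tarfile.open(path, "w") as tar:
        for key, fields in samples:
            for ext, data in fields.items():
                info = tarfile.TarInfo(name=f"{key}.{ext}")
                info.size = len(data)
                tar.addfile(info, io.BytesIO(data))


def read_samples(path: str) -> Iterator[Sample]:
    """Hypothetical helper: stream the shard and regroup consecutive members by key."""
    key: Optional[str] = None
    fields: Dict[str, bytes] = {}
    # "r|" opens the tar as a forward-only stream, so no seeking is needed.
    with tarfile.open(path, "r|") as tar:
        for member in tar:
            if not member.isfile():
                continue
            name_key, _, ext = member.name.partition(".")  # assumed: key ends at first dot
            if key is not None and name_key != key:
                yield key, fields  # key changed, so the previous sample is complete
                fields = {}
            key = name_key
            # In stream mode the payload must be read before advancing to the next member.
            fields[ext] = tar.extractfile(member).read()
    if key is not None:
        yield key, fields


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "shard-000000.tar")
        write_shard(path, [("000000", {"txt": b"a cat", "cls": b"0"}),
                           ("000001", {"txt": b"a dog", "cls": b"1"})])
        for key, fields in read_samples(path):
            print(key, sorted(fields))
```

Because a reader only ever moves forward through the archive, a shard can be consumed as a plain byte stream (local file, pipe, or HTTP download) without random access, which is the motivation for this layout.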

GitHub Issues

The webdataset package has 105 open issues on GitHub

  • fix: In ShardWriter, use TarWriter to open tars
  • ShardWriter does not properly close tar files
  • Handle hdfs url in pipe_cleaner
  • The intended copy behaviour of compose is not achieved.
  • resume dataloader
  • Close streams once consumed
  • Sharded Dataset Has Long Delay Before First Batch (and caching=False)
  • Stream data from Hugging Face
  • Should not call out to external processes for checking file types in caching layer
  • Using DDP with WebDataset in pytorch lightning
  • Shard writer with a gcloud url
  • ShardWriter works only with local paths

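Several of the issues above concern `ShardWriter`, which spreads a stream of samples over numbered tar shards. The standard-library sketch below only illustrates that rollover idea and is not the library's implementation. The class name `SimpleShardWriter`, its `pattern` and `maxcount` arguments (loosely modeled on ShardWriter's), and its local-file-only behavior are assumptions. It closes each tar before opening the next one and again on exit, so every shard is finalized.

```python
import io
import os
import tarfile
import tempfile
from typing import Dict, List, Optional


class SimpleShardWriter:
    """Hypothetical writer: starts a new tar shard after `maxcount` samples."""

    def __init__(self, pattern: str, maxcount: int) -> None:
        self.pattern = pattern  # e.g. "shard-%06d.tar"; must contain exactly one %d
        self.maxcount = maxcount
        self.shard_index = 0
        self.count = 0  # samples written to the current shard
        self.tar: Optional[tarfile.TarFile] = None
        self.paths: List[str] = []

    def _next_shard(self) -> None:
        # Finalize the previous shard before starting a new one.
        if self.tar is not None:
            self.tar.close()
        path = self.pattern % self.shard_index
        self.shard_index += 1
        self.count = 0
        self.paths.append(path)
        self.tar = tarfile.open(path, "w")

    def write(self, key: str, fields: Dict[str, bytes]) -> None:
        if self.tar is None or self.count >= self.maxcount:
            self._next_shard()
        for ext, data in fields.items():
            info = tarfile.TarInfo(name=f"{key}.{ext}")
            info.size = len(data)
            self.tar.addfile(info, io.BytesIO(data))
        self.count += 1

    def close(self) -> None:
        if self.tar is not None:
            self.tar.close()
            self.tar = None

    def __enter__(self) -> "SimpleShardWriter":
        return self

    def __exit__(self, *exc) -> None:
        self.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        with SimpleShardWriter(os.path.join(d, "shard-%06d.tar"), maxcount=2) as sink:
            for i in range(5):
                sink.write(f"{i:06d}", {"txt": f"sample {i}".encode()})
        for path in sink.paths:
            with tarfile.open(path) as tar:
                print(os.path.basename(path), tar.getnames())
```

With five samples and `maxcount=2`, this produces three shards holding two, two, and one sample. Using a context manager ensures the last, partially filled shard is closed even if the writing loop exits early.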
Related Packages & Articles

torchsde 0.2.6

SDE solvers and stochastic adjoint sensitivity analysis in PyTorch.

gfpgan 1.3.8

GFPGAN aims at developing Practical Algorithms for Real-world Face Restoration

clean-fid 0.1.35

FID calculation in PyTorch with proper image resizing and quantization steps