webdataset 0.2.86

Record sequential storage for deep learning.

Stars: 1981, Watchers: 1981, Forks: 152, Open Issues: 103

The webdataset/webdataset repo was created 4 years ago and the last code push was 3 weeks ago.
The project is popular, with 1981 GitHub stars.

How to Install webdataset

You can install webdataset using pip:

pip install webdataset

or add it to a project with poetry:

poetry add webdataset
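
After either install command, it can be useful to confirm which version ended up in the environment. The following is a minimal sketch using only the Python standard library; `installed_version` is a hypothetical helper name chosen for illustration, not part of webdataset.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent.

    `installed_version` is an illustrative helper, not a webdataset function.
    """
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints the installed version, or None if webdataset is not installed.
print(installed_version("webdataset"))
```

If this prints `None`, the install most likely went into a different interpreter or virtual environment than the one running the script.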

Package Details

Author: Thomas Breuel
License: MIT
Homepage: http://github.com/webdataset/webdataset
PyPI: https://pypi.org/project/webdataset/
GitHub Repo: https://github.com/webdataset/webdataset

Code Examples

Here are some webdataset code examples and snippets.
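
As a rough illustration of what "record sequential storage" means, the sketch below writes samples into a tar file and reads them back in order using only the Python standard library. It assumes the convention, as webdataset's documentation describes it, that files sharing a basename (for example `sample000000.txt` and `sample000000.json`) form one sample. The helpers `write_shard` and `read_shard` are hypothetical names for this sketch and are not webdataset's API; the library's own readers and writers do considerably more.

```python
import io
import json
import os
import tarfile
import tempfile


def write_shard(path, samples):
    """Write each sample as consecutive tar members named '<key>.<ext>'."""
    with tarfile.open(path, "w") as tar:
        for sample in samples:
            key = sample["__key__"]
            for ext, payload in sample.items():
                if ext == "__key__":
                    continue
                info = tarfile.TarInfo(name=f"{key}.{ext}")
                info.size = len(payload)
                tar.addfile(info, io.BytesIO(payload))


def read_shard(path):
    """Scan the tar once, front to back, grouping adjacent members by key."""
    current = None
    with tarfile.open(path, "r") as tar:
        for member in tar:
            if not member.isfile():
                continue
            # Assumption for this sketch: the key is everything before the first dot.
            key, _, ext = member.name.partition(".")
            data = tar.extractfile(member).read()
            if current is None or current["__key__"] != key:
                if current is not None:
                    yield current
                current = {"__key__": key}
            current[ext] = data
    if current is not None:
        yield current


# Round-trip two small samples through a temporary shard file.
samples = [
    {"__key__": "sample000000", "txt": b"hello", "json": json.dumps({"label": 0}).encode()},
    {"__key__": "sample000001", "txt": b"world", "json": json.dumps({"label": 1}).encode()},
]
with tempfile.TemporaryDirectory() as tmp:
    shard = os.path.join(tmp, "shard-000000.tar")
    write_shard(shard, samples)
    loaded = list(read_shard(shard))

print([s["__key__"] for s in loaded])
```

Because the reader never seeks, the same loop works over any sequential byte stream, which is the property that makes tar-based shards suit streaming training data.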

GitHub Issues

The webdataset package has 103 open issues on GitHub, including:

  • fix: In ShardWriter, use TarWriter to open tars
  • ShardWriter does not properly close tar files
  • Handle hdfs url in pipe_cleaner
  • The intended copy behaviour of compose is not achieved.
  • resume dataloader
  • Close streams once consumed
  • Sharded Dataset Has Long Delay Before First Batch (and caching=False)
  • Stream data from Hugging Face
  • Should not call out to external processes for checking file types in caching layer
  • Using DDP with WebDataset in pytorch lightning
  • Shard writer with a gcloud url
  • ShardWriter works only with local paths

Related Packages & Articles

torchsde 0.2.6

SDE solvers and stochastic adjoint sensitivity analysis in PyTorch.

gfpgan 1.3.8

GFPGAN aims at developing Practical Algorithms for Real-world Face Restoration

clean-fid 0.1.35

FID calculation in PyTorch with proper image resizing and quantization steps