webdataset 0.2.86
Record sequential storage for deep learning.
Stars: 1981, Watchers: 1981, Forks: 152, Open Issues: 103
The webdataset/webdataset repo was created 4 years ago, and the last code push was 3 weeks ago.
The project is popular, with 1981 GitHub stars.
How to Install webdataset
You can install webdataset using pip:
pip install webdataset
or add it to a project with poetry:
poetry add webdataset
Package Details
- Author: Thomas Breuel
- License: MIT
- Homepage: http://github.com/webdataset/webdataset
- PyPI: https://pypi.org/project/webdataset/
- GitHub Repo: https://github.com/webdataset/webdataset
Code Examples
Here are some webdataset code examples and snippets.
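As a starting point, here is a minimal sketch of the storage layout that "record sequential storage" refers to: a tar archive in which files sharing a basename form one sample, and the file extension names the field. It uses only the Python standard library and does not call webdataset itself. The helper names write_shard and read_shard, the sample keys, and the txt/cls field names are hypothetical and chosen for illustration only.

```python
import io
import tarfile
from collections import defaultdict


def write_shard(samples):
    """Pack samples into an in-memory tar; each field is stored as '<key>.<ext>'."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for key, fields in samples:
            for ext, payload in fields.items():
                info = tarfile.TarInfo(name=f"{key}.{ext}")
                info.size = len(payload)
                tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def read_shard(data):
    """Read members in archive order and group them by the name before the first dot."""
    grouped = defaultdict(dict)
    with tarfile.open(fileobj=io.BytesIO(data), mode="r") as tar:
        for member in tar:
            key, _, ext = member.name.partition(".")
            grouped[key][ext] = tar.extractfile(member).read()
    return dict(grouped)


# Hypothetical samples: a text field and a class-label field per key.
samples = [
    ("sample0000", {"txt": b"hello", "cls": b"0"}),
    ("sample0001", {"txt": b"world", "cls": b"1"}),
]
shard = write_shard(samples)
records = read_shard(shard)
print(sorted(records))
```

In practice the library's own writers and readers (for example the TarWriter and ShardWriter classes mentioned in the issues below) would handle this layout; the sketch only shows why a plain tar file can be read sequentially as a stream of samples.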
GitHub Issues
The webdataset package has 103 open issues on GitHub, including:
- fix: In ShardWriter, use TarWriter to open tars
- ShardWriter does not properly close tar files
- Handle hdfs url in pipe_cleaner
- The intended copy behaviour of compose is not achieved.
- resume dataloader
- Close streams once consumed
- Sharded Dataset Has Long Delay Before First Batch (and caching=False)
- Stream data from Hugging Face
- Should not call out to external processes for checking file types in caching layer
- Using DDP with WebDataset in pytorch lightning
- Shard writer with a gcloud url
- ShardWriter works only with local paths