webdataset 0.2.100
Record sequential storage for deep learning.
Stars: 2254, Watchers: 2254, Forks: 179, Open Issues: 105
The webdataset/webdataset repo was created 5 years ago and the last code push was 4 days ago.
The project is popular, with 2254 GitHub stars.
How to Install webdataset
You can install webdataset using pip
pip install webdataset
or add it to a project with poetry
poetry add webdataset
Package Details
- Author
- Thomas Breuel
- License
- MIT
- Homepage
- http://github.com/webdataset/webdataset
- PyPI
- https://pypi.org/project/webdataset/
- GitHub Repo
- https://github.com/webdataset/webdataset
Code Examples
Here are some webdataset code examples and snippets.
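The sketch below illustrates the idea of record-sequential storage using only Python's standard library; it does not call webdataset itself. It assumes the layout commonly described for this kind of dataset: a plain tar archive in which consecutive files sharing a basename form one sample, with the file extension naming the field. The helper names write_shard and read_shard, the file name shard-000000.tar, and the sample contents are hypothetical and chosen for illustration. Keys are assumed to contain no dots or directory separators.

```python
import io
import os
import tarfile
import tempfile


def write_shard(path, samples):
    """Write (key, fields) pairs to a tar file, one member per field.

    Hypothetical helper: each field becomes a member named "<key>.<ext>".
    """
    with tarfile.open(path, "w") as tar:
        for key, fields in samples:
            for ext, payload in fields.items():
                info = tarfile.TarInfo(name=f"{key}.{ext}")
                info.size = len(payload)
                tar.addfile(info, io.BytesIO(payload))


def read_shard(path):
    """Yield one dict per sample by grouping consecutive members by key.

    The archive is read front to back, so no index or random access is needed.
    """
    with tarfile.open(path, "r") as tar:
        current = None
        for member in tar:
            if not member.isfile():
                continue
            # Assumption: the key is everything before the first dot.
            key, ext = member.name.split(".", 1)
            if current is None or current["__key__"] != key:
                if current is not None:
                    yield current
                current = {"__key__": key}
            current[ext] = tar.extractfile(member).read()
        if current is not None:
            yield current


if __name__ == "__main__":
    shard = os.path.join(tempfile.mkdtemp(), "shard-000000.tar")
    write_shard(
        shard,
        [
            ("sample0", {"txt": b"hello", "cls": b"0"}),
            ("sample1", {"txt": b"world", "cls": b"1"}),
        ],
    )
    for sample in read_shard(shard):
        print(sample["__key__"], sorted(k for k in sample if k != "__key__"))
```

Because each sample is stored contiguously, a reader can stream the archive from local disk or a pipe without seeking, which is the property that makes this layout suitable for large training sets. The installed package provides its own reader and writer classes for this purpose; consult its documentation for their actual signatures.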
GitHub Issues
The webdataset package has 105 open issues on GitHub
- fix: In ShardWriter, use TarWriter to open tars
- ShardWriter does not properly close tar files
- Handle hdfs url in pipe_cleaner
- The intended copy behaviour of compose is not achieved.
- resume dataloader
- Close streams once consumed
- Sharded Dataset Has Long Delay Before First Batch (and caching=False)
- Stream data from Hugging Face
- Should not call out to external processes for checking file types in caching layer
- Using DDP with WebDataset in pytorch lightning
- Shard writer with a gcloud url
- ShardWriter works only with local paths