datasets 3.0.1
HuggingFace community-driven open-source library of datasets
Stars: 19138, Watchers: 19138, Forks: 2661, Open Issues: 763
The huggingface/datasets repo was created 4 years ago and the last code push was yesterday.
The project is very popular, with 19138 GitHub stars.
How to Install datasets
You can install datasets using pip
pip install datasets
or add it to a project with poetry
poetry add datasets
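After installing, it can be useful to confirm that the package is visible to the interpreter you are actually running. The following is a minimal sketch using only the Python standard library; the helper name installed_version is hypothetical and not part of datasets.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed.

    `installed_version` is a hypothetical helper for illustration only.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("datasets")
    if found is None:
        print("datasets is not installed in this environment")
    else:
        print(f"datasets {found} is installed")
```

If the script reports that the package is missing right after a successful install, pip or poetry most likely installed it into a different environment than the one running the script.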
Package Details
- Author: HuggingFace Inc.
- License: Apache 2.0
- Homepage: https://github.com/huggingface/datasets
- PyPI: https://pypi.org/project/datasets/
- GitHub Repo: https://github.com/huggingface/datasets
Classifiers
- Scientific/Engineering/Artificial Intelligence
Errors
A list of common datasets errors.
Code Examples
Here are some datasets code examples and snippets.
GitHub Issues
The datasets package has 763 open issues on GitHub:
- AutoTokenizer hash value got change after datasets.map
- [TypeError: Couldn't cast array of type] Cannot load dataset in v1.18
- Make ted_talks_iwslt dataset streamable
- Dataset.shuffle(seed=None) gives fixed row permutation
- Adding CC-100: Monolingual Datasets from Web Crawl Data (Datasets links are invalid)
- Labels conflict when loading a local CSV file.
- DuplicatedKeysError of NewsQA dataset
- Dataset Card Creator drops information for "Additional Information" Section
- Fix host URL in The Pile datasets
- The Pile cannot connect to host
- Add a metadata field for when source data was produced
- Extend support for streaming datasets that use os.path.relpath
- Extend support for streaming datasets that use os.path.relpath
- Consider adding ipywidgets as a dependency.
- Add Fon language tag