
datasets 4.5.0
HuggingFace community-driven open-source library of datasets
Stars: 21194, Watchers: 21194, Forks: 3100, Open Issues: 1061
The huggingface/datasets repo was created 5 years ago and the last code push was yesterday.
The project is very popular, with 21194 GitHub stars.
How to Install datasets
You can install datasets using pip
pip install datasets
or add it to a project with poetry
poetry add datasets
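After installing, it can be useful to confirm which version ended up in the active environment. The snippet below is a minimal sketch that uses only the Python standard library (`importlib.metadata`); the helper name `installed_version` is made up for this illustration and is not part of datasets.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string of `package`, or None if it is not installed.

    `installed_version` is a hypothetical helper for this page, not a datasets API.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the version string if datasets is installed, otherwise None.
print(installed_version("datasets"))
```

If this prints `None`, the install most likely went into a different environment than the one running the script.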
Package Details
- Author: HuggingFace Inc.
- License: Apache 2.0
- Homepage: https://github.com/huggingface/datasets
- PyPI: https://pypi.org/project/datasets/
- GitHub Repo: https://github.com/huggingface/datasets
Classifiers
- Scientific/Engineering/Artificial Intelligence
Errors
A list of common datasets errors.
Code Examples
Here are some datasets code examples and snippets.
GitHub Issues
The datasets package has 1061 open issues on GitHub. Recent ones include:
- [CUDA Tensors Not working in ~v4.5.0] set_format(type="torch", device="cuda") returns cpu
- Is the 10k files / folder limit a hard limit for a dataset repo?
- all_exhausted_without_replacement working same as first_exhausted
- #5354: replace list with Sequence in from_parquet type hints
- feat: Add GenBank file format support for biological sequence data
- docs: clarify documentation build instructions
- json: add optional return_file_name parameter
- MMLU get_dataset_config_names provides different lists of subsets in online and offline modes
- Question: Is there a faster way to push_to_hub for large image datasets?
- Remove Python 3.7 and Python 2 code paths from _dill.py
- Improve readability and documentation of indexing integration tests
- datasets.load_from_disk progress bar optional manual control
- Fix duplicate log messages by disabling log propagation by default
- Bug fix: Add HDFS hostname to protocol prefix
- xPath cannot handle hdfs:///xxxx properly