datasets 4.5.0

HuggingFace community-driven open-source library of datasets

Stars: 21194, Watchers: 21194, Forks: 3100, Open Issues: 1061

The huggingface/datasets repo was created 5 years ago, and the last code push was yesterday.
The project is very popular, with 21194 GitHub stars.

How to Install datasets

You can install datasets using pip:

pip install datasets

or add it to a project with Poetry:

poetry add datasets
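
After running either command, you may want to confirm that the package is visible to the interpreter you are using. The following is a minimal sketch that relies only on the Python standard library; the helper name installed_version is hypothetical and is not part of datasets.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed.

    Hypothetical helper for illustration; it is not part of the datasets library.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("datasets")
    if found is None:
        print("datasets is not installed; try: pip install datasets")
    else:
        print(f"datasets {found} is installed")
```

The check reads package metadata rather than importing datasets, so it stays fast and does not load any of the library's dependencies.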

Package Details

Author
HuggingFace Inc.
License
Apache 2.0
Homepage
https://github.com/huggingface/datasets
PyPI
https://pypi.org/project/datasets/
GitHub Repo
https://github.com/huggingface/datasets

Classifiers

  • Scientific/Engineering/Artificial Intelligence

GitHub Issues

The datasets package has 1061 open issues on GitHub, including:

  • [CUDA Tensors Not working in ~v4.5.0] set_format(type="torch", device="cuda") returns cpu
  • Is the 10k files / folder limit a hard limit for a dataset repo?
  • all_exhausted_without_replacement working same as first_exhausted
  • #5354: replace list with Sequence in from_parquet type hints
  • feat: Add GenBank file format support for biological sequence data
  • docs: clarify documentation build instructions
  • json: add optional return_file_name parameter
  • MMLU get_dataset_config_names provides different lists of subsets in online and offline modes
  • Question: Is there a faster way to push_to_hub for large image datasets?
  • Remove Python 3.7 and Python 2 code paths from _dill.py
  • Improve readability and documentation of indexing integration tests
  • datasets.load_from_disk progress bar optional manual control
  • Fix duplicate log messages by disabling log propagation by default
  • Bug fix: Add HDFS hostname to protocol prefix
  • xPath cannot handle hdfs:///xxxx properly

See more issues on GitHub

Related Packages & Articles

nlp 0.4.0

HuggingFace/NLP is an open library of NLP datasets.

thinc 9.1.1

A refreshing functional take on deep learning, compatible with your favorite libraries

transformers 5.2.0

Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

spacy 3.8.11

Industrial-strength Natural Language Processing (NLP) in Python

keras 3.13.2

Keras is a deep learning API written in Python, running on top of the machine learning platform TensorFlow. The core data structures of Keras are layers and models. The philosophy is to keep simple things simple, while allowing the user to be fully in control when they need to (the ultimate control being the easy extensibility of the source code via subclassing).