datasets 2.18.0

HuggingFace community-driven open-source library of datasets

Stars: 18327, Watchers: 18327, Forks: 2497, Open Issues: 673

The huggingface/datasets repository was created 4 years ago, and the most recent code push was 2 hours ago.
The project is popular, with 18,327 GitHub stars.

How to Install datasets

You can install datasets using pip:

pip install datasets

or add it to a project with Poetry:

poetry add datasets
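
After installing, it can help to confirm that the package is visible to the interpreter you are using. The following is a minimal sketch using only the Python standard library; the helper name installed_version is hypothetical and not part of datasets.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent.

    Hypothetical helper for illustration; it is not part of the datasets library.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("datasets")
    if found is None:
        print("datasets is not installed; try `pip install datasets`")
    else:
        print(f"datasets {found} is installed")
```

If this reports that the package is missing right after an install, the install most likely went into a different environment than the one running the script.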

Package Details

Author: HuggingFace Inc.
License: Apache 2.0
Homepage: https://github.com/huggingface/datasets
PyPI: https://pypi.org/project/datasets/
GitHub Repo: https://github.com/huggingface/datasets

Classifiers

  • Scientific/Engineering/Artificial Intelligence

Code Examples

Here are some datasets code examples and snippets.

GitHub Issues

The datasets package has 673 open issues on GitHub, including:

  • AutoTokenizer hash value got change after datasets.map
  • [TypeError: Couldn't cast array of type] Cannot load dataset in v1.18
  • Make ted_talks_iwslt dataset streamable
  • Dataset.shuffle(seed=None) gives fixed row permutation
  • Adding CC-100: Monolingual Datasets from Web Crawl Data (Datasets links are invalid)
  • Labels conflict when loading a local CSV file.
  • DuplicatedKeysError of NewsQA dataset
  • Dataset Card Creator drops information for "Additional Information" Section
  • Fix host URL in The Pile datasets
  • The Pile cannot connect to host
  • Add a metadata field for when source data was produced
  • Extend support for streaming datasets that use os.path.relpath
  • Consider adding ipywidgets as a dependency.
  • Add Fon language tag

Related Packages & Articles

nlp 0.4.0

HuggingFace/NLP is an open library of NLP datasets.

thinc 8.2.3

A refreshing functional take on deep learning, compatible with your favorite libraries

spacy 3.7.4

Industrial-strength Natural Language Processing (NLP) in Python

keras 3.2.0

Keras is a deep learning API written in Python, running on top of the machine learning platform TensorFlow. The core data structures of Keras are layers and models. The philosophy is to keep simple things simple, while allowing the user to be fully in control when they need to (the ultimate control being the easy extensibility of the source code via subclassing).