tokenizers 0.15.2

Stars: 8276, Watchers: 8276, Forks: 694, Open Issues: 128

The huggingface/tokenizers repository was created 4 years ago, and the most recent code push was 2 days ago.
The project is popular, with 8276 GitHub stars.

How to Install tokenizers

You can install tokenizers using pip:

pip install tokenizers

Alternatively, add it to a project with Poetry:

poetry add tokenizers
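
After either command, it can help to confirm which version the current interpreter actually sees. The following is a minimal sketch using only the Python standard library; the helper name `installed_version` is hypothetical and is not part of tokenizers.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints a version string if tokenizers is installed in this environment,
# otherwise prints None.
print(installed_version("tokenizers"))
```

A `None` result usually means the package was installed into a different environment than the one running the script.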

Package Details

Author
Anthony MOI
PyPI:
https://pypi.org/project/tokenizers/
GitHub Repo:
https://github.com/huggingface/tokenizers

Classifiers

  • Scientific/Engineering/Artificial Intelligence

Code Examples

Here are some tokenizers code examples and snippets.
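
As one illustration, the sketch below trains a small word-level tokenizer in memory and encodes a sentence with it. The toy corpus and the `[UNK]` token name are assumptions made for this example, and the snippet is not taken from the project's own documentation; it assumes the package is installed as described above.

```python
from tokenizers import Tokenizer
from tokenizers.models import WordLevel
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import WordLevelTrainer

# Toy corpus, invented for this illustration.
corpus = ["hello world", "hello tokenizers"]

# A word-level model maps each whole word to an id; unknown words map to [UNK].
tokenizer = Tokenizer(WordLevel(unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()  # split the input on whitespace and punctuation

# Register [UNK] as a special token so it is part of the trained vocabulary.
trainer = WordLevelTrainer(special_tokens=["[UNK]"], show_progress=False)
tokenizer.train_from_iterator(corpus, trainer=trainer)

encoding = tokenizer.encode("hello unseen world")
print(encoding.tokens)  # ['hello', '[UNK]', 'world']
print(encoding.ids)     # integer ids; exact values depend on the trained vocabulary
```

The word "unseen" does not occur in the corpus, so it is replaced by the unknown token. Subword models such as BPE or WordPiece would instead split it into smaller known pieces.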

GitHub Issues

The tokenizers package has 128 open issues on GitHub, including:

  • No conda package for tokenizers-0.11.4
  • Implement impl_serde_type macro
  • BartTokenizer for Russian language?
  • PanicException For Result::unwarp()
  • vocab_size issue with Whitespace pre_tokenizer
  • TypeError: failed downcast to function
  • [TBD] add a feature to continue training a tokenizer
  • Tokenizers | TypeError: not a string
  • pyo3_runtime.PanicException: Missing additional token
  • Count number of tokens tokenizer might produce without really tokenizing?
  • compile error when installing versions 0.9.2 or 0.8.1.rc2
  • Add a Sequence to the processors
  • Add a Sequence object to the decoders
  • Adding Trie for WordPiece for faster encoding (Bert).
  • Regex Capture Group?

Related Packages & Articles

thinc 8.2.3

A refreshing functional take on deep learning, compatible with your favorite libraries

textblob 0.18.0.post0

Simple, Pythonic text processing. Sentiment analysis, part-of-speech tagging, noun phrase parsing, and more.

spacy 3.7.4

Industrial-strength Natural Language Processing (NLP) in Python

nlp 0.4.0

HuggingFace/NLP is an open library of NLP datasets.

gensim 4.3.2

Python framework for fast Vector Space Modelling