tokenizers 0.15.2
Stars: 8276, Watchers: 8276, Forks: 694, Open Issues: 128
The huggingface/tokenizers repo was created 4 years ago and the last code push was 2 days ago.
The project is extremely popular with a mind-blowing 8,276 GitHub stars!
How to Install tokenizers
You can install tokenizers using pip:
pip install tokenizers
or add it to a project with Poetry:
poetry add tokenizers
Package Details
- Author
- Anthony MOI <[email protected]>
- License
- Homepage
- PyPI:
- https://pypi.org/project/tokenizers/
- GitHub Repo:
- https://github.com/huggingface/tokenizers
Classifiers
- Scientific/Engineering :: Artificial Intelligence
Related Packages
Errors
A list of common tokenizers errors.
Code Examples
Here are some tokenizers code examples and snippets.
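As a starting point, here is a minimal sketch of the library's quicktour-style usage: training a small BPE tokenizer on an in-memory corpus and encoding a sentence. The corpus and special tokens below are illustrative placeholders; the example assumes tokenizers is installed via pip.

```python
# Minimal sketch: train a BPE tokenizer on a tiny in-memory corpus.
# Assumes `pip install tokenizers`; the corpus below is made up for illustration.
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import BpeTrainer

# Build a BPE model with an unknown-token fallback.
tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
# Split on whitespace before the BPE merges are applied.
tokenizer.pre_tokenizer = Whitespace()

trainer = BpeTrainer(special_tokens=["[UNK]", "[CLS]", "[SEP]"])
corpus = ["Hello world", "Tokenizers are fast", "Hello tokenizers"]

# Train directly from any Python iterator of strings.
tokenizer.train_from_iterator(corpus, trainer=trainer)

# Encode a sentence; `tokens` holds the resulting subword strings.
encoding = tokenizer.encode("Hello world")
print(encoding.tokens)
```

In real use you would typically train from files with tokenizer.train(...) or load a published vocabulary instead of a toy corpus, then persist the result with tokenizer.save(...).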
GitHub Issues
The tokenizers package has 128 open issues on GitHub.
- No conda package for tokenizers-0.11.4
- Implement impl_serde_type macro
- BartTokenizer for Russian language?
- PanicException For Result::unwrap()
- vocab_size issue with Whitespace pre_tokenizer
- TypeError: failed downcast to function
- [TBD] add a feature to continue training a tokenizer
- Tokenizers | TypeError: not a string
- pyo3_runtime.PanicException: Missing additional token
- Count number of tokens a tokenizer might produce without really tokenizing?
- compile error when installing versions 0.9.2 or 0.8.1.rc2
- Add a Sequence to the processors
- Add a Sequence object to the decoders
- Adding Trie for WordPiece for faster encoding (Bert).
- Regex Capture Group?