spacy 3.7.4


Industrial-strength Natural Language Processing (NLP) in Python

Stars: 28628, Watchers: 28628, Forks: 4264, Open Issues: 104

The explosion/spaCy repository was created 9 years ago, and the last code push was 6 hours ago.
The project is very popular, with 28,628 GitHub stars.

How to Install spacy

You can install spacy using pip:

pip install spacy

Or add it to a project with Poetry:

poetry add spacy
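
To confirm that either install command worked, one option is to ask Python for the installed version. The sketch below uses only the standard library; the helper name installed_version is hypothetical and chosen for illustration, not part of spacy.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        # The distribution is not installed in this environment.
        return None


if __name__ == "__main__":
    version = installed_version("spacy")
    if version is None:
        print("spacy is not installed in this environment.")
    else:
        print(f"spacy {version} is installed.")
```

Running this inside the same virtual environment used for the install avoids checking the wrong interpreter.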

Package Details

GitHub Repo: explosion/spaCy

  • Scientific/Engineering


Code Examples

Here are some spacy code examples and snippets.
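
As a first snippet, here is a minimal sketch of tokenizing text with a blank English pipeline. It relies on spacy.blank, which builds a pipeline containing only the tokenizer, so no trained model needs to be downloaded. The function name tokenize is hypothetical and used only for illustration.

```python
from typing import List


def tokenize(text: str) -> List[str]:
    """Split text into token strings using a blank English pipeline."""
    import spacy  # imported here so this file still loads if spacy is missing

    nlp = spacy.blank("en")  # tokenizer only; no trained components
    doc = nlp(text)
    return [token.text for token in doc]


if __name__ == "__main__":
    try:
        print(tokenize("spaCy splits raw text into tokens."))
    except ImportError:
        print("spacy is not installed; run `pip install spacy` first.")
```

Part-of-speech tags, dependency parses, and named entities are not produced by a blank pipeline; those require loading a trained pipeline instead.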

GitHub Issues

The spacy package has 104 open issues on GitHub, including:

  • Update typing hints
  • Non-deterministic evaluation when using the experimental edit_tree_lemmatizer
  • Fix special tokenization cases not applied when adjacent to an infix #10086
  • Contractions incorrectly tokenized when part of an infix substring
  • Please add the Per-class Metrics for en_core_web_trf on NER-Task
  • Add spans to doc.to_json
  • Add spancat pipeline in spacy debug data
  • TypeError: can not serialize 'cupy._core.core.ndarray' object
  • Some sentences don't have ROOT among token dependency relation markers while there is still a root in the sentence
  • Different NER results on Linux/Mac and Windows with it-core-news-lg-3.2.0
  • Iceland lang code fix
  • Add visualisations for parsed documents
  • Textcat loss is scaled by batch size (and number of classes)
  • setting an extensions attribute in one span changes it in the other
  • DependencyMatcher fails on sents when tokens have extension attributes set to ents

Related Packages & Articles

gensim 4.3.2

Python framework for fast Vector Space Modelling

nlp 0.4.0

HuggingFace/NLP is an open library of NLP datasets.

keras 3.2.0

Keras is a deep learning API written in Python, running on top of the machine learning platform TensorFlow. The core data structures of Keras are layers and models. The philosophy is to keep simple things simple, while allowing the user to be fully in control when they need to (the ultimate control being the easy extensibility of the source code via subclassing).

pandas 2.2.1

Powerful data structures for data analysis, time series, and statistics