spark-nlp 5.0.1


John Snow Labs Spark NLP is a natural language processing library built on top of Apache Spark ML. It provides simple, performant, and accurate NLP annotations for machine learning pipelines that scale easily in a distributed environment.

Stars: 3341, Watchers: 3341, Forks: 664, Open Issues: 43

The JohnSnowLabs/spark-nlp repo was created 5 years ago and the last code push was yesterday.
The project is popular, with 3341 GitHub stars.

How to Install spark-nlp

You can install spark-nlp using pip:

pip install spark-nlp

or add it to a project with Poetry:

poetry add spark-nlp
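
After installing, a quick check can confirm that the package is visible to the current Python interpreter. The sketch below uses only the standard library; the helper name `installed_version` is hypothetical and not part of spark-nlp.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str = "spark-nlp") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    `installed_version` is an illustrative helper, not a spark-nlp API.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("spark-nlp is not installed in this environment")
    else:
        print(f"spark-nlp {version} is installed")
```

This only checks the Python package itself. Spark NLP also expects PySpark and a Java runtime to be available, which this check does not cover.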

Package Details

Author: John Snow Labs
GitHub repo: JohnSnowLabs/spark-nlp

Classifiers:
  • Scientific/Engineering
  • Scientific/Engineering/Artificial Intelligence
  • Scientific/Engineering/Information Analysis
  • Software Development/Build Tools
  • Software Development/Internationalization
  • Software Development/Libraries/Python Modules
  • Software Development/Localization
  • Text Processing/Linguistic

Code Examples

Here are some spark-nlp code examples and snippets.
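
As a minimal sketch, the snippet below tokenizes a few strings with a two-stage pipeline. It assumes PySpark and a Java runtime are installed alongside spark-nlp. The function names `build_pipeline` and `tokenize` are hypothetical helpers chosen for illustration; the `DocumentAssembler`, `Tokenizer`, and `sparknlp.start()` calls come from the library itself.

```python
def build_pipeline():
    """Assemble a two-stage pipeline: raw text -> document -> tokens.

    Call this only after a Spark session has been started (for example
    via sparknlp.start()), since the annotators are backed by JVM objects.
    """
    # Imported lazily so this module loads even where Spark is unavailable.
    from pyspark.ml import Pipeline
    from sparknlp.annotator import Tokenizer
    from sparknlp.base import DocumentAssembler

    # Wrap the raw "text" column into Spark NLP's document annotation.
    document_assembler = (
        DocumentAssembler().setInputCol("text").setOutputCol("document")
    )
    # Split each document into token annotations.
    tokenizer = Tokenizer().setInputCols(["document"]).setOutputCol("token")
    return Pipeline(stages=[document_assembler, tokenizer])


def tokenize(texts):
    """Return a DataFrame with one array of token strings per input text."""
    import sparknlp

    # Start (or reuse) a Spark session with the Spark NLP jars loaded.
    spark = sparknlp.start()
    df = spark.createDataFrame([(t,) for t in texts], ["text"])
    model = build_pipeline().fit(df)
    # "token.result" extracts the token strings from the annotation structs.
    return model.transform(df).select("token.result")
```

Calling `tokenize(["Spark NLP scales NLP pipelines."]).show(truncate=False)` would display the tokens for that sentence. The session is started before the annotators are created because constructing them without an active Spark session is a common source of errors.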

GitHub Issues

The spark-nlp package has 43 open issues on GitHub, including:

  • 2023-07-28-twitter_xlm_roberta_base_sentiment_en
  • Py4JJavaError: An error occurred while calling o2951.fullAnnotateJava.
  • Introducing a new Zero-Short Classifier for XLM-RoBERTa transformer
  • SPARKNLP-738 Enforcing accuracy to 0 and 1 in classifiers
  • SPARKNLP-823 Adding streaming functionality for seq2seq components
  • The summarization model(s) are not giving any result
  • aws list read write commands not working when we start spark_nlp session
  • 'JavaPackage' object is not callable on DocumentAssembler()
  • Correct misspelled entities
  • setMultilabel() parameter in Zero-Shot Classification annotators doesn't run
  • SPARKNLP-732 Unify all externally supported file systems and cloud access
  • Add ONNX support to all transformers in Spark NLP
  • Can i run Spark NLP translator offline on cpu and Windows 11?
  • Can't Import Fine-Tuned BERT Sentence Embeddings Model
  • SparkNLP does not work in Azure Synapse

Related Packages & Articles

nncf 2.5.0

Neural Networks Compression Framework

farm-haystack 1.19.0

Neural Question Answering & Semantic Search at Scale. Use modern transformer based models like BERT to find answers in large document collections

spacy 3.6.0

Industrial-strength Natural Language Processing (NLP) in Python