spark-nlp 5.4.1


John Snow Labs Spark NLP is a natural language processing library built on top of Apache Spark ML. It provides simple, performant, and accurate NLP annotations for machine learning pipelines that scale easily in a distributed environment.

Stars: 3764, Watchers: 3764, Forks: 704, Open Issues: 46

The JohnSnowLabs/spark-nlp repo was created 6 years ago, and the last code push was 4 hours ago.
The project is popular, with 3764 GitHub stars.

How to Install spark-nlp

You can install spark-nlp using pip:

pip install spark-nlp

or add it to a project with poetry:

poetry add spark-nlp
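
To confirm that either command worked, a minimal sketch like the one below may help. It relies only on the Python standard library (importlib.metadata, available in Python 3.8 and later). The helper name installed_version is hypothetical and is not part of spark-nlp.

```python
from importlib import metadata


def installed_version(package="spark-nlp"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
if version is None:
    print("spark-nlp is not installed in this environment")
else:
    print(f"spark-nlp {version} is installed")
```

The check reads package metadata only, so it does not start Spark or require Java.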

Package Details

Author: John Snow Labs
GitHub Repo: JohnSnowLabs/spark-nlp

Classifiers:
  • Scientific/Engineering
  • Scientific/Engineering/Artificial Intelligence
  • Scientific/Engineering/Information Analysis
  • Software Development/Build Tools
  • Software Development/Internationalization
  • Software Development/Libraries/Python Modules
  • Software Development/Localization
  • Text Processing/Linguistic


Code Examples

Here are some spark-nlp code examples and snippets.
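
As one illustration, here is a minimal sketch of a two-stage pipeline that splits a sentence into tokens. It assumes that spark-nlp and pyspark are installed, that a Java runtime supported by Spark is available, and that the machine can fetch the Spark NLP package the first time sparknlp.start() runs. The function name run_tokenizer_demo and the sample sentence are made up for this example. DocumentAssembler, Tokenizer, and sparknlp.start() come from the library itself.

```python
def run_tokenizer_demo(text="Spark NLP scales natural language processing."):
    """Tokenize one sentence with a DocumentAssembler + Tokenizer pipeline."""
    # Imported inside the function so this file still loads on machines
    # where Spark NLP or PySpark is not installed.
    import sparknlp
    from sparknlp.base import DocumentAssembler
    from sparknlp.annotator import Tokenizer
    from pyspark.ml import Pipeline

    # Start (or reuse) a SparkSession configured with the Spark NLP package.
    spark = sparknlp.start()

    # One-row DataFrame with a single "text" column.
    data = spark.createDataFrame([(text,)], ["text"])

    # Stage 1: wrap raw text in Spark NLP's document annotation.
    document = DocumentAssembler().setInputCol("text").setOutputCol("document")
    # Stage 2: split each document into token annotations.
    tokens = Tokenizer().setInputCols(["document"]).setOutputCol("token")

    model = Pipeline(stages=[document, tokens]).fit(data)
    rows = model.transform(data).select("token.result").collect()

    # The first (and only) row holds the list of token strings.
    return rows[0][0]
```

Calling run_tokenizer_demo() returns the tokens of the sample sentence as a Python list. Neither stage needs a pretrained model, so no model download is involved beyond the library package itself.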

GitHub Issues

The spark-nlp package has 46 open issues on GitHub:

  • 2023-07-28-twitter_xlm_roberta_base_sentiment_en
  • Py4JJavaError: An error occurred while calling o2951.fullAnnotateJava.
  • Introducing a new Zero-Short Classifier for XLM-RoBERTa transformer
  • SPARKNLP-738 Enforcing accuracy to 0 and 1 in classifiers
  • SPARKNLP-823 Adding streaming functionality for seq2seq components
  • The summarization model(s) are not giving any result
  • aws list read write commands not working when we start spark_nlp session
  • 'JavaPackage' object is not callable on DocumentAssembler()
  • Correct misspelled entities
  • setMultilabel() parameter in Zero-Shot Classification annotators doesn't run
  • SPARKNLP-732 Unify all externally supported file systems and cloud access
  • Add ONNX support to all transformers in Spark NLP
  • Can i run Spark NLP translator offline on cpu and Windows 11?
  • Can't Import Fine-Tuned BERT Sentence Embeddings Model
  • SparkNLP does not work in Azure Synapse

See more issues on GitHub

Related Packages & Articles

farm-haystack 1.26.2

LLM framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data.

spacy 3.7.5

Industrial-strength Natural Language Processing (NLP) in Python