koalas 1.8.2

Koalas: pandas API on Apache Spark

Stars: 3280, Watchers: 3280, Forks: 350, Open Issues: 112

The databricks/koalas repo was created 4 years ago and the last code push was 1 month ago.
The project is popular, with 3280 GitHub stars.

How to Install koalas

You can install koalas using pip:

pip install koalas

or add it to a project with Poetry:

poetry add koalas
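
After installing, one way to confirm which version ended up in the environment is to query the package metadata. The following is a minimal sketch using only the Python standard library; the helper name `installed_version` is illustrative and is not part of koalas.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the installed koalas version string, or None when koalas
# is not installed in the current environment.
print(installed_version("koalas"))
```

Note that the distribution is named `koalas`, while the importable module is `databricks.koalas`.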

Package Details

Author
Databricks
License
http://www.apache.org/licenses/LICENSE-2.0
Homepage
https://github.com/databricks/koalas
PyPI
https://pypi.org/project/koalas/
Documentation
https://koalas.readthedocs.io/
GitHub Repo
https://github.com/databricks/koalas

Code Examples

Here are some koalas code examples and snippets.
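
As a starting point, the sketch below builds a small Koalas DataFrame and aggregates it with pandas-style calls. It assumes koalas and a compatible PySpark installation are available; the function name `koalas_groupby_demo` and the sample data are made up for illustration. The import is placed inside the function so that the snippet can be loaded even where Spark is not set up.

```python
def koalas_groupby_demo():
    """Sum scores per team with Koalas and return the result as a pandas Series."""
    # Imported lazily: the koalas distribution installs the module databricks.koalas,
    # and importing it requires a working PySpark installation.
    import databricks.koalas as ks

    # Hypothetical sample data; a Koalas DataFrame is backed by a Spark DataFrame.
    kdf = ks.DataFrame({"team": ["a", "b", "a", "b"], "score": [1, 2, 3, 4]})

    # Same groupby/sum syntax as pandas, executed by Spark.
    totals = kdf.groupby("team")["score"].sum()

    # to_pandas() collects the distributed result to the driver; sort for stable order.
    return totals.to_pandas().sort_index()
```

Calling `koalas_groupby_demo()` starts a local Spark session if none exists, so the first call can take several seconds. An existing pandas DataFrame can be converted in a similar way with `ks.from_pandas(pdf)`.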

GitHub Issues

The koalas package has 112 open issues on GitHub, including:

  • read_excel's parameter - mangle_dupe_cols is used to handle duplicate columns but fails if the duplicate columns are case sensitive.
  • Write custom metadata to output files with dataframe.to_parquet?
  • Series.to_json(orient='records') does not return records-based JSON
  • ValueError: Cannot describe a DataFrame without columns

Related Packages & Articles

pandas 2.0.3

Powerful data structures for data analysis, time series, and statistics

spacy 3.6.0

Industrial-strength Natural Language Processing (NLP) in Python

nlp 0.4.0

HuggingFace/NLP is an open library of NLP datasets.