pyoptimus 23.5.0b0


Optimus is the missing framework for cleaning and pre-processing data in a distributed fashion.

PyOptimus is a Python library that brings together the power of various data processing engines like Pandas, Dask, cuDF, Dask-cuDF, Vaex, and PySpark under a single, easy-to-use API. It offers over 100 functions for data cleaning and processing, including handling strings, processing dates, URLs, and emails. PyOptimus also provides out-of-the-box functions for data exploration and quality fixing. One of the key features of PyOptimus is its ability to handle large datasets efficiently, allowing you to use the same code to process data on your laptop or on a remote cluster of GPUs.
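To make the kinds of cleaning steps mentioned above concrete (string normalization, date parsing, URL and email handling), here is a minimal sketch in plain standard-library Python. It deliberately does not use the PyOptimus API; the function name `clean_record`, the field names, and the sample row are all hypothetical and chosen only for illustration.

```python
from datetime import datetime
from urllib.parse import urlparse


def clean_record(record):
    """Return a cleaned copy of one record (a plain dict).

    Illustrative only: this is NOT PyOptimus code, just the same kind of
    per-column cleanup written with the standard library.
    """
    # Strings: collapse repeated whitespace and normalize capitalization.
    name = " ".join(record["name"].split()).title()

    # Emails: trim, lower-case, and keep the part after the "@".
    email = record["email"].strip().lower()
    email_domain = email.rpartition("@")[2]

    # URLs: extract the host name in lower case.
    url_host = urlparse(record["url"].strip()).netloc.lower()

    # Dates: parse day/month/year (assumed input format) and emit ISO 8601.
    joined = datetime.strptime(record["joined"].strip(), "%d/%m/%Y").date().isoformat()

    return {
        "name": name,
        "email": email,
        "email_domain": email_domain,
        "url_host": url_host,
        "joined": joined,
    }


# Hypothetical sample data.
rows = [
    {
        "name": "  ada   LOVELACE ",
        "email": " Ada@Example.COM ",
        "url": "https://Example.com/profile?id=1",
        "joined": "10/04/1980",
    }
]

cleaned = [clean_record(row) for row in rows]
print(cleaned[0])
```

A library such as PyOptimus packages steps like these as column-level functions so that the same call can run on different engines; consult the project's own documentation for the actual function names and signatures.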

Stars: 1394, Watchers: 1394, Forks: 235, Open Issues: 27

The hi-primus/optimus repo was created 6 years ago and the last code push was 1 month ago.
The project is popular, with 1394 GitHub stars.

How to Install pyoptimus

You can install pyoptimus using pip:

pip install pyoptimus

Or add it to a project with Poetry:

poetry add pyoptimus

Package Details

Author: Argenis Leon
GitHub Repo: hi-primus/optimus

Classifiers:
  • Scientific/Engineering/Artificial Intelligence



GitHub Issues

The pyoptimus package has 27 open issues on GitHub, including:

  • Scheduled biweekly dependency update for week 29


Related Packages & Articles

optimuspyspark 2.2.32

Optimus is the missing framework for cleaning and pre-processing data in a distributed fashion with pyspark.

sweetviz 2.1.4

A pandas-based library to visualize and compare datasets.

mage-ai 0.9.10

Mage is a tool for building and deploying data pipelines.

gradio 3.39.0

A Python library for easily interacting with trained machine learning models.

dtale 3.3.0

A web client for visualizing pandas objects.