Optimus is the missing framework for cleaning and pre-processing data in a distributed fashion.
PyOptimus is a Python library that brings together the power of various data processing engines like Pandas, Dask, cuDF, Dask-cuDF, Vaex, and PySpark under a single, easy-to-use API. It offers over 100 functions for data cleaning and processing, including handling strings, processing dates, URLs, and emails. PyOptimus also provides out-of-the-box functions for data exploration and quality fixing. One of the key features of PyOptimus is its ability to handle large datasets efficiently, allowing you to use the same code to process data on your laptop or on a remote cluster of GPUs.
The hi-primus/optimus repo was created 6 years ago, and the last code push was 1 month ago.
The project has 1,394 GitHub stars.
How to Install pyoptimus
You can install pyoptimus using pip:
pip install pyoptimus
or add it to a project with Poetry:
poetry add pyoptimus
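After installing, it can be useful to confirm that the package is visible to the interpreter you intend to use. The following is a minimal sketch using the standard library's `importlib.metadata`; the helper name `installed_version` is made up for this example, and the sketch assumes the distribution is registered under the name `pyoptimus`, as in the install commands above.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("pyoptimus")
print(f"pyoptimus {version}" if version else "pyoptimus is not installed")
```

Because the lookup reads package metadata rather than importing the library, it works even when optional engine dependencies are missing.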
- Author: Argenis Leon
- GitHub Repo: hi-primus/optimus
- Topic: Scientific/Engineering/Artificial Intelligence
The pyoptimus package has 27 open issues on GitHub, including:
- Scheduled biweekly dependency update for week 29