vaex 4.17.0


Out-of-Core DataFrames to visualize and explore big tabular datasets


Stars: 8206, Watchers: 8206, Forks: 589, Open Issues: 533

The vaexio/vaex repo was created 9 years ago, and the last code push was 2 months ago.
The project is very popular, with 8,206 GitHub stars.

How to Install vaex

You can install vaex using pip:

pip install vaex

or add it to a project with Poetry:

poetry add vaex
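
To confirm the install worked, a quick smoke test is to import the package and print its version (this assumes `pip` and `python` point at the same environment):

```shell
# Quick smoke test after installation: import the package and print its version.
python -c "import vaex; print(vaex.__version__)"
```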

Package Details

Author: Maarten A. Breddels
GitHub Repo: https://github.com/vaexio/vaex



Code Examples

Here are some vaex code examples and snippets.

GitHub Issues

The vaex package has 533 open issues on GitHub.

  • [BUG-REPORT] converting massive CSV (50GB) stalls
  • [BUG-REPORT] AttributeError: 'ProgressBar' object has no attribute 'stime0'
  • vx.from_pandas(df).export_hdf5(path) giving KeyError while writing pandas df to HDF5 file.
  • DataFrame.max returning array containing -inf values
  • Issue on page /tutorial_jupyter.html
  • [BUG-REPORT] PydanticImportError: BaseSettings has been moved
  • [BUG-REPORT] AssertionError while performing math operation on shifted columns
  • Fixes #2350 Implementing take function in Vaex for first n colums
  • fix bug: open csv file use delimiter other than comma.
  • [Bug Fix] Broken graphQL query comparisons
  • Interactive widget fix
  • dont use take with arrow
  • Build aarch64 wheels and support python 3.11
  • fix typos in the learn more about vex section from the README file
  • Fix: evaluate iterator when selection=True

See more issues on GitHub

Related Packages & Articles

ludwig 0.10.3

Declarative machine learning: End-to-end machine learning pipelines using data-driven configurations.

pyoptimus 23.5.0b0

PyOptimus is a Python library that brings together the power of various data processing engines like Pandas, Dask, cuDF, Dask-cuDF, Vaex, and PySpark under a single, easy-to-use API. It offers over 100 functions for data cleaning and processing, including handling strings, processing dates, URLs, and emails. PyOptimus also provides out-of-the-box functions for data exploration and quality fixing. One of the key features of PyOptimus is its ability to handle large datasets efficiently, allowing you to use the same code to process data on your laptop or on a remote cluster of GPUs.

optimuspyspark 2.2.32

Optimus is the missing framework for cleaning and pre-processing data in a distributed fashion with pyspark.

fiftyone 0.24.1

FiftyOne: the open-source tool for building high-quality datasets and computer vision models

dtreeviz 2.2.2

A Python 3 library for scikit-learn, XGBoost, LightGBM, Spark, and TensorFlow decision tree visualization

kangas 2.4.9

Tool for exploring columnar data, including multimedia