petastorm 0.12.1

Petastorm is a library enabling the use of Parquet storage from TensorFlow, PyTorch, and other Python-based ML training frameworks.

Stars: 1742, Watchers: 1742, Forks: 279, Open Issues: 176

The uber/petastorm repo was created 5 years ago and the last code push was 4 months ago.
The project is popular, with 1742 GitHub stars.

How to Install petastorm

You can install petastorm using pip:

pip install petastorm

or add it to a project with Poetry:

poetry add petastorm
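
To confirm from Python that the install succeeded, a check along the following lines may help. It is a minimal sketch that uses only the standard library's `importlib.metadata`; the helper name `installed_version` is hypothetical and not part of petastorm.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("petastorm")
    print(f"petastorm {found}" if found else "petastorm is not installed")
```

Returning `None` instead of raising keeps the check usable in setup scripts that want to decide whether to install the package.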

Package Details

Author
Uber Technologies, Inc.
License
Apache License, Version 2.0
Homepage
https://github.com/uber/petastorm
PyPI
https://pypi.org/project/petastorm/
GitHub Repo
https://github.com/uber/petastorm

Classifiers

No petastorm PyPI packages just yet.

Errors

A list of common petastorm errors.

Code Examples

Here are some petastorm code examples and snippets.
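
As one illustrative snippet, the sketch below counts the rows of an existing Parquet dataset by iterating over it with petastorm's `make_batch_reader`. It assumes petastorm is installed and that a Parquet dataset already exists at the location you pass in. The helper names `to_dataset_url` and `count_rows` are hypothetical and are not part of the library.

```python
from pathlib import Path


def to_dataset_url(path: str) -> str:
    """Turn a local filesystem path into a file:// URL (petastorm readers take URLs)."""
    return Path(path).resolve().as_uri()


def count_rows(dataset_url: str) -> int:
    """Count rows in a Parquet dataset by iterating over petastorm batches."""
    # Imported lazily so this module can be loaded without petastorm installed.
    from petastorm import make_batch_reader

    total = 0
    # num_epochs=1 makes the reader stop after a single pass over the data.
    with make_batch_reader(dataset_url, num_epochs=1) as reader:
        for batch in reader:
            # Each batch is a namedtuple holding one array per column.
            total += len(batch[0])
    return total
```

A call might look like `count_rows(to_dataset_url("/data/my_parquet_dir"))`, where the path is a placeholder for a dataset of your own.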

GitHub Issues

The petastorm package has 176 open issues on GitHub.

  • Varying number of examples passed by DataLoader to Pytorch Lightning network
  • Remove very old pickle compatibility code modifying old atg package names
  • Support for parquet files with nested structures
  • Support for Azure Blob Storage and Azure Data Lake

See more issues on GitHub

Related Packages & Articles

onnx 1.16.0

Open Neural Network Exchange (ONNX) is an open ecosystem that empowers AI developers to choose the right tools as their project evolves. ONNX provides an open source format for AI models, both deep learning and traditional ML. It defines an extensible computation graph model, as well as definitions of built-in operators and standard data types. Currently we focus on the capabilities needed for inferencing (scoring).

horovod 0.28.1

Horovod is a powerful distributed training framework for Python that allows you to train deep learning models across multiple GPUs and servers quickly and efficiently. It falls under the category of distributed computing libraries. Built on top of TensorFlow, PyTorch, and other popular deep learning frameworks, Horovod simplifies the process of scaling up your model training by handling the complexities of distributed training under the hood.

datasets 2.18.0

HuggingFace community-driven open-source library of datasets

thinc 8.2.3

A refreshing functional take on deep learning, compatible with your favorite libraries

nlp 0.4.0

HuggingFace/NLP is an open library of NLP datasets.

keras 3.2.0

Keras is a deep learning API written in Python, running on top of the machine learning platform TensorFlow. The core data structures of Keras are layers and models. The philosophy is to keep simple things simple, while allowing the user to be fully in control when they need to (the ultimate control being the easy extensibility of the source code via subclassing).

kornia 0.7.2

Open Source Differentiable Computer Vision Library for PyTorch