Contents

delta-spark 3.2.1


Python APIs for using Delta Lake with Apache Spark


Stars: 7511, Watchers: 7511, Forks: 1687, Open Issues: 814

The delta-io/delta repo was created 5 years ago and the last code push was 23 hours ago.
The project is extremely popular, with 7511 GitHub stars.

How to Install delta-spark

You can install delta-spark using pip:

pip install delta-spark

or add it to a project with Poetry:

poetry add delta-spark
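
Note that delta-spark is used together with PySpark. The package ships a configure_spark_with_delta_pip helper for building a Delta-enabled SparkSession; the sketch below follows that pattern, with the app name chosen only for illustration.

from delta import configure_spark_with_delta_pip
from pyspark.sql import SparkSession

# Build a SparkSession with the Delta Lake SQL extension and catalog enabled.
builder = (
    SparkSession.builder.appName("delta-quickstart")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
)

# configure_spark_with_delta_pip adds the matching Delta Lake JARs to
# spark.jars.packages so the session can read and write Delta tables.
spark = configure_spark_with_delta_pip(builder).getOrCreate()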

Package Details

Author: The Delta Lake Project Authors
License: Apache-2.0
Homepage: https://github.com/delta-io/delta/
PyPI: https://pypi.org/project/delta-spark/
Documentation: https://docs.delta.io/latest/index.html
GitHub Repo: https://github.com/delta-io/delta

Classifiers

  • Software Development :: Libraries :: Python Modules

Errors

A list of common delta-spark errors.

Code Examples

Here are some delta-spark code examples and snippets.
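
For example, a table can be written with the standard DataFrame writer and then modified through the DeltaTable API. This is a minimal sketch that assumes a Delta-enabled SparkSession (as configured above) and uses the placeholder path /tmp/delta-table.

from delta.tables import DeltaTable

# Write a small DataFrame out in Delta format (placeholder path).
spark.range(0, 5).write.format("delta").mode("overwrite").save("/tmp/delta-table")

# Load the table through the DeltaTable API and apply an in-place update.
delta_table = DeltaTable.forPath(spark, "/tmp/delta-table")
delta_table.update(condition="id % 2 == 0", set={"id": "id + 100"})

# Read the result back as a regular DataFrame.
spark.read.format("delta").load("/tmp/delta-table").show()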

GitHub Issues

The delta-spark package has 814 open issues on GitHub.

  • Automatically Generating manifest files
  • Mention deletion of delta log entries in PROTOCOL
  • Refactoring and optimisation of RestoreTableCommand
  • Enable Mima check for Scala 2.13
  • Printing execution plan for merge operation with scala/python API
  • configure_spark_with_delta_pip fix to stop overwriting spark.jars.packages
  • Update integration tests to run with all published Scala versions
  • When will DeltaLake officially support GCS ?
  • Update Delta Protocol for Identity column
  • Python API for restoring delta table
  • VACUUM breaks with 'Internal error'
  • Table Delta in Athena
  • Support aggregations on target table in merge whenNotMatchedInsert operation
  • Printing execution plan for merge operation with python API
  • Support the GCS bug fix in a Delta Lake version that support spark 3.1.2

See more issues on GitHub

Related Packages & Articles

dagster 1.8.11

Dagster is an orchestration platform for the development, production, and observation of data assets.