delta-spark 4.0.1

Python APIs for using Delta Lake with Apache Spark

Stars: 8592, Watchers: 8592, Forks: 2000, Open Issues: 1331

The delta-io/delta repository was created 6 years ago, and the last code push was 2 hours ago.
The project is very popular, with 8592 GitHub stars.

How to Install delta-spark

You can install delta-spark using pip:

pip install delta-spark

Or add it to a project with Poetry:

poetry add delta-spark
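
After installing, it can help to confirm which version the current interpreter actually sees. The following is a minimal sketch using only the Python standard library; the helper name `installed_version` is hypothetical and not part of delta-spark.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        # The distribution is not visible to this interpreter/environment.
        return None
```

Calling `installed_version("delta-spark")` returns a version string when the package is installed in the active environment and `None` otherwise, which makes it easy to spot a mismatch between the environment pip or Poetry installed into and the one running your code.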

Package Details

Author: The Delta Lake Project Authors
License: Apache-2.0
Homepage: https://github.com/delta-io/delta/
PyPI: https://pypi.org/project/delta-spark/
Documentation: https://docs.delta.io/latest/index.html
GitHub Repo: https://github.com/delta-io/delta

Classifiers

  • Software Development/Libraries/Python Modules

Code Examples

Here are some delta-spark code examples and snippets.
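
As a starting point, here is a minimal sketch of creating a Delta-enabled Spark session and round-tripping a small table. It assumes `pyspark` and a Java runtime are available alongside delta-spark. The helper names `delta_session_configs`, `build_delta_session`, and `write_and_read`, and the path `/tmp/delta-example`, are illustrative assumptions; `configure_spark_with_delta_pip` and the two configuration values follow the Delta Lake quickstart.

```python
from typing import Dict


def delta_session_configs() -> Dict[str, str]:
    """Spark settings that enable Delta Lake's SQL extension and catalog."""
    return {
        "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
        "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog",
    }


def build_delta_session(app_name: str = "delta-example"):
    """Build a SparkSession with Delta Lake enabled (hypothetical helper)."""
    # Imported lazily so this module can be imported without Spark installed.
    from delta import configure_spark_with_delta_pip
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in delta_session_configs().items():
        builder = builder.config(key, value)
    # Adds the Delta Lake JARs matching the installed delta-spark version.
    return configure_spark_with_delta_pip(builder).getOrCreate()


def write_and_read(spark, path: str = "/tmp/delta-example"):
    """Write five rows as a Delta table at `path` (illustrative), then read them back."""
    spark.range(0, 5).write.format("delta").mode("overwrite").save(path)
    return spark.read.format("delta").load(path)
```

With Spark available, `write_and_read(build_delta_session()).show()` would display the rows read back from the Delta table. Keeping the configuration in a plain dictionary is a design choice for this sketch: it lets the settings be inspected or reused without starting a Spark session.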

GitHub Issues

The delta-spark package has 1331 open issues on GitHub.

  • [Spark] Wrap FileNotFoundExceptions when reading with CDC
  • [Spark] Use the unitycatalog-client to implement the UCTokenBasedRestClient
  • [DO NOT MERGE]v2 test
  • [Kernel] Fix long overflow when parsing far-future timestamp stats
  • [Kernel-Spark] Renaming and clarifying catalog-managed utilities
  • [Spark] Use parsed_stats from checkpoints in loadActions when available
  • [Spark] Use the common TableSetup to build the create table SQL clause.
  • [Spark][REFACTOR][TEST-ONLY] Move test from DeltaRetentionSuiteBase to DeltaRetentionSuite
  • [SPARK] Introduce new DeltaBreakingChangeEnum abstraction
  • Avoid to use the hardcoded spark version in the python/delta/pip_utils.py
  • [CatalogManaged] Create CatalogOwnedPropertyEdgeSuite for CC
  • [Kernel] Add basic support for checkpointProtection by throwing exception on checkpoint
  • [SPARK][VARIANT] Preserve Variant stats from JSON addFiles during checkpointing
  • [Kernel] Add ST_INTERSECT_BOXES expression
  • [Server-Side Planning] Column names containing period

Related Packages & Articles

dagster 1.12.14

Dagster is an orchestration platform for the development, production, and observation of data assets.