feat: read_delta and to_delta for some backends #6354

Merged (1 commit) on Jun 5, 2023

lostmygithubaccount (Member) commented Jun 1, 2023

closes #6319

Adds `read_delta` for some backends and `to_delta` for all backends (writing to local paths).

TODO:

  • tests
    - [ ] datafusion backend
    - [ ] spark backend?
  • handling the dependency/import for deltalake
  • update some docstrings
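
A minimal round-trip sketch of the API this PR describes, assuming an ibis release that includes this change (6.0 or later), the default DuckDB backend, and the `deltalake` and `pandas` packages. The function name and the `example.delta` path are hypothetical, chosen only for illustration; the function returns `None` when an optional dependency is missing instead of failing.

```python
import importlib.util
import os
import tempfile


def delta_round_trip_row_count():
    """Write a small table to Delta Lake format and read it back.

    Returns the number of rows read back, or None if an optional
    dependency is not installed.
    """
    # Assumption: these packages are what the default (DuckDB) backend needs.
    required = ("ibis", "deltalake", "duckdb", "pandas")
    if any(importlib.util.find_spec(name) is None for name in required):
        return None

    import ibis
    import pandas as pd

    # Build a tiny in-memory table to write out.
    t = ibis.memtable(pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]}))

    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical local path; to_delta writes locally.
        path = os.path.join(tmp, "example.delta")
        t.to_delta(path)

        # Read the Delta table back through the top-level API.
        restored = ibis.read_delta(path)
        return int(restored.count().execute())
```

Calling `delta_round_trip_row_count()` in an environment with all four packages installed should write the table and count the rows read back from the same local directory.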

cpcloud (Member) commented Jun 2, 2023

Regarding dependencies, I think we can make deltalake its own extra, so that it isn't required for every backend, including for users who will never need to read or write deltalake files.

I can push up the changes for that.
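
One common way to handle an optional dependency like this is to import it lazily and raise a helpful error when it is missing. The sketch below illustrates that general pattern only; it is not the code in this PR, and the helper name `import_optional` and the error wording are hypothetical.

```python
import importlib


def import_optional(module_name: str, extra: str):
    """Import an optional module, or raise an ImportError naming the extra.

    Hypothetical helper: `extra` is the name of the packaging extra that
    is assumed to provide `module_name`.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"The {module_name!r} package is required for this feature; "
            f"install it with the {extra!r} extra."
        ) from exc
```

A method such as `to_delta` could call `import_optional("deltalake", "deltalake")` at the point of use, so backends that never touch Delta Lake files do not need the package installed.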

cpcloud (Member) commented Jun 2, 2023

Added a small round trip test.

cpcloud (Member) commented Jun 2, 2023

One issue with the spark implementation is that the recommended way to do this -- using delta-spark -- requires pyspark 3.4.0, which has terrible support for numpy 1.24 and pandas 2.

cpcloud (Member) commented Jun 2, 2023

Pushed up a fix for the conflict.

lostmygithubaccount (Member, Author) commented Jun 2, 2023

I think I'll skip Spark for this PR and add it later if requested. Thanks for the help here! I'll look to add datafusion, update some docstrings, and look at adding more tests.

cpcloud added this to the 6.0 milestone Jun 3, 2023
lostmygithubaccount force-pushed the delta-table branch 3 times, most recently from 24f57ab to d969432, June 5, 2023 17:06
lostmygithubaccount marked this pull request as ready for review June 5, 2023 17:07
save

cleanup; add read_delta for duckdb

chore(deps): add `deltalake` dependency and extra

test(delta): add round trip deltalake format test

test: skip if `deltalake` missing

ci: add deltalake extra to polars and duckdb jobs

Apply suggestions from code review

Co-authored-by: Phillip Cloud <417981+cpcloud@users.noreply.github.com>

test: hit the top-level API in roundtrip test

docs(backends): fix typo in pip install command

fix(docs): typo in code without selectors

docstrings and try/catch deltalake import

fix lint

black; poetry stuff

cpcloud (Member) commented Jun 5, 2023

Fixed up a small issue from #6323 (replacing self._tables assignment with self._add_table).

Set to automerge!

Linked issue that merging this pull request may close: feat: support reading and writing delta tables