Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(datafusion): initial implementation for Arrow Datafusion backend #2918

Merged
merged 3 commits into from
Jan 5, 2022

Conversation

kszucs
Copy link
Member

@kszucs kszucs commented Aug 30, 2021

The main purpose if this PR is to share the work on the datafusion backend. Basic things are working, we need to add more operations (depending on the datafusion support) and additional backend specific tests.

The CI hasn't been configured yet, since it requires two PRs to be merged:

The parametrized testing suite has the following state:

 2 failed, 71 passed, 230 skipped, 261 xfailed, 1 warning

cc @cpcloud

@pep8speaks
Copy link

pep8speaks commented Aug 30, 2021

Hello @kszucs! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2022-01-05 23:21:15 UTC

ibis/backends/datafusion/tests/test_client.py Outdated Show resolved Hide resolved
ibis/expr/types.py Outdated Show resolved Hide resolved
ibis/backends/datafusion/__init__.py Outdated Show resolved Hide resolved
ibis/backends/datafusion/client.py Outdated Show resolved Hide resolved
ibis/backends/datafusion/client.py Outdated Show resolved Hide resolved
ibis/backends/tests/conftest.py Outdated Show resolved Hide resolved
ibis/backends/tests/conftest.py Outdated Show resolved Hide resolved
ibis/backends/tests/test_numeric.py Outdated Show resolved Hide resolved
ibis/expr/types.py Outdated Show resolved Hide resolved
setup.cfg Outdated Show resolved Hide resolved
@jreback jreback added the datafusion The Apache DataFusion backend label Sep 1, 2021
Copy link
Contributor

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason to not create this backend in a separate repo (e.g. github.com/ibis-project/ibis-datafusion). I think it'll keep things simpler, and will allow to release ibis-datafusion more often than the ibis core if needed. It'll also help review this, since the changes required to Ibis itself, and the new backend will be in different PRs.

ibis/backends/datafusion/__init__.py Outdated Show resolved Hide resolved
ibis/backends/datafusion/client.py Outdated Show resolved Hide resolved
ibis/backends/datafusion/client.py Outdated Show resolved Hide resolved
ibis/backends/datafusion/tests/conftest.py Show resolved Hide resolved
ibis/backends/datafusion/tests/test_client.py Outdated Show resolved Hide resolved
setup.cfg Outdated Show resolved Hide resolved
@github-actions
Copy link
Contributor

github-actions bot commented Sep 22, 2021

Unit Test Results

         8 files           8 suites   15m 33s ⏱️
  2 875 tests   2 805 ✔️   70 💤 0
23 000 runs  22 440 ✔️ 560 💤 0

Results for commit 76b0a30.

♻️ This comment has been updated with latest results.

@jreback jreback added this to the 2.x milestone Sep 22, 2021
@kszucs
Copy link
Member Author

kszucs commented Sep 29, 2021

Steps required before proceeding with this PR:

  • need to wait for arrow-rs 6.0.0 release (~2/3 weeks)
  • merge the arrow-datafusion PR
  • finalize and merge this PR

@kszucs kszucs force-pushed the datafusion branch 2 times, most recently from 8e0de5f to afc6f22 Compare November 15, 2021 13:14
pyproject.toml Outdated Show resolved Hide resolved
@kszucs kszucs changed the title Rudimentary implementation for Arrow Datafusion backend [WIP] backend(datafusion): Initial implementation for Arrow Datafusion backend Dec 7, 2021
@kszucs
Copy link
Member Author

kszucs commented Dec 7, 2021

@cpcloud could you please give it a review? I'd like to finalize it and merge as an experimental backend, then create follow-up issues to add support for missing operations (windows, joins etc.).

@kszucs kszucs requested a review from cpcloud December 14, 2021 12:36
pyproject.toml Show resolved Hide resolved
.github/workflows/ibis-backends.yml Outdated Show resolved Hide resolved
.github/workflows/ibis-backends.yml Outdated Show resolved Hide resolved
ibis/backends/datafusion/__init__.py Outdated Show resolved Hide resolved
ibis/backends/datafusion/__init__.py Outdated Show resolved Hide resolved
ibis/backends/datafusion/compiler.py Outdated Show resolved Hide resolved
ibis/backends/datafusion/compiler.py Outdated Show resolved Hide resolved
ibis/backends/datafusion/compiler.py Outdated Show resolved Hide resolved
ibis/backends/datafusion/compiler.py Outdated Show resolved Hide resolved
ibis/backends/datafusion/compiler.py Outdated Show resolved Hide resolved
@cpcloud cpcloud changed the title backend(datafusion): Initial implementation for Arrow Datafusion backend feat(datafusion): Initial implementation for Arrow Datafusion backend Dec 28, 2021
@cpcloud
Copy link
Member

cpcloud commented Dec 30, 2021

@kszucs When you have a chance can you rebase and resolve the conflicts?

@cpcloud cpcloud modified the milestones: 2.x, Next release Dec 30, 2021
@cpcloud
Copy link
Member

cpcloud commented Jan 1, 2022

@kszucs If you rebase and squash we can merge this for the 2.1 release, which we should cut next week, after #3218 is merged.

@kszucs kszucs force-pushed the datafusion branch 4 times, most recently from f07a429 to 2fb4074 Compare January 3, 2022 15:54
@cpcloud
Copy link
Member

cpcloud commented Jan 3, 2022

@kszucs rebase one more time and i'll merge this 🙏🏻

@cpcloud
Copy link
Member

cpcloud commented Jan 3, 2022

@jreback Can you approve?

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a few questions

ibis/backends/datafusion/compiler.py Show resolved Hide resolved
ibis/backends/datafusion/datatypes.py Show resolved Hide resolved
ibis/backends/datafusion/tests/test_udf.py Show resolved Hide resolved
@cpcloud
Copy link
Member

cpcloud commented Jan 5, 2022

I'm going to merge this today, as there are no remaining actionable review items.

@jreback
Copy link
Contributor

jreback commented Jan 5, 2022

no objection (but follow ups as indicated)

@cpcloud
Copy link
Member

cpcloud commented Jan 5, 2022

@jreback Thanks, can you approve?

Basic things are working, we need to add more operations (depending on the datafusion support) and
additional backend specific tests.

resolves ibis-project#2627
@cpcloud cpcloud merged commit 76b0a30 into ibis-project:master Jan 5, 2022
@cpcloud cpcloud modified the milestones: Next release, 2.x Jan 7, 2022
@ibis-project-bot
Copy link
Contributor

🎉 This PR is included in version 2.1.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

@kszucs kszucs deleted the datafusion branch June 8, 2022 06:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
datafusion The Apache DataFusion backend feature Features or general enhancements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants