Add datafusion-python #69
Codecov Report
@@ Coverage Diff @@
## master #69 +/- ##
==========================================
- Coverage 76.43% 75.77% -0.67%
==========================================
Files 135 142 +7
Lines 23264 23467 +203
==========================================
Hits 17782 17782
- Misses 5482 5685 +203
Thank you, @jorgecarleitao! I am really excited to see this and would love to see it merged into arrow-datafusion.
@@ -0,0 +1,72 @@
name: Build
The tag release probably won't work in the context of an ASF repo anymore?
Yes, we will need to work out the packaging. Building the wheels is, in my opinion, still relevant, as it is not easy in Rust (as far as I understand, support for this is still a bit of a WIP). Building the manylinux wheels was quite a feat.
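The manylinux work shows up in the wheel's filename. Under the wheel naming convention of PEP 427, a wheel is named `{distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl`, and the platform tag is where a `manylinux2010_x86_64`-style tag appears. A minimal sketch of splitting such a name, for illustration only; the filename in the example is hypothetical and is not an actual datafusion release artifact.

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    """Components of a wheel filename as laid out in PEP 427."""
    distribution: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelName:
    """Split `filename` into its PEP 427 fields.

    Assumes the name is already normalized (dashes inside the distribution
    name replaced by underscores), as wheel-building tools produce it.
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        # No optional build tag.
        dist, version, py_tag, abi_tag, plat_tag = parts
        build = None
    elif len(parts) == 6:
        dist, version, build, py_tag, abi_tag, plat_tag = parts
    else:
        raise ValueError(f"unexpected number of fields in {filename!r}")
    return WheelName(dist, version, build, py_tag, abi_tag, plat_tag)


# Hypothetical filename, used only to illustrate the layout.
name = parse_wheel_filename("datafusion-0.2.1-cp36-abi3-manylinux2010_x86_64.whl")
print(name.platform_tag)  # → manylinux2010_x86_64
```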
```bash
pip install datafusion
```
Adding this here as a suggestion, but I'll also take a look at packaging it as a conda package and will cc you on the PR once I have something working.
or via conda/mamba:

```bash
conda install -c conda-forge datafusion
mamba install -c conda-forge datafusion
```
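Whichever installer is used, one quick way to confirm which `datafusion` distribution ended up in the environment is to read its installed metadata from Python. A minimal sketch using only the standard library (`importlib.metadata`, Python 3.8+); it assumes the distribution name is `datafusion`, as in the commands above, and that the installed package ships standard metadata, which pip installs do and conda-forge Python packages usually do as well (an assumption here, not verified for this package).

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of `dist_name`, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "datafusion" is the distribution name used by the install commands above.
    print(installed_version("datafusion"))
```

This only reads package metadata; it does not import datafusion itself, so it also works when the native extension fails to load.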
@xhochy If you want, you can ping me on the staged-recipes PR once you create it. I was just reading up on the state of Arrow vs. Rust and was surprised that datafusion isn't in conda-forge yet. ;-)
@@ -0,0 +1,98 @@
import unittest
Out of curiosity: why not pytest?
It comes with Python, so there is no need to install anything else. But I have no strong feelings here; we can refactor this whole thing. =)
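For reference, a test written against the standard-library `unittest` module needs nothing beyond Python itself, and pytest also collects and runs `unittest.TestCase` subclasses unchanged, so switching runners later would not require rewriting the tests. A minimal sketch; `normalize_names` is a hypothetical helper invented for illustration and is not part of the datafusion test suite.

```python
import unittest


def normalize_names(names):
    """Hypothetical helper under test: strip whitespace and lowercase names."""
    return [name.strip().lower() for name in names]


class NormalizeNamesTest(unittest.TestCase):
    # A plain unittest.TestCase: runnable with `python -m unittest`, and
    # pytest collects TestCase subclasses as well.
    def test_strips_and_lowercases(self):
        self.assertEqual(normalize_names([" A ", "b"]), ["a", "b"])

    def test_empty_input(self):
        self.assertEqual(normalize_names([]), [])
```

Saved as, e.g., `test_example.py`, this runs with `python -m unittest test_example.py` or with `pytest test_example.py`.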
Pushed the license and also hopefully fixed the CI.
Co-authored-by: Andy Grove <andygrove@users.noreply.github.com>
Co-authored-by: Uwe L. Korn <xhochy@users.noreply.github.com>
OK, I have now fixed the CI, pushed the license headers, and bumped to the latest datafusion. There was a regression, documented in #226. Once we fix the regression, this can be released on PyPI as 0.2.2, since there were no backward-incompatible changes 🎉
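License headers added to every file tend to go missing again as new files are created. Apache projects commonly guard against this in CI with Apache Rat; the following is only a minimal, hypothetical stand-in written in Python to illustrate the idea. It assumes the source files of interest are `.py` and `.rs` files and that the header contains the standard ASF opening wording within the first 30 lines; `files_missing_header` is an invented name, not an existing tool.

```python
from pathlib import Path
from typing import Iterable, List

# Opening words of the standard ASF source-file license header.
ASF_HEADER_MARKER = "Licensed to the Apache Software Foundation (ASF)"


def files_missing_header(
    root: str,
    suffixes: Iterable[str] = (".py", ".rs"),
    max_lines: int = 30,
) -> List[Path]:
    """List files under `root` with a matching suffix whose first
    `max_lines` lines do not contain the ASF header marker."""
    wanted = tuple(suffixes)
    missing = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in wanted:
            continue
        with path.open(encoding="utf-8", errors="replace") as handle:
            # zip() with range() stops after at most `max_lines` lines.
            head = "".join(line for _, line in zip(range(max_lines), handle))
        if ASF_HEADER_MARKER not in head:
            missing.append(path)
    return missing
```

Calling `files_missing_header(".")` from the repository root returns the offending paths; a CI step could fail whenever that list is non-empty.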
@alamb @Dandandan Any objection to merging this PR?
@andygrove please go ahead 🚀
No objections to merging from me. I skimmed it quickly and everything seems good.
I think this code will need a significant investment in documentation, but it seems like a good start to me.
I hate to be a nuisance, but didn't this need to go through IP clearance?
We can revert if this is the case, but because Jorge was the only contributor (except for one contribution fixing a typo in a README), this didn't seem to be required in this case?
It is probably best to check with general@incubator to determine the preferred protocol in this situation. I don't want to subject you to unneeded process, but it would be good to go by the book.
@wesm thank you. Not a nuisance at all; it is important to get this done correctly. The rationale here: I hold the copyright over the whole code base, except for a one-word typo fix in the README. The code was MIT-licensed on jorgecarleitao/python-datafusion. As part of this PR, I pushed a commit that added the license headers to every file in the source code. As copyright holder, I thereby licensed all this code to the ASF under the ICA.
This reverts commit 46bde0b.
PR to revert: #257
Thanks. I'm not enough of an expert to know what the correct protocol is; a vote may not be needed at all, but let's double-check.
Yeah, my interpretation was that since @jorgecarleitao authored this code, I was treating this as "just a normal PR" (it happens to have lived somewhere else for a while, but from an IP provenance perspective it seemed no different from a normal PR to me). However, I am not an expert in such matters.
This is a PR with the source code of python-datafusion, currently available at https://github.com/jorgecarleitao/datafusion-python and released on PyPI as datafusion.
The goal of this PR is to gauge interest in moving that code base closer to datafusion and into the ASF.
Some notes: