-
-
Notifications
You must be signed in to change notification settings - Fork 8.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PySpark XGBoost integration #8020
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for getting this started! I haven't looked into the details yet, for the initial work let's get some simple things like code style aligned first to avoid massive changes later.
- Consider using the same document style as the rest of the library. (For instance, replace
:param:
notation). - Consider using the
black
formatter.
Please add a simple demonstration in demo/guide-python
so that we can run it and gain a better intuition on how things work. A tutorial in doc/tutorials
is also extremely welcomed!
I will attach detail design doc soon.
That would be great!
I was thinking how do we support categorical data in this PR? @trivialfis |
Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
Quick note for followups:
Only suggestions, we can improve upon the existing PR after the initial merge. |
another question, how to support the categorical data? |
We will work on it later. I assume we will simply pickle the feature types to executors. |
Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
@trivialfis @hcho3 What's the memory limitation for the github action machine for running "Test XGBoost Python package on macos-10.15" ? |
@hcho3 Could you assign me the permission to kill/rerun github actions ? so that I can rerun test faster. |
@WeichenXu123 Once your first PR is merged, you'll have permissions to run and restart tests. Right now, you have limited permissions as a first-time contributor. Can we merge this PR without MacOS support for now? |
We can. The CI hanging is very weird, local run I don't find any issue. Probably due to the CI machine resource limitation. |
@trivialfis @wbo4958 @hcho3 |
Thx @WeichenXu123 for your hard work, we're expecting to merge it ASAP. @trivialfis Would you help on this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the excellent work on the pyspark interface! Will merge it once CI is fixed.
Thank you. @trivialfis There are some failure tests which is not related to my PR. Are they fixed in master branch ? If so I can merge master and rerun CI. |
Please wait until the CI is fixed. ETA by tomorrow |
@hcho3 Is CI fixed ? Thank you! |
@WeichenXu123 I restarted the tests. Hopefully we can merge this PR soon |
Signed-off-by: Weichen Xu weichen.xu@databricks.com
PySpark XGBoost integration.
Summarize some important follow-ups:
I will take 3 / 6 / 7 / 9 / 10, other items I would like to assign them to other community guys.