Simplify Python getting started example #8153

Merged

ravicodelabs
Contributor

This PR aims to resolve issue #8146.

Current behavior:

The current Python getting started example requires the user to manually set the local file path to the Agaricus data set in libsvm format before being able to run the example code.

New behavior:

This PR uses sklearn to load the data set. As long as sklearn is installed, the user should be able to run the getting started example as is after a pip install xgboost, which reduces friction, especially for new users.

Additional Details:

  • The well-known Iris data set is used, since the Agaricus data set is not available in sklearn.datasets.
  • The xgboost.XGBClassifier class is used here rather than the native xgboost.train function to train the model, as the latter would require an extra step of converting from numpy arrays to xgboost.DMatrix.

Load data set via `sklearn` rather than a local file path.
Member

@trivialfis trivialfis left a comment

This looks great! Personally, I want to move XGBoost closer to sklearn and introduce the native interface only if necessary. ;-)

@trivialfis trivialfis merged commit 20d1bba into dmlc:master Aug 11, 2022