Support categorical data for hist. #7695

Merged 3 commits on Feb 24, 2022


@trivialfis (Member) commented on Feb 23, 2022:

  • Extract partitioner from hist.
  • Implement categorical data support by passing the gradient index directly into the partitioner.
  • Organize and update the documentation.
  • Remove code for negative hessian.
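
To illustrate the second bullet, here is a minimal Python sketch of a row partitioner that works directly on quantized bin indices (the "gradient index"). It is not the code from this PR: `partition_rows`, `numerical_rule`, and `categorical_rule` are hypothetical names, the sketch assumes each category occupies its own bin, it ignores missing values, and sending the listed categories to the left child is an arbitrary convention chosen for the example.

```python
from typing import Callable, FrozenSet, Iterable, List, Sequence, Tuple

# Hypothetical helper names; an illustration only, not XGBoost's implementation.


def numerical_rule(split_bin: int) -> Callable[[int], bool]:
    """Numerical split: a row goes left when its bin index is <= split_bin."""
    return lambda bin_idx: bin_idx <= split_bin


def categorical_rule(left_categories: Iterable[int]) -> Callable[[int], bool]:
    """Categorical split: a row goes left when its category is in the set.

    Assumes one bin per category, so the bin index is the category code.
    Which side the listed categories go to is a convention of this sketch.
    """
    cats: FrozenSet[int] = frozenset(left_categories)
    return lambda bin_idx: bin_idx in cats


def partition_rows(
    row_ids: Iterable[int],
    bins: Sequence[int],
    goes_left: Callable[[int], bool],
) -> Tuple[List[int], List[int]]:
    """Split row ids into (left, right) using only the per-row bin index."""
    left: List[int] = []
    right: List[int] = []
    for rid in row_ids:
        (left if goes_left(bins[rid]) else right).append(rid)
    return left, right


# Bin index of one feature for six rows (made-up values for the example).
bins = [0, 2, 1, 2, 0, 3]
rows = range(len(bins))

num_left, num_right = partition_rows(rows, bins, numerical_rule(split_bin=1))
cat_left, cat_right = partition_rows(rows, bins, categorical_rule({0, 3}))
```

Both split types share the same loop; only the decision rule differs, which is one way to read "passing the gradient index directly into the partitioner".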

Extracted from #7659 .

Close #4372.

@trivialfis mentioned this pull request on Feb 23, 2022.

@RAMitchell (Member) left a comment:

Looks good to me, but it needs further review from a CPU performance perspective.

@trivialfis (Member, Author) commented:

After some small optimizations, I think the PR can improve performance for numerical data:

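|        | Bosch              | Higgs              | Year               |
|--------|--------------------|--------------------|--------------------|
| Master | 104.24593475100119 | 187.627630278992   | 34.48233067599358  |
| PR     | 101.07984042700264 | 186.24132968499907 | 30.104244786998606 |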
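
For a quick read of these numbers, the relative change per dataset can be computed as below. This is only a sketch: it assumes the figures are timings where lower is better (the comment does not state a unit), and `relative_improvement` is a hypothetical helper name.

```python
# Values copied from the benchmark comment; assumed to be timings
# (lower is better). The unit is not stated in the source.
timings = {
    "Bosch": (104.24593475100119, 101.07984042700264),
    "Higgs": (187.627630278992, 186.24132968499907),
    "Year": (34.48233067599358, 30.104244786998606),
}


def relative_improvement(master: float, pr: float) -> float:
    """Percentage by which the PR figure is lower than the master figure."""
    return (master - pr) / master * 100.0


improvements = {
    name: relative_improvement(master, pr)
    for name, (master, pr) in timings.items()
}

for name, pct in improvements.items():
    print(f"{name}: {pct:.1f}% lower than master")
```

Under that assumption the PR is about 3.0% lower on Bosch, 0.7% on Higgs, and 12.7% on Year.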

@trivialfis (Member, Author) commented:

As for categorical datasets, I don't have any large benchmarks yet. The code path is pretty similar to the one for numeric datasets.

@trivialfis merged commit 83a66b4 into dmlc:master on Feb 24, 2022.
@trivialfis deleted the cat-hist branch on February 24, 2022, 19:47.
Linked issue (may be closed by merging this pull request): Forbid negative Hessian values