Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decision tree class refactor #3972

Closed
RAMitchell opened this issue Dec 6, 2018 · 6 comments
Closed

Decision tree class refactor #3972

RAMitchell opened this issue Dec 6, 2018 · 6 comments

Comments

@RAMitchell
Copy link
Member

I propose a refactor of the regression tree class with the following objectives in mind:

  • Reduce code complexity. This class is very complicated and difficult to understand/use for new contributors.
  • Increase safety. It is very easy to get a segfault with the current class. With good encapsulation and object-oriented programming I believe misuse can be entirely prevented.
  • Use this class on the GPU. We currently have significant code duplication in GPU algorithms as we had to build our own data structures usable on the GPU.

To do this we can do it in two phases:

  • First refactor to reduce redundancy and cleanup without affecting backwards compatibility
  • Propose and evaluate any further changes needed affecting backwards compatibility, discuss how we can achieve this with minimal disruption to users. e.g. continuing support for the old format for a limited time
@trivialfis
Copy link
Member

I have some ideas about refactoring. But somehow they seemed to be connected together. For example, changing reg tree may affect the JSON format we are trying to hammer out.

@trivialfis
Copy link
Member

Another related topic is whether we need the tree model as template, which increases build time significantly as we are putting IO logic in that class.

@hcho3
Copy link
Collaborator

hcho3 commented Dec 6, 2018

@trivialfis RFC will be up in a day. Sorry for the delay. I have been arranging a move to California lately.

@hcho3
Copy link
Collaborator

hcho3 commented Mar 8, 2019

@RAMitchell Can we close this issue, now that we merged #3989, #3995, #4008, #4015, and #3825 ?

@RAMitchell
Copy link
Member Author

There is still a little more to do. My idea was to make all of the internal vectors inside the regression tree private so that we can only access it through safe public methods, preventing misuse of the class. All of the ground work is done I just need to spend a little time finishing it off.

@RAMitchell
Copy link
Member Author

This was a little more complicated than I thought, I'm going to close this for now.

@lock lock bot locked as resolved and limited conversation to collaborators Jun 22, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants