-
-
Notifications
You must be signed in to change notification settings - Fork 8.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tree refactor stage I #3983
Tree refactor stage I #3983
Conversation
I'm starting to think about if back porting a bug fix release is needed. The next release is gonna be heavy, which might not be stable enough. And we had some important bug fixes after the last release. |
Yes it will be risky if we do this and the serialisation changes at once. |
Can you elaborate a bit on why some of the changes. I can see that most of the changes has things to do with the generic, function renaming and remove the root index. The current proposed changes contains two sets:
The root index was originally used for building multiple trees when each of them has a given group, each in a group can have unique tree. Given that it was not a strict generalization in the code, I feel we could keep it as it is. The leaf vector was useful for implementing vector tree. Given how much we need to go through that path, it is important to think how can we deal with this |
// allocate a new node, | ||
// !!!!!! NOTE: may cause BUG here, nodes.resize | ||
inline int AllocNode() { | ||
if (param.num_deleted != 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the logic for recycling pruned nodes and need to be kept here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I put an exit(-1) statement inside this and ran all of the Python tests without any issues. Unless I'm missing something, these recycled nodes never get used again. The pruner deletes nodes but there is nothing that adds nodes after pruning has occurred.
The reason for removing this is to reduce the number of member variables and statefullness of the tree structure. This will make it more flexible in future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will only be triggered if certain nodes are pruned and alloc reuse them in the future tree builds. I think it is still an useful feature to have here.
Given that this is a major data structure change, can you please send an RFC discussion on this, thanks! |
738b69a
to
690ce0c
Compare
@tqchen thanks your comments!
This is not used anywhere in the code so it causes some confusion for me (and I assume others) as to how it should be used or what the future use might be. I also suspect if we try to begin using it a large amount of code wouldn't work because it is built on the assumption of a tree having a single root. I prefer not to have code lying around without clear function or use cases, it makes it very difficult to understand and reorganise code.
Again, this is not used anywhere in the code base. I have no idea what its purpose is, can we implement this according to our needs in future instead of keeping it redundantly?
There is no change in end user functionality or data structure here (the internal node vector and stats vector remains unchanged) so an RFC seems like a lot of work. Serialisation remains the same. I have only renamed and removed code that was not being used. |
See these two references identifying the above problems as code smells: |
690ce0c
to
66be465
Compare
// allocate a new node, | ||
// !!!!!! NOTE: may cause BUG here, nodes.resize | ||
inline int AllocNode() { | ||
if (param.num_deleted != 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will only be triggered if certain nodes are pruned and alloc reuse them in the future tree builds. I think it is still an useful feature to have here.
I am fine removing the leaf_vectors, but let us keep the root index as they will not hurt the flow as much, and a generally growing n-th layer of the tree already did that. |
While I feel a bit attached to having the vector leaves available (I tried some hacking attempts at vector trees a while ago), I've also realized that there would be quite more work needed in order to properly utilize them. So it would make sense to not lug them around while not in use. But as for the renaming the RegTree and switching from the [nid] overloading to a getter function, I really don't see any tangible benefit to these changes which would justify touching the hundreds of lines of code. |
@RAMitchell When it's done, please take some notes in #3986 , these things started to become a headache. |
@trivialfis there will be no modification of user facing parameters in this PR. The parameters that are changed and are only used internally and are not documented. @khotilov These changes update the code with google code style (descriptive unabbreviated naming). I do agree that their impact is small but renaming is also fairly low risk. There is probably far too much going on in this PR so I will consider separating out the key elements into several PRs. It has stimulated a nice discussion anyway :) |
One thing I noticed while working on this, MetaInfo.root_index_ seems to be completely redundant.