Incorrect prediction logic, allows buffer overflow #7903

Closed
drorspei opened this issue May 14, 2022 · 0 comments · Fixed by #7918
drorspei commented May 14, 2022

Hey hey, thanks for this amazing package.
There is a small issue arising from an inconsistency between the prediction logic and the model-loading logic:
When loading a model from JSON, we can specify the indices of the left and right children of each node in each tree however we want.
But the prediction logic in predictor/predict_fn.h:GetNextNode assumes that the right child of a node always sits exactly one index after its left child (given that the current split feature isn't categorical).
An example is given below.
Since the node one past the left child is accessed unconditionally, this becomes a buffer overflow when no such node exists, which can cause a segmentation fault.
The easiest solution would be to update a single line in predictor/predict_fn.h so that it does the correct thing without making assumptions about the tree structure, since those assumptions are not enforced anywhere else.
I've put up a pull request here: #7902
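
To make the mismatch concrete, here is a minimal standalone Python sketch (the real predictor is C++, so this only mirrors the indexing pattern described above; the function names are made up for illustration). It contrasts the "left child + 1" assumption with a traversal that reads the stored right child, using the child and split arrays from the example model further down.

left_children = [2, -1, -1]         # node 0's left child is node 2
right_children = [1, -1, -1]        # node 0's right child is node 1, not node 3
split_conditions = [0.0, 1.0, 2.0]  # for leaf nodes this holds the leaf value

def next_node_assumed(nid, fvalue):
    # Mirrors the pattern described above: the right child is assumed
    # to sit at left_child + 1.
    return left_children[nid] + (0 if fvalue < split_conditions[nid] else 1)

def next_node_correct(nid, fvalue):
    # Reads the stored right child instead of assuming adjacency.
    return left_children[nid] if fvalue < split_conditions[nid] else right_children[nid]

print(next_node_correct(0, 1.0))  # 1 -> a valid leaf
print(next_node_assumed(0, 1.0))  # 3 -> no such node; in C++ this read is out of bounds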

The following is an example in Python. We load a model with a single tree of 3 nodes, but order them as root, right child, left child.
The left child's leaf value is 2, but the result is either 1 or a segfault.

import xgboost
import json
import pandas as pd

# Build a Booster from raw JSON model bytes; the single tree stores its nodes
# in the order root, right child, left child.
print(xgboost.Booster(model_file=bytearray(json.dumps({
    'learner': {
        'attributes': {
            'best_iteration': '0',
            'best_ntree_limit': '1'
        },
        'feature_names': ['0'],
        'feature_types': ['float'],
        'gradient_booster': {
            'model': {
                'gbtree_model_param': {
                    'num_parallel_tree': '1',
                    'num_trees': '1',
                    'size_leaf_vector': '0'
                },
                'tree_info': [0],
                'trees': [
                    {
                        'categories': [],
                        'categories_nodes': [],
                        'categories_segments': [],
                        'categories_sizes': [],
                        'id': 0,
                        'tree_param': {
                            'num_deleted': '0',
                            'num_feature': '1',
                            'num_nodes': '3',
                            'size_leaf_vector': '0'
                        },
                        'split_type': [0, 0, 0],
                        'split_indices': [0, 0, 0],
                        'default_left': [0, 0, 0],
                        'split_conditions': [0., 1., 2.],
                        'loss_changes': [0., 0., 0.],
                        'parents': [2147483647, 0, 0],
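                        # The left child is node 2 and the right child is node 1,
                        # so the right child is NOT at left_child + 1.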
                        'left_children': [2, -1, -1],
                        'right_children': [1, -1, -1],
                        'sum_hessian': [0.0, 0.0, 0.0],
                        'base_weights': [0., 0., 0.]}
                ]
            },
            'name': 'gbtree'
        },
        'learner_model_param': {
            'base_score': '0',
            'num_class': '0',
            'num_feature': '1',
            'num_target': '1'
        },
        'objective': {
            'name': 'reg:squarederror',
            'reg_loss_param': {'scale_pos_weight': '1'}
        }
    },
    'version': [1, 6, 1]
}).encode())).predict(xgboost.DMatrix(pd.DataFrame([[1.]], columns=['0']))))
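# Observed: the result is either 1.0 or a segmentation fault, even though the
# stored left child (node 2) has leaf value 2.0.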