Fix model loading from stream #7067
Conversation
Fix bug introduced in 1791371 (allow loading from byte array). When loading a model from a stream, only the last buffer read from the input stream is used to construct the model. This may work for models smaller than 1 MiB (if you are lucky enough to read the whole model in a single call), but it will always fail if the model is larger.
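The pattern behind the fix can be sketched as follows. This is not the PR's actual code; it is a minimal illustration of the correct chunked-read loop, with a hypothetical `readAllBytes` helper, showing that every chunk must be appended rather than only the last one kept:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class StreamRead {
    // Hypothetical helper illustrating the fix: accumulate every chunk
    // read from the stream. The bug kept only the last buffer, so any
    // model larger than one buffer was truncated.
    static byte[] readAllBytes(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1 << 20]; // 1 MiB chunks, matching the size mentioned above
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);    // append this chunk to everything read so far
        }
        return out.toByteArray();
    }
}
```

With this loop, the resulting byte array always contains the full model regardless of how many reads the stream required.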
Thanks for the PR! Is there a way we can add a test for this?
Perhaps add a test that creates a 1.5 MB model definition and tries to load it? I can try creating such a test case, perhaps by crafting a minimal model and (since it is JSON) padding it with generated whitespace to make it 1.5 MB large (it would then fail to parse if only the first or last 1 MiB is seen), then sending it via a stream.
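The padding idea above can be sketched like this. The class and method names are hypothetical; the point is that whitespace between JSON tokens is legal, so a small model definition can be inflated past the 1 MiB buffer size without changing what it parses to:

```java
import java.util.Arrays;

public class PaddedModel {
    // Hypothetical illustration of the proposed test: pad a tiny JSON
    // model definition with whitespace until it exceeds the target size,
    // so a loader that keeps only one 1 MiB buffer sees truncated JSON.
    static String padJson(String json, int targetBytes) {
        int padding = Math.max(0, targetBytes - json.length());
        char[] spaces = new char[padding];
        Arrays.fill(spaces, ' ');
        // Insert the run of spaces right after the opening brace;
        // whitespace between tokens does not change the parsed document.
        return json.charAt(0) + new String(spaces) + json.substring(1);
    }
}
```

Feeding the padded string through the stream loader would then exercise the multi-buffer path directly.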
@mpetricek-corp Better to add a unit test for your fix. Thanks very much.
Can we generate the model with random data? To get a large model, the easiest way is to use classification with a random forest.
Closes #7168.
Merged since it's difficult to create a unit test and the fix is confirmed to work.
Fix model loading from stream (dmlc#7067) See merge request nvspark/xgboost!391