
Fix model loading from stream #7067

Merged (1 commit) Aug 15, 2021
Conversation

mpetricek-corp
Contributor

Fix bug introduced in 1791371 (allow loading from byte array)

When loading a model from a stream, only the last buffer read from the input stream is used to construct the model.

This may work for models smaller than 1 MiB (if you are lucky enough to read the whole model in one call), but it will always fail if the model is larger.
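The bug described above is the classic stream-reading mistake: each `read()` call overwrites the previous buffer instead of appending to an accumulator. A minimal sketch of the corrected accumulation pattern, assuming the loader reads the stream in 1 MiB chunks as described (the class and method names here are illustrative, not the actual xgboost4j code):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class StreamRead {
    // Read the entire stream, accumulating every chunk rather than
    // keeping only the last buffer returned by read().
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1 << 20]; // 1 MiB chunks, as in the loader
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n); // append this chunk to the accumulator
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // A payload larger than one buffer exercises the multi-chunk path.
        byte[] data = new byte[3 * (1 << 20) + 123];
        for (int i = 0; i < data.length; i++) data[i] = (byte) i;
        byte[] roundTrip = readAll(new ByteArrayInputStream(data));
        System.out.println(roundTrip.length == data.length
                && java.util.Arrays.equals(data, roundTrip));
    }
}
```

With the buggy pattern, only the final partial chunk would survive, so any model over 1 MiB would be truncated and fail to parse.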

@trivialfis
Member

Thanks for the PR! Is there a way we can add a test for this?

@mpetricek-corp
Contributor Author

Is there a way we can add a test for this?

Perhaps add a test that creates a 1.5 MB model definition and tries to load it? I can try creating such a test case, perhaps by crafting some minimal model and (since it is JSON) inserting generated whitespace to make it 1.5 MB large (so it will fail to parse if only the first or last 1 MB is seen), then sending it via a stream.
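The whitespace-padding idea above could be sketched as follows. The JSON here is a hypothetical stand-in (a real XGBoost model JSON has many more fields); whitespace between tokens is legal JSON, so the padded document stays valid, yet any loader that keeps only a single 1 MiB window would see unbalanced braces:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

public class PaddedModel {
    public static void main(String[] args) {
        // Build {"learner":{}} with 1.5 MiB of spaces after the opening brace.
        int pad = 3 << 19; // 1,572,864 bytes = 1.5 MiB of whitespace
        StringBuilder sb = new StringBuilder(pad + 16);
        sb.append('{');
        for (int i = 0; i < pad; i++) sb.append(' ');
        sb.append("\"learner\":{}}");
        String padded = sb.toString();

        // The document exceeds a single 1 MiB read buffer.
        System.out.println(padded.length() > (1 << 20));

        // Feed it through a stream, as the test case would.
        ByteArrayInputStream stream =
            new ByteArrayInputStream(padded.getBytes(StandardCharsets.US_ASCII));
        System.out.println(stream.available());
    }
}
```

A loader that retained only the first 1 MiB would see an opening brace followed by spaces; one that retained only the last 1 MiB would see a fragment with no opening brace. Either way, JSON parsing fails, which makes the truncation bug observable.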

@wbo4958
Contributor

wbo4958 commented Jun 30, 2021

@mpetricek-corp It would be better to add a unit test for your fix. Thanks very much.

@trivialfis
Member

Can we generate the model with random data? To get a large model, the easiest way is to use classification with a random forest.

@trivialfis
Member

Close #7168 .

@trivialfis trivialfis merged commit 46c4682 into dmlc:master Aug 15, 2021
@trivialfis
Member

Merged, since it's difficult to create a unit test and the fix is confirmed to work.

@mpetricek-corp mpetricek-corp deleted the patch-1 branch August 16, 2021 15:58
NvTimLiu pushed a commit to NvTimLiu/spark-xgboost that referenced this pull request Nov 1, 2021
NvTimLiu added a commit to NvTimLiu/spark-xgboost that referenced this pull request Nov 1, 2021
Fix model loading from stream (dmlc#7067)

See merge request nvspark/xgboost!391