
Fix model loading from stream #7067

Merged (1 commit) Aug 15, 2021
Conversation

mpetricek-corp
Contributor

Fix bug introduced in 1791371 (allow loading from byte array)

When loading a model from a stream, only the last buffer read from the input stream is used to construct the model.

This may work for models smaller than 1 MiB (if you are lucky enough to read the whole model in one call), but it will always fail if the model is larger.
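The bug described above is the classic stream-reading mistake: each `read()` call overwrites the previous buffer instead of appending to an accumulator. A minimal sketch of the corrected accumulation pattern, assuming the loader reads the stream in 1 MiB chunks as described (the class and method names here are illustrative, not the actual xgboost4j code):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class StreamRead {
    // Read the entire stream, accumulating every chunk rather than
    // keeping only the last buffer returned by read().
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1 << 20]; // 1 MiB chunks, as in the loader
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n); // append this chunk to the accumulator
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // A payload larger than one buffer exercises the multi-chunk path.
        byte[] data = new byte[3 * (1 << 20) + 123];
        for (int i = 0; i < data.length; i++) data[i] = (byte) i;
        byte[] roundTrip = readAll(new ByteArrayInputStream(data));
        System.out.println(roundTrip.length == data.length
                && java.util.Arrays.equals(data, roundTrip));
    }
}
```

With the buggy pattern, only the final partial chunk would survive, so any model over 1 MiB would be truncated and fail to parse.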

@trivialfis
Member

Thanks for the PR! Is there a way we can add a test for this?

@mpetricek-corp
Contributor Author

Is there a way we can add a test for this?

Perhaps add a test that creates a 1.5 MB model definition and tries to load it? I can try creating such a test case, perhaps by crafting some minimal model and (since it is JSON) inserting generated whitespace to make it 1.5 MB large (so it will fail to parse if only the first or last 1 MB is seen), then sending it via a stream.
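The whitespace-padding idea above could be sketched as follows. The JSON here is a hypothetical stand-in (a real XGBoost model JSON has many more fields); whitespace between tokens is legal JSON, so the padded document stays valid, yet any loader that keeps only a single 1 MiB window would see unbalanced braces:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

public class PaddedModel {
    public static void main(String[] args) {
        // Build {"learner":{}} with 1.5 MiB of spaces after the opening brace.
        int pad = 3 << 19; // 1,572,864 bytes = 1.5 MiB of whitespace
        StringBuilder sb = new StringBuilder(pad + 16);
        sb.append('{');
        for (int i = 0; i < pad; i++) sb.append(' ');
        sb.append("\"learner\":{}}");
        String padded = sb.toString();

        // The document exceeds a single 1 MiB read buffer.
        System.out.println(padded.length() > (1 << 20));

        // Feed it through a stream, as the test case would.
        ByteArrayInputStream stream =
            new ByteArrayInputStream(padded.getBytes(StandardCharsets.US_ASCII));
        System.out.println(stream.available());
    }
}
```

A loader that retained only the first 1 MiB would see an opening brace followed by spaces; one that retained only the last 1 MiB would see a fragment with no opening brace. Either way, JSON parsing fails, which makes the truncation bug observable.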

@wbo4958
Contributor

wbo4958 commented Jun 30, 2021

@mpetricek-corp It would be better to add a unit test for your fix. Thanks very much.

@trivialfis
Member

Can we generate the model with random data? To get a large model, the easiest way is to use classification with a random forest.

@trivialfis
Member

Close #7168 .

@trivialfis trivialfis merged commit 46c4682 into dmlc:master Aug 15, 2021
@trivialfis
Member

Merged, since it's difficult to create a unit test and the fix is confirmed to work.

@mpetricek-corp mpetricek-corp deleted the patch-1 branch August 16, 2021 15:58
NvTimLiu pushed a commit to NvTimLiu/spark-xgboost that referenced this pull request Nov 1, 2021
NvTimLiu added a commit to NvTimLiu/spark-xgboost that referenced this pull request Nov 1, 2021
Fix model loading from stream (dmlc#7067)

See merge request nvspark/xgboost!391