Update document for model dump. #5818

Merged
merged 2 commits into from
Jun 22, 2020
43 changes: 20 additions & 23 deletions doc/tutorials/saving_model.rst
@@ -112,7 +112,7 @@ configuration directly as a JSON string. In Python package:
print(config)


or
or in R:

.. code-block:: R

@@ -158,22 +158,9 @@ Will print out something similiar to (not actual output as it's too long for dem
"colsample_bynode": "1",
"colsample_bytree": "1",
"default_direction": "learn",
"enable_feature_grouping": "0",
"eta": "0.300000012",
"gamma": "0",
"grow_policy": "depthwise",
"interaction_constraints": "",
"lambda": "1",
"learning_rate": "0.300000012",
"max_bin": "256",
"max_conflict_rate": "0",
"max_delta_step": "0",
"max_depth": "6",
"max_leaves": "0",
"max_search_group": "100",
"refresh_leaf": "1",
"sketch_eps": "0.0299999993",
"sketch_ratio": "2",

...

"subsample": "1"
}
}
@@ -207,13 +194,16 @@ This way users can study the internal representation more closely. Please note
JSON generators make use of locale dependent floating point serialization methods, which
is not supported by XGBoost.

************
Future Plans
************
*************************************************
Difference between saving model and dumping model
*************************************************

Right now, using the JSON format incurs longer serialisation time. We have been working on
optimizing the JSON implementation to close the gap between binary format and JSON format.
You can track the progress in `#5046 <https://github.com/dmlc/xgboost/pull/5046>`_.
XGBoost has a function called ``dump_model`` in the Booster object, which lets you export
the model in a readable format such as ``text``, ``json`` or ``dot`` (graphviz). Its primary
use case is model interpretation or visualization; the output is not meant to be
loaded back into XGBoost. The JSON version has a `schema
<https://github.com/dmlc/xgboost/blob/master/doc/dump.schema>`_. See the next section for
more info.
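
As a rough illustration of the interpretation use case, the sketch below walks a JSON dump and collects leaf values. It assumes each tree string has the nested ``nodeid`` / ``children`` / ``leaf`` layout; the sample tree and the ``collect_leaves`` helper are hypothetical and stand in for real ``get_dump(dump_format="json")`` output.

```python
import json

# Hypothetical one-tree dump, shaped like the strings returned by
# get_dump(dump_format="json"); the values are made up for illustration.
SAMPLE_DUMP = [
    json.dumps({
        "nodeid": 0, "depth": 0, "split": "f0", "split_condition": 0.5,
        "yes": 1, "no": 2, "missing": 1,
        "children": [
            {"nodeid": 1, "leaf": 0.1},
            {"nodeid": 2, "leaf": -0.1},
        ],
    })
]


def collect_leaves(node):
    """Return the leaf values under a dumped tree node, left to right."""
    if "leaf" in node:
        return [node["leaf"]]
    leaves = []
    for child in node.get("children", []):
        leaves.extend(collect_leaves(child))
    return leaves


# Each list entry is one tree serialized as a JSON string.
trees = [json.loads(tree) for tree in SAMPLE_DUMP]
leaf_values = [collect_leaves(tree) for tree in trees]
print(leaf_values)
```

A dump read this way is only useful for inspection; restoring a model still requires a file written by ``save_model``.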

***********
JSON Schema
@@ -229,3 +219,10 @@ leaf directly, instead it saves the weights as a separated array.

.. include:: ../model.schema
:code: json

************
Future Plans
************

Right now, using the JSON format incurs longer serialisation time. We have been working on
optimizing the JSON implementation to close the gap between binary format and JSON format.
25 changes: 18 additions & 7 deletions python-package/xgboost/core.py
@@ -1444,8 +1444,11 @@ def save_model(self, fname):

The model is saved in an XGBoost internal format which is universal
among the various XGBoost interfaces. Auxiliary attributes of the
Python Booster object (such as feature_names) will not be saved. To
preserve all attributes, pickle the Booster object.
Python Booster object (such as feature_names) will not be saved. See:

https://xgboost.readthedocs.io/en/latest/tutorials/saving_model.html

for more info.

Parameters
----------
@@ -1460,7 +1463,7 @@ def save_model(self, fname):
raise TypeError("fname must be a string or os_PathLike")

def save_raw(self):
"""Save the model to an in-memory buffer representation
"""Save the model to an in-memory buffer representation instead of a file.

Returns
-------
@@ -1479,8 +1482,11 @@ def load_model(self, fname):

The model is loaded from an XGBoost format which is universal among the
various XGBoost interfaces. Auxiliary attributes of the Python Booster
object (such as feature_names) will not be loaded. To preserve all
attributes, pickle the Booster object.
object (such as feature_names) will not be loaded. See:

https://xgboost.readthedocs.io/en/latest/tutorials/saving_model.html

for more info.

Parameters
----------
@@ -1503,7 +1509,9 @@ def load_model(self, fname):
raise TypeError('Unknown file type: ', fname)

def dump_model(self, fout, fmap='', with_stats=False, dump_format="text"):
"""Dump model into a text or JSON file.
"""Dump model into a text or JSON file. Unlike `save_model`, the
output format is primarily used for visualization or interpretation,
hence it's more human readable but cannot be loaded back to XGBoost.

Parameters
----------
@@ -1537,7 +1545,9 @@ def dump_model(self, fout, fmap='', with_stats=False, dump_format="text"):
fout.close()

def get_dump(self, fmap='', with_stats=False, dump_format="text"):
"""Returns the model dump as a list of strings.
"""Returns the model dump as a list of strings. Unlike `save_model`, the
output format is primarily used for visualization or interpretation,
hence it's more human readable but cannot be loaded back to XGBoost.

Parameters
----------
@@ -1547,6 +1557,7 @@ def get_dump(self, fmap='', with_stats=False, dump_format="text"):
Controls whether the split statistics are output.
dump_format : string, optional
Format of model dump. Can be 'text', 'json' or 'dot'.

"""
fmap = os_fspath(fmap)
length = c_bst_ulong()