
Python gbm.feature_importance() error? #615

Closed
vousmevoyez opened this issue Jun 12, 2017 · 26 comments

@vousmevoyez

vousmevoyez commented Jun 12, 2017

Environment info

Operating System: Linux
CPU:
Python version: Python 2.7.13

Error Message:

```
ValueError: No JSON object could be decoded
```

Reproducible examples

```python
lgb_train = lgb.Dataset(X_train, y_train)
lgb_eval = lgb.Dataset(X_test, y_test, reference=lgb_train)
params = {
    'task': 'train',
    'boosting': 'gbdt',
    'objective': 'binary',
    'metric': {'l2', 'auc'},
    'num_leaves': 62,
    'learning_rate': 0.05,
    'feature_fraction': 0.9,
    'bagging_fraction': 0.8,
    'bagging_freq': 5,
    'verbose': 20
}
gbm = lgb.train(params,
                lgb_train,
                num_boost_round=250,
                valid_sets=lgb_eval)

print('Start predicting...')

y_pred = gbm.predict(X_test, num_iteration=gbm.best_iteration)
y_pred = np.round(y_pred)

print gbm.feature_importance()
```

@wxchan
Contributor

wxchan commented Jun 12, 2017

Tested with both Python 2 and 3, no error. Try the latest code.

@vousmevoyez
Author

vousmevoyez commented Jun 12, 2017

Still getting the error, and I have tried the latest code. Below is the complete traceback:

```
ValueError                                Traceback (most recent call last)
<ipython-input-14-920de1b50449> in <module>()
----> 1 gbm.feature_importance()

/home/admin/anaconda2/lib/python2.7/site-packages/lightgbm-0.2-py2.7.egg/lightgbm/basic.pyc in feature_importance(self, importance_type)
   1662         if importance_type not in ["split", "gain"]:
   1663             raise KeyError("importance_type must be split or gain")
-> 1664         dump_model = self.dump_model()
   1665         ret = [0] * (dump_model["max_feature_idx"] + 1)
   1666

/home/admin/anaconda2/lib/python2.7/site-packages/lightgbm-0.2-py2.7.egg/lightgbm/basic.pyc in dump_model(self, num_iteration)
   1577                 ctypes.byref(tmp_out_len),
   1578                 ptr_string_buffer))
-> 1579         return json.loads(string_buffer.value.decode())
   1580
   1581     def predict(self, data, num_iteration=-1, raw_score=False, pred_leaf=False, data_has_header=False, is_reshape=True,

/home/admin/anaconda2/lib/python2.7/json/__init__.pyc in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    337             parse_int is None and parse_float is None and
    338             parse_constant is None and object_pairs_hook is None and not kw):
--> 339         return _default_decoder.decode(s)
    340     if cls is None:
    341         cls = JSONDecoder

/home/admin/anaconda2/lib/python2.7/json/decoder.pyc in decode(self, s, _w)
    362
    363         """
--> 364         obj, end = self.raw_decode(s, idx=_w(s, 0).end())
    365         end = _w(s, end).end()
    366         if end != len(s):

/home/admin/anaconda2/lib/python2.7/json/decoder.pyc in raw_decode(self, s, idx)
    380             obj, end = self.scan_once(s, idx)
    381         except StopIteration:
--> 382             raise ValueError("No JSON object could be decoded")
    383         return obj, end

ValueError: No JSON object could be decoded
```

@wxchan
Contributor

wxchan commented Jun 12, 2017

Try setting num_boost_round=1 to see if it works.

btw, you should quote your error msg with ```

@vousmevoyez
Author

It works. But why does this happen?

@wxchan
Copy link
Contributor

wxchan commented Jun 12, 2017

Feature importances use a string buffer passed from C++ to Python; I guess the buffer for 250 rounds is too long and gets cut off during passing.
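The truncation hypothesis above can be illustrated with a small stdlib-only sketch. The buffer size and payload here are hypothetical stand-ins, not LightGBM's actual C API call, but they show why a cut-off dump would produce exactly this ValueError:

```python
import ctypes
import json

# A long JSON payload, standing in for a big model dump.
payload = json.dumps({"tree": list(range(1000))}).encode()

# A fixed-size buffer that is too small for the payload.
buf = ctypes.create_string_buffer(64)
ctypes.memmove(buf, payload, len(buf) - 1)  # only the first 63 bytes fit

# The truncated string is no longer valid JSON, so parsing fails
# just like in the reported traceback.
try:
    json.loads(buf.value.decode())
except ValueError as e:
    print("decode failed:", e)
```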

@vousmevoyez
Author

Sorry, my OS is NOT Windows. It's Linux.

@wxchan
Contributor

wxchan commented Jun 12, 2017

Oh, sorry, misread it.

@wxchan
Contributor

wxchan commented Jun 12, 2017

Strange, I set num_boost_round to 1M and still cannot reproduce it. You can change this line to `return string_buffer.value.decode()`, set num_boost_round to a big number, save gbm.dump_model() to a file, and upload it here. We can see if it's been cut.

@vousmevoyez
Author

vousmevoyez commented Jun 12, 2017

I did what you posted, but I can't call gbm.dump_model(); it raises the error below. How about gbm.save_model() in txt format instead?

```
<ipython-input-17-cf366c50211c> in <module>()
----> 1 gbm.dump_model()

/home/admin/anaconda2/lib/python2.7/site-packages/lightgbm-0.2-py2.7.egg/lightgbm/basic.pyc in dump_model(self, num_iteration)
   1577                 ctypes.byref(tmp_out_len),
   1578                 ptr_string_buffer))
-> 1579         return string_buffer.value.decode()
   1580
   1581     def predict(self, data, num_iteration=-1, raw_score=False, pred_leaf=False, data_has_header=False, is_reshape=True,

/home/admin/anaconda2/lib/python2.7/json/__init__.pyc in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    337             parse_int is None and parse_float is None and
    338             parse_constant is None and object_pairs_hook is None and not kw):
--> 339         return _default_decoder.decode(s)
    340     if cls is None:
    341         cls = JSONDecoder

/home/admin/anaconda2/lib/python2.7/json/decoder.pyc in decode(self, s, _w)
    362
    363         """
--> 364         obj, end = self.raw_decode(s, idx=_w(s, 0).end())
    365         end = _w(s, end).end()
    366         if end != len(s):

/home/admin/anaconda2/lib/python2.7/json/decoder.pyc in raw_decode(self, s, idx)
    380             obj, end = self.scan_once(s, idx)
    381         except StopIteration:
--> 382             raise ValueError("No JSON object could be decoded")
    383         return obj, end

ValueError: No JSON object could be decoded
```

@vousmevoyez
Author

FYI, my data has 4.6 million rows and 220 columns.

@wxchan
Copy link
Contributor

wxchan commented Jun 12, 2017

It's strange: json.loads was already removed, so why does it still show "No JSON object could be decoded"? I'm still trying to reproduce this issue; it will take some time.

@vousmevoyez
Author

vousmevoyez commented Jun 13, 2017

I reran my code today. The error is different:

```
TypeError                                 Traceback (most recent call last)
<ipython-input-17-6f3b6c156ac1> in <module>()
----> 1 bst.feature_importance()

/home/admin/anaconda2/lib/python2.7/site-packages/lightgbm-0.2-py2.7.egg/lightgbm/basic.pyc in feature_importance(self, importance_type)
   1663             raise KeyError("importance_type must be split or gain")
   1664         dump_model = self.dump_model()
-> 1665         ret = [0] * (dump_model["max_feature_idx"] + 1)
   1666
   1667         def dfs(root):

TypeError: string indices must be integers
```
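This TypeError is consistent with the earlier debugging patch: once dump_model() returns the raw JSON string instead of a parsed dict, indexing it with a key fails. A minimal stand-alone reproduction (the values are illustrative, not from the actual model):

```python
# With a parsed dict, as dump_model() normally returns, key access works:
dump_model = {"max_feature_idx": 219}
ret = [0] * (dump_model["max_feature_idx"] + 1)
print(len(ret))  # 220

# With the debugging patch, dump_model() returns the raw JSON string,
# and indexing a string with a key raises exactly this TypeError:
dump_model = '{"max_feature_idx": 219}'
try:
    dump_model["max_feature_idx"]
except TypeError as e:
    print(e)
```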

@wxchan
Contributor

wxchan commented Jun 13, 2017

dump_model() seems to work now; can you try dump_model() again?

@vousmevoyez
Author

model.zip

@wxchan
Contributor

wxchan commented Jun 13, 2017

Strange that model.json doesn't seem to have been cut. Try this:

```python
import json
json.loads(gbm.dump_model())
```

@vousmevoyez
Author

```
ValueError                                Traceback (most recent call last)
<ipython-input-61-7d5d098ecca5> in <module>()
      1 import json
----> 2 json.loads(gbm.dump_model())

/home/admin/anaconda2/lib/python2.7/json/__init__.pyc in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    337             parse_int is None and parse_float is None and
    338             parse_constant is None and object_pairs_hook is None and not kw):
--> 339         return _default_decoder.decode(s)
    340     if cls is None:
    341         cls = JSONDecoder

/home/admin/anaconda2/lib/python2.7/json/decoder.pyc in decode(self, s, _w)
    362
    363         """
--> 364         obj, end = self.raw_decode(s, idx=_w(s, 0).end())
    365         end = _w(s, end).end()
    366         if end != len(s):

/home/admin/anaconda2/lib/python2.7/json/decoder.pyc in raw_decode(self, s, idx)
    380             obj, end = self.scan_once(s, idx)
    381         except StopIteration:
--> 382             raise ValueError("No JSON object could be decoded")
    383         return obj, end

ValueError: No JSON object could be decoded
```

@wxchan
Contributor

wxchan commented Jun 13, 2017

I think I found the reason. Can you also save_model() and upload it here?

@vousmevoyez
Author

model_txt.zip

@wxchan
Contributor

wxchan commented Jun 13, 2017

Thanks for your help. As a temporary solution, you can change this line https://github.com/Microsoft/LightGBM/blob/master/src/io/tree.cpp#L369 to `str_buf << "\"threshold\":" << Common::AvoidInf(threshold_[index]) << "," << std::endl;` and change the python-package back. JSON cannot handle infinite numbers. I will fix this later.
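The root cause can be reproduced with the standard json module alone: C++ streams print an infinite double as `inf`, which is not a valid JSON token, while an AvoidInf-style clamp keeps the dump parseable. The clamping bound below is illustrative, not LightGBM's actual constant:

```python
import json
import math

# What the C++ dump produces for an infinite split threshold:
bad_dump = '{"threshold": inf}'
try:
    json.loads(bad_dump)  # "inf" is not valid JSON
except ValueError as e:
    print("parse failed:", e)

# An AvoidInf-style fix replaces infinity with a large finite number
# before serializing (bound chosen here for illustration only):
def avoid_inf(x, bound=1e300):
    if math.isinf(x):
        return bound if x > 0 else -bound
    return x

good_dump = '{"threshold": %g}' % avoid_inf(float("inf"))
print(json.loads(good_dump))  # parses fine
```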

@guolinke
Collaborator

@wxchan
Contributor

wxchan commented Jun 13, 2017

@guolinke add Common::AvoidInf to threshold_double? I will keep the change on L369; it helps when users load old models.

@guolinke
Collaborator

@wxchan yes and okay.

@vousmevoyez
Author

@wxchan Thanks, it works.

@wxchan
Contributor

wxchan commented Jun 13, 2017

@vousmevoyez you can pip install simplejson; it's more efficient and has better error messages.

@vousmevoyez
Author

OK.

@github-actions

This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 24, 2023