Support specifying number of iterations in dataset evaluation #4210

wangmn93 · 2021-04-21T09:39:14Z

I want to evaluate a constructed Dataset using only one tree. booster.predict works on the raw data, and it is slow. Is there any way to do this?

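import numpy as np
from lightgbm import Booster, Dataset, train

# random features and labels, uniform in [0, 1)
data = np.random.uniform(0, 1, (100, 10))
label = np.random.uniform(0, 1, 100)
dataset = Dataset(data=data, label=label)
# train() takes the params dict first; an empty dict uses the defaults
booster = train({}, dataset, num_boost_round=10)
# keep only the first tree by exporting one iteration and reloading it
tree0 = Booster(model_str=booster.model_to_string(start_iteration=0, num_iteration=1))

tree0.predict(data)  # this is slow

tree0 = Booster(model_str=booster.model_to_string(start_iteration=0, num_iteration=1), train_set=dataset)

# name-mangled private method, called here only to show the behavior
tree0._Booster__inner_predict(0)  # it always outputs zeros, since the init score is zero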

shiyu1994 · 2021-04-22T03:57:48Z

@wangmn93 Thanks for using LightGBM. If you want to evaluate a constructed Dataset, you have to pass it as either training data or validation data and use it in the training process.
Once the model is trained, we currently have no support for using the trained model to evaluate a constructed Dataset.
Also, I don't think prediction with a constructed Dataset (if we could do this) would be faster than with raw data. How slow is prediction with raw data in your case?

wangmn93 · 2021-04-22T07:35:45Z

@shiyu1994 I have a 1000000 x 1000 matrix. It takes 8.7 ms using the constructed Dataset and 6.85 s using booster.predict. Besides the prediction time, using the constructed Dataset is preferable because the raw data is big and loading it takes a lot of time.

StrikerRUS · 2021-04-22T20:56:17Z

> Once the model is trained, currently we don't have any support to use the trained model to evaluate a constructed Dataset.

I think the eval(), eval_train(), and eval_valid() methods can be used to score a constructed Dataset.
#3949 (comment)
https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.Booster.html#lightgbm.Booster.eval

StrikerRUS · 2021-04-28T13:08:17Z

@shiyu1994 Do you think we can add a feature request for including start_iteration and num_iteration (just like in the predict() method) into eval(_train/_valid) methods?

shiyu1994 · 2021-04-29T02:38:01Z

@StrikerRUS Sure. That will be valuable.

StrikerRUS · 2021-04-29T21:41:11Z

Closed in favor of tracking this in #2302. We decided to keep all feature requests in one place.

Contributions to this feature are welcome! Please re-open this issue (or post a comment if you are not the topic starter) if you are actively working on implementing it.

StrikerRUS changed the title ~~how to evaluate dataset using one tree?~~ Support specifying number of iterations in dataset evaluation Apr 29, 2021

StrikerRUS mentioned this issue Apr 29, 2021

Feature Requests & Voting Hub #2302

Open

StrikerRUS closed this as completed Apr 29, 2021

StrikerRUS added the "feature request" and "help wanted" labels Apr 29, 2021

jameslamb mentioned this issue May 11, 2021

[R-package] predict() breaks when using a Dataset stored in a file #4034

Closed

jameslamb mentioned this issue Aug 22, 2021

Enable use of constructed Dataset in predict() methods #4546

Closed
