Support specifying number of iterations in dataset evaluation #4210

Closed
wangmn93 opened this issue Apr 21, 2021 · 6 comments


wangmn93 commented Apr 21, 2021

I want to evaluate a constructed Dataset using only one tree. booster.predict works on the raw data, and it is slow. Is there any way to do this?

import numpy as np
from lightgbm import Booster, Dataset, train

data = np.random.uniform(0, 1, (100, 10))
label = np.random.uniform(0, 1, 100)
dataset = Dataset(data=data, label=label)
# train() takes the params dict as its first argument; {} keeps the defaults
booster = train({}, dataset, num_boost_round=10)
# keep only the first tree by round-tripping through the model string
tree0 = Booster(model_str=booster.model_to_string(start_iteration=0, num_iteration=1))

tree0.predict(data) # this is slow

tree0 = Booster(model_str=booster.model_to_string(start_iteration=0, num_iteration=1), train_set=dataset)

tree0._Booster__inner_predict(0) # it always outputs zeros, since the init score is zero

@shiyu1994
Collaborator

@wangmn93 Thanks for using LightGBM. If you want to evaluate a constructed Dataset, you have to pass it as either training data or validation data and use it in the training process.
Once the model is trained, we currently have no support for using the trained model to evaluate a constructed Dataset.
Also, I don't think prediction with a constructed Dataset (if we could do this) should be faster than with raw data. How slow is it to predict with raw data in your case?

@wangmn93
Author

@shiyu1994 I have a matrix of 1000000 x 1000. It takes 8.7 ms using the constructed Dataset and 6.85 s using booster.predict. Besides the prediction time, using the constructed Dataset is preferable because the raw data is big and loading it takes a lot of time.

@StrikerRUS
Collaborator

> Once the model is trained, currently we don't have any support to use the trained model to evaluate a constructed Dataset.

I think the eval(), eval_train(), and eval_valid() methods can be used to score a constructed Dataset.
#3949 (comment)
https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.Booster.html#lightgbm.Booster.eval

@StrikerRUS
Collaborator

@shiyu1994 Do you think we can add a feature request for including start_iteration and num_iteration (just like in the predict() method) into eval(_train/_valid) methods?

@shiyu1994
Collaborator

@StrikerRUS Sure. That will be valuable.

StrikerRUS changed the title from "how to evaluate dataset using one tree?" to "Support specifying number of iterations in dataset evaluation" on Apr 29, 2021
@StrikerRUS
Collaborator

Closed in favor of being in #2302. We decided to keep all feature requests in one place.

Welcome to contribute to this feature! Please re-open this issue (or post a comment if you are not a topic starter) if you are actively working on implementing this feature.
