Inconsistent Model Results with Identical Dataset and Parameters #6604
Thanks for using LightGBM. There are significant details you've omitted that we'd need in order to investigate whether what you're seeing is a bug or expected behavior. For example:
Investigating reports of the form "I expected these to be identical and they're different" requires a lot of work eliminating possible sources of difference. There is a lot of prior discussion on that topic in this issue tracker; see, for example:
We need to add some documentation explaining all of these topics (#6094), but until then... unless you're willing to work with us and provide a reproducible example with these types of details, it's unlikely you'll find the answers you're looking for here.
Hi, thanks for your reply. I'm running the exact same fitting process sequentially on the same machine, using identical raw data. My operating system is RHEL 8.x, and I'm using lgb.LGBMRegressor. Unfortunately, I can't share my source code or raw data due to confidentiality. However, I've found that my problem is closely related to #559. I tried device_type="cpu", which is perfectly consistent, and I'm now training with double precision, as mentioned in that issue, to see how things go.
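For reference, the two workarounds mentioned here correspond to documented LightGBM parameters. A minimal sketch (the seed value is illustrative, not from the thread):

```python
import lightgbm as lgb

# Workaround 1: train on CPU, which the reporter found to be perfectly consistent.
cpu_model = lgb.LGBMRegressor(device_type="cpu", random_state=42)

# Workaround 2: stay on GPU but enable double-precision math, as discussed in #559.
# gpu_use_dp trades some speed for more reproducible floating-point summation.
gpu_model = lgb.LGBMRegressor(device_type="gpu", gpu_use_dp=True, random_state=42)
```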
Thanks. Can you at least share the full set of parameters being passed into lgb.LGBMRegressor()?
Of course, here are all the non-default params:
Thanks for that. Please try the following changes to the parameters (see the sketch below):
There are some forms of non-determinism in the GPU version that are unavoidable. But hopefully those changes will remove most of the difference, and you'll see very similar models and performance metrics across repeated training runs.
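The specific list of parameter changes was not preserved above, so the following is a plausible reconstruction using LightGBM's documented determinism-related parameters; treat the values as illustrative:

```python
import lightgbm as lgb

# A plausible set of determinism-oriented settings; the exact list suggested
# in this comment was not preserved, so this is an illustration only.
model = lgb.LGBMRegressor(
    device_type="gpu",
    gpu_use_dp=True,      # double precision on GPU (see #559)
    deterministic=True,   # documented to stabilize results, at some cost in speed
    force_row_wise=True,  # pin the histogram-building strategy instead of letting it auto-select
    num_threads=1,        # a fixed thread count removes one source of run-to-run variation
    random_state=42,
)
```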
Hi, thanks for your advice. It took some time to conduct all the experiments. My observation for my case is that to make the results consistent we need to set deterministic=True on top of the other suggested changes.
Great, thanks for checking! Yes, this all sounds right to me. I'd forgotten about setting deterministic (see lines 54 to 59 in d67ecf9).
And it sounds right that enabling that might lead to slower training... that is why it's off by default. I'm going to close this as resolved. As you mentioned above, "the differences are minor"... that is expected. You will have to choose what matters more for your use case: training speed or perfect determinism across multiple runs. Hopefully you're able to use the faster, not-quite-deterministic settings and trust your evaluation setup to help choose between models for tasks like hyperparameter tuning.
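A hypothetical way to check determinism across runs (the synthetic data and parameter values are illustrative, not from the thread):

```python
import numpy as np
import lightgbm as lgb
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=1000, n_features=20, random_state=0)

params = dict(
    device_type="gpu", gpu_use_dp=True, deterministic=True,
    force_row_wise=True, num_threads=1, random_state=42,
)

# Train twice with identical data and parameters.
m1 = lgb.LGBMRegressor(**params).fit(X, y)
m2 = lgb.LGBMRegressor(**params).fit(X, y)

# Identical serialized models mean the same trees were built in both runs.
print(m1.booster_.model_to_string() == m2.booster_.model_to_string())
print(np.allclose(m1.predict(X), m2.predict(X)))
```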
Description
I'm encountering an issue where I obtain different model results each time I train an LGBM model, even though I'm using the exact same dataset and identical parameters. This behavior is unexpected, as I would assume that using a fixed seed, dataset, and hyperparameters should produce consistent results, though the differences are minor. I wonder if this is normal?
Environment info
LightGBM version: V4.5.0
Command(s) you used to install LightGBM
Additional Comments
I am using device_type = "gpu" along with common parameters.
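Since the actual parameters are confidential, here is a hypothetical stand-in configuration of the kind described, with illustrative values:

```python
import lightgbm as lgb

# Hypothetical stand-in for the confidential configuration described above:
# GPU device, common parameters, and a fixed seed.
model = lgb.LGBMRegressor(
    device_type="gpu",
    n_estimators=500,     # illustrative value
    learning_rate=0.05,   # illustrative value
    random_state=42,      # a fixed seed alone does not guarantee GPU determinism
)
```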