-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Serializers convert floats to strings #4681
Comments
Thanks for writing this up! Can you please provide some evidence for the problem you're talking about?
Without that type of information, maintainers here will have to put some effort into figuring out things that I suspect you already know. Sharing such information could help us get to a fix more quickly. |
You can confirm it in lines like these: Line 352 in b1b6db4
Alternatively, you can save the model to a file and inspect it in a text editor to see that values are saved in a decimal representation. |
Thanks! We'd need to test this, but do you think this issue could be the cause of #4680? |
Don't know. Could be, if |
got it, thanks for the tip! |
Do you think we need a binary serialization format in addition to current text one? Related: #4217. There was unsuccessful attempt to adopt protobuf format many years ago: #372, #908 (reverted from Also related, our friends at XGBoost are trying to adopt Universal Binary JSON format as a binary serialization format: dmlc/xgboost#7545. |
I think it'd be a good addition, since apart from ruling out any potential case of mismatches between fresh and deserialized models, it would imply decreased memory usage and faster serialization/deserialization (especially important for the R interface since it now keeps a serialized copy by default). However, I think it'd be enough with simply using a custom format by memcpying arrays and struct fields to a |
It might be desirable to have a schema otherwise it can be difficult to debug compatibility issues in the future. |
In the R and C interfaces of LightGBM, the serialization functions convert floating point numbers to text representations in decimal format in order to save and load models (e.g.
LGBM_BoosterSaveModel
andLGBM_BoosterSaveModelToString
). This loses precision in floating point numbers and can lead to small differences in predictions between a model used right after fitting it and a model that was loaded from a saved file or string.As an additional problem, serializing through this system also means that it's not possible to know the size that the serialized bytes will have beforehand (that is, the model needs to be serialized in order to know how long will the buffer than holds need to be), which makes serialization in the C interface less efficient.
The text was updated successfully, but these errors were encountered: