Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[optional] support loading and saving models with protobuf #908

Merged
merged 1 commit into from
Oct 19, 2017

Conversation

wxchan
Copy link
Contributor

@wxchan wxchan commented Sep 11, 2017

related to #372

  • solve conflicts
  • test windows support: mingw can work, vs not tested yet

@guolinke
Copy link
Collaborator

@wxchan
the license of protobuf? and is it cross platform ?

@wxchan
Copy link
Contributor Author

wxchan commented Sep 12, 2017

@guolinke

LICENSE: https://github.com/google/protobuf/blob/master/LICENSE

I add a test and it passes on both linux and osx, I am not sure about windows.

test on osx: https://travis-ci.org/wxchan/LightGBM/jobs/274043104, it takes too long.

@guolinke
Copy link
Collaborator

guolinke commented Sep 12, 2017

@wxchan what is the compression ratio of using protobuf, compared with raw text format ?

I think using protobuf is too heavy, can we just use a simple binary format (like the dataset bin file) ?

@wxchan
Copy link
Contributor Author

wxchan commented Sep 12, 2017

@guolinke I didn't test on big model yet. My computer takes too long to load big datasets. The good thing of protobuf is, it's readable through .proto file. We can easily know what features are saved.

@wxchan
Copy link
Contributor Author

wxchan commented Sep 13, 2017

@guolinke I made a test on Higgs dataset

text ->proto
after 500 rounds
save: 0.612300s -> 0.023627s
load: 0.471999s -> 0.024712s
size: 13M -> 5.9M

@wxchan wxchan changed the title [WIP] [optional] support loading and saving models with protobuf [optional] support loading and saving models with protobuf Sep 13, 2017
.travis.yml Outdated
- os: osx
env: TASK=proto
- os: osx
env: TASK=regular PYTHON_VERSION=2.7
Copy link
Contributor

@henry0312 henry0312 Sep 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please check #904 (comment)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@henry0312 I saw. Do you have an agreement yet? I can add it later, I think it's a waste of resource to run those python tests I didn't alter multiple time now.

Copy link
Contributor

@henry0312 henry0312 Sep 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have an agreement yet?

No, I don't know if @guolinke agrees with it yet, but I think we should run tests with Python 2.7.x and Python 3.6.x at least.

I think it's a waste of resource to run those python tests I didn't alter multiple time now.

I got it!
Yeah, Travis always faces with lacks of instances of OSX.
We have no choice but to reduce test cases for OSX 😢

@wxchan
Copy link
Contributor Author

wxchan commented Sep 15, 2017

@guolinke
Copy link
Collaborator

@wxchan any updates of this ?
this could be merged. but we still need a cross platform solution.

@guolinke
Copy link
Collaborator

@wxchan it seems that the treelite is like our if-else solution ?

@wxchan
Copy link
Contributor Author

wxchan commented Sep 30, 2017

@hcho3
have you used this proto https://github.com/dmlc/treelite/blob/master/src/tree.proto to compress LightGBM model? what's the compression ratio?

@hcho3
Copy link
Contributor

hcho3 commented Oct 1, 2017

@wxchan No, in treelite, protobuf is only for input not for output. (Protobuf is one of four methods to input a model into treelite compiler.) So LightGBM model isn't being translated into protobuf.

Not sure how effective protobuf is for compression.

@Tony-Y
Copy link
Contributor

Tony-Y commented Oct 9, 2017

I found useful information: https://github.com/thekvs/cpp-serializers
I think cereal is better than protobuf.

@wxchan
Copy link
Contributor Author

wxchan commented Oct 19, 2017

@guolinke I already finished this thread.

@guolinke guolinke merged commit 53b9985 into microsoft:master Oct 19, 2017
@wxchan wxchan deleted the proto branch October 19, 2017 07:07
@guolinke
Copy link
Collaborator

@wxchan this PR will break the build of VS. I will revert it and merge back after fixing this.

guolinke added a commit that referenced this pull request Oct 20, 2017
guolinke added a commit that referenced this pull request Oct 20, 2017
@wxchan
Copy link
Contributor Author

wxchan commented Oct 20, 2017

I think I forgot changing some path of file in windows/LightGBM.vcxproj after moving it, is this the reason?

@guolinke
Copy link
Collaborator

@wxchan it cannot find some ".h" and ".cpp" files, seems are some the auto-generated protobuf files.

@wxchan
Copy link
Contributor Author

wxchan commented Oct 20, 2017

@guolinke
Copy link
Collaborator

@wxchan
yeah, it says cannot find model.pb.h ... .
Maybe forget to add it to these include path ? https://github.com/Microsoft/LightGBM/blob/master/windows/LightGBM.vcxproj#L63

@wxchan wxchan restored the proto branch October 21, 2017 05:05
@wxchan
Copy link
Contributor Author

wxchan commented Oct 21, 2017

@guolinke yes, I forgot adding it. btw, there are some files for when proto=on, some files for when proto=off, they are controlled by cmake, should I add them all to windows/LightGBM.vcxproj?

@guolinke
Copy link
Collaborator

@wxchan it seems VS cannot support proto, so we can just disable it

@chivee chivee mentioned this pull request Nov 4, 2017
@lock lock bot locked as resolved and limited conversation to collaborators Mar 11, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants