[Breaking] Add global versioning. #4936
* Generate C and Python version files with CMake. The generated file is written into the source tree, but unless the XGBoost version changes there is no actual modification. This retains compatibility with the Makefiles used for R.
* Add the XGBoost version to the DMatrix binaries.
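As a rough illustration of what adding the XGBoost version to the DMatrix binaries can look like, here is a minimal C++ sketch. It assumes the CMake-generated header exposes the version as three integer macros; the macro names `XGBOOST_VER_MAJOR`/`XGBOOST_VER_MINOR`/`XGBOOST_VER_PATCH`, their values, and the helpers `WriteVersion`, `ReadVersion` and `IsCompatible` are hypothetical stand-ins, not XGBoost's actual API.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <tuple>

// Stand-in for the CMake-generated version header (hypothetical names/values).
#ifndef XGBOOST_VER_MAJOR
#define XGBOOST_VER_MAJOR 1
#define XGBOOST_VER_MINOR 0
#define XGBOOST_VER_PATCH 0
#endif

struct Version {
  std::int32_t ver_major, ver_minor, ver_patch;
  bool operator==(const Version& other) const {
    return std::tie(ver_major, ver_minor, ver_patch) ==
           std::tie(other.ver_major, other.ver_minor, other.ver_patch);
  }
};

// Version of the library doing the reading/writing, taken from the macros.
constexpr Version kCurrentVersion{XGBOOST_VER_MAJOR, XGBOOST_VER_MINOR,
                                  XGBOOST_VER_PATCH};

// Write the version triple at the start of a binary stream (native byte order).
inline void WriteVersion(std::ostream& os, const Version& v) {
  os.write(reinterpret_cast<const char*>(&v.ver_major), sizeof(v.ver_major));
  os.write(reinterpret_cast<const char*>(&v.ver_minor), sizeof(v.ver_minor));
  os.write(reinterpret_cast<const char*>(&v.ver_patch), sizeof(v.ver_patch));
}

// Read the triple back; throws if the stream ends before all three fields.
inline Version ReadVersion(std::istream& is) {
  Version v{};
  is.read(reinterpret_cast<char*>(&v.ver_major), sizeof(v.ver_major));
  is.read(reinterpret_cast<char*>(&v.ver_minor), sizeof(v.ver_minor));
  is.read(reinterpret_cast<char*>(&v.ver_patch), sizeof(v.ver_patch));
  if (!is) throw std::runtime_error("truncated version header");
  return v;
}

// Placeholder policy: accept files written by the same major version.
inline bool IsCompatible(const Version& file_version) {
  return file_version.ver_major == kCurrentVersion.ver_major;
}
```

Writing the triple first lets a reader decide how to parse (or reject) the rest of the file before touching it; the same-major-version rule is only a placeholder policy.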
Hi @trams. This is a small portion of the JSON PR, with the version-handling code cleaned up. It should also benefit the C++ code base you mentioned in #4895 (comment). Would you like to take a look?
Codecov Report
@@            Coverage Diff             @@
##           master    #4936      +/-   ##
==========================================
- Coverage   71.27%   71.05%   -0.22%
==========================================
  Files          11       11
  Lines        2294     2301       +7
==========================================
  Hits         1635     1635
- Misses        659      666       +7
So this PR breaks compatibility for DMatrix serialisation? We might need a wider discussion about how this transition will happen. I assume you will phase out the old serialisation gradually, class by class.
Looks good. I would just fix the issue with the binary file header ("version: ").
@hcho3 It's a breaking change. Could you please take a look when time allows? This uses the XGBoost version for DMatrix instead of its own version. I believe the breakage will pay off in the future. WDYT?
@trivialfis I will review this as time allows. One request: since we are breaking backward compatibility, can we modify MetaInfo to add cc @RAMitchell
@hcho3 Can we have something more concrete than `extra`?
It shouldn't be too difficult to add new fields in the future. After all, we are just appending to the end of the binary: a newer version can try reading past the old end to see whether the new field is there. We only break things when we remove something or insert something in the middle.
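The append-only argument can be sketched as follows. This uses a hypothetical `Record` type, not XGBoost's `MetaInfo`: a newer writer appends a field strictly at the end, and a single reader probes for it after the required fields, falling back to an empty default when an older file simply ends. The helper names and the length-prefixed, native-byte-order encoding are assumptions for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <vector>

// Hypothetical record: `labels` existed from the start, `weights` was appended later.
struct Record {
  std::vector<float> labels;
  std::vector<float> weights;
};

// Length-prefixed float array in native byte order.
inline void WriteVec(std::ostream& os, const std::vector<float>& v) {
  const std::uint64_t n = v.size();
  os.write(reinterpret_cast<const char*>(&n), sizeof(n));
  if (n != 0) {
    os.write(reinterpret_cast<const char*>(v.data()),
             static_cast<std::streamsize>(n * sizeof(float)));
  }
}

// Returns false if the stream ends before the field is complete.
inline bool ReadVec(std::istream& is, std::vector<float>* out) {
  std::uint64_t n = 0;
  if (!is.read(reinterpret_cast<char*>(&n), sizeof(n))) return false;
  out->resize(n);
  if (n != 0) {
    is.read(reinterpret_cast<char*>(out->data()),
            static_cast<std::streamsize>(n * sizeof(float)));
  }
  return static_cast<bool>(is);
}

// Old writer: only the original field.
inline void SaveV1(std::ostream& os, const Record& r) { WriteVec(os, r.labels); }

// New writer: the new field goes strictly after all existing ones.
inline void SaveV2(std::ostream& os, const Record& r) {
  WriteVec(os, r.labels);
  WriteVec(os, r.weights);
}

// One reader for both: required fields first, then probe for the appended one.
inline Record Load(std::istream& is) {
  Record r;
  if (!ReadVec(is, &r.labels)) throw std::runtime_error("missing labels");
  if (!ReadVec(is, &r.weights)) r.weights.clear();  // old file: field absent
  return r;
}
```

Had `weights` been inserted before `labels` instead, every byte after it would shift, so old readers would misparse new files and new readers would misparse old ones; that is the breakage the comment refers to.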
Earlier I had planned to add `label_lower_bound` and `label_upper_bound` fields, but @RAMitchell did not like the idea. He proposed a key-value store for all extra 1D data, so that we don’t have to keep adding new fields in the future. See #4763 (comment) for more context.
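A key-value store for extra 1D data, in the spirit of that proposal, might look like the sketch below. The class `ExtraInfo` and its methods are hypothetical, and `label_lower_bound`/`label_upper_bound` are used only as example keys; the actual design discussion is in #4763.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical container for optional per-row float arrays keyed by name.
class ExtraInfo {
 public:
  explicit ExtraInfo(std::size_t num_row) : num_row_(num_row) {}

  // Store (or overwrite) a 1D field; every field must hold one value per row.
  void Set(const std::string& key, std::vector<float> values) {
    if (values.size() != num_row_) {
      throw std::invalid_argument("field '" + key + "' has wrong length");
    }
    fields_[key] = std::move(values);
  }

  // Returns nullptr when the field was never set.
  const std::vector<float>* Get(const std::string& key) const {
    auto it = fields_.find(key);
    return it == fields_.end() ? nullptr : &it->second;
  }

  std::size_t Size() const { return fields_.size(); }

 private:
  std::size_t num_row_;
  std::map<std::string, std::vector<float>> fields_;
};
```

New per-row quantities then become new keys rather than new struct members, which is the maintenance benefit being argued for.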
> On Thu, Oct 17, 2019 at 10:08 AM Jiaming Yuan wrote:
> It shouldn't be too difficult to add new fields in the future. …
I see. I fear the DMatrix IO will end up like the current Booster IO. Can we delay adding your extra info? I will sort something out before the next release, such as a formatted binary with field names and offsets (formatting is good). And we don't promise compatibility between commits, right?
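A formatted binary with field names and offsets could take roughly the following shape. The layout (a table of `(name, offset, size)` entries followed by a payload), native byte order, and the helpers `Serialize`, `ReadField`, `PutU64` and `GetU64` are assumptions for illustration, not the format XGBoost eventually adopted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical layout (native byte order):
//   u64 field_count
//   field_count x { u64 name_len, name bytes, u64 offset, u64 nbytes }
//   payload, with each offset relative to the start of the payload
using Fields = std::map<std::string, std::vector<float>>;

inline void PutU64(std::string* out, std::uint64_t v) {
  out->append(reinterpret_cast<const char*>(&v), sizeof(v));
}

inline std::uint64_t GetU64(const std::string& buf, std::size_t* pos) {
  if (*pos + sizeof(std::uint64_t) > buf.size()) throw std::runtime_error("truncated header");
  std::uint64_t v = 0;
  std::memcpy(&v, buf.data() + *pos, sizeof(v));
  *pos += sizeof(v);
  return v;
}

inline std::string Serialize(const Fields& fields) {
  std::string header, payload;
  PutU64(&header, fields.size());
  for (const auto& kv : fields) {
    const std::size_t nbytes = kv.second.size() * sizeof(float);
    PutU64(&header, kv.first.size());
    header += kv.first;
    PutU64(&header, payload.size());  // where this field starts in the payload
    PutU64(&header, nbytes);
    if (nbytes != 0) payload.append(reinterpret_cast<const char*>(kv.second.data()), nbytes);
  }
  return header + payload;
}

// Finds one field through the table; returns false if no entry has that name.
inline bool ReadField(const std::string& buf, const std::string& name,
                      std::vector<float>* out) {
  std::size_t pos = 0;
  const std::uint64_t count = GetU64(buf, &pos);
  bool found = false;
  std::uint64_t offset = 0, nbytes = 0;
  for (std::uint64_t i = 0; i < count; ++i) {
    const std::uint64_t len = GetU64(buf, &pos);
    if (pos + len > buf.size()) throw std::runtime_error("truncated field name");
    const std::string key = buf.substr(pos, len);
    pos += len;
    const std::uint64_t off = GetU64(buf, &pos);
    const std::uint64_t n = GetU64(buf, &pos);
    if (key == name) { found = true; offset = off; nbytes = n; }
  }
  if (!found) return false;
  const std::size_t payload_start = pos;  // the table ends where the payload begins
  if (nbytes % sizeof(float) != 0 || payload_start + offset + nbytes > buf.size()) {
    throw std::runtime_error("corrupt field entry");
  }
  out->assign(nbytes / sizeof(float), 0.0f);
  if (nbytes != 0) std::memcpy(out->data(), buf.data() + payload_start + offset, nbytes);
  return true;
}
```

Because each field is located through the table rather than by its position in the stream, a reader can skip fields it does not know and report a missing field cleanly, which removes the append-only constraint on where new fields may go.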
Got it. Then for now I’ll review this PR as it is.
> On Thu, Oct 17, 2019 at 11:18 AM Jiaming Yuan wrote:
> @hcho3 Added an issue #4956
@trivialfis I simplified the logic for detecting prefetch intrinsics. Can you take a look and see if the change is reasonable?
Looks nice. Thanks for cleaning it up.
@trivialfis I'll submit a follow-up PR to implement a new binary DMatrix format after this PR is merged.
Small part of #4732, tidied up.