Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[jvm-packages] Deprecate constructors with implicit missing value. #7225

Merged
merged 3 commits into from
Sep 16, 2021

Conversation

trivialfis
Copy link
Member

During debugging #7210 , I found the java implementation has a constructor the uses 0 as the default value for missing, which doesn't make sense since 0 is a meaningful input for XGBoost. NaN would be a better choice and is in fact used by both Python and R. However, I don't think it's a good idea for us to swap the missing value silently. Therefore, I propose that we deprecate those constructors and let users specify missing explicitly.

@trivialfis
Copy link
Member Author

@wbo4958 @CodingCat @hcho3

Copy link
Member

@RAMitchell RAMitchell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit annoying to have to specify it when you have dense data. I would argue it should just be NaN like everywhere else, but defer to jvm users.

@trivialfis
Copy link
Member Author

Hey @Craigacp , could you please help take a look?

@Craigacp
Copy link
Contributor

Looks fine to me. You could modify the javadoc in DMatrix.java to say Please specify the missing value explicitly using {@link DMatrix(float[], int, int, float) which will insert a link to the correct constructor.

I've only ever used the sparse DMatrix constructors though, so I'm not sure how changing the default will affect users.

@trivialfis
Copy link
Member Author

@Craigacp Thanks for the suggestion!

@RAMitchell I'm worried that changing the missing value silently can bring some unindented results for users. Debugging it isn't easy considering users cannot see what's inside the DMatrix.

@codecov-commenter
Copy link

codecov-commenter commented Sep 15, 2021

Codecov Report

Merging #7225 (38ab163) into master (3515931) will increase coverage by 0.00%.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #7225   +/-   ##
=======================================
  Coverage   82.63%   82.64%           
=======================================
  Files          13       13           
  Lines        4019     4021    +2     
=======================================
+ Hits         3321     3323    +2     
  Misses        698      698           
Impacted Files Coverage Δ
python-package/xgboost/sklearn.py 90.87% <0.00%> (+0.03%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3515931...38ab163. Read the comment docs.

@wbo4958
Copy link
Contributor

wbo4958 commented Sep 16, 2021

LGTM,

*/
@Deprecated
public DMatrix(float[] data, int nrow, int ncol) throws XGBoostError {
long[] out = new long[1];
XGBoostJNI.checkCall(XGBoostJNI.XGDMatrixCreateFromMat(data, nrow, ncol, 0.0f, out));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we change the default missing value to Float.Nan here instead of change everywhere

@trivialfis trivialfis merged commit 9f63d6f into dmlc:master Sep 16, 2021
@trivialfis trivialfis deleted the remove-dft-missing branch September 16, 2021 20:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants