-
-
Notifications
You must be signed in to change notification settings - Fork 8.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make `HistCutMatrix::Init' be aware of groups. #4115
Conversation
127f901
to
d4b10f5
Compare
@trivialfis Yes, to clarify, weights in ranking objects will have to be per-group basis. Does this PR make this assumption? For the quantile sketch, we can replicate the group weight to cover all data points in each group. |
@hcho3 This PR accepts both per-group and per-instance until the last commit There was a test in jvm-package scala that inputs different number of weights and groups: https://travis-ci.org/dmlc/xgboost/jobs/490537869#L1636 Please ignore the incorrect check, it's fixed in later commits. But the showed error message seems to support per-instance weighting. |
@hcho3 Ha, there it's: So it's a buggy test and I should remove/fix it? |
@trivialfis Yes, and we'll need to be clear about the use of per-group weights. It really doesn't make sense to give different weights to data rows in the same query session, since only the relative ordering of the data rows are important (hence learning-to-rank). |
@hcho3 Thanks for the explanation. That goes in my reading list. |
@trivialfis A good place to start is to learn what the ranking metric (e.g. NDCG) does. This reading helped me: http://dalelane.co.uk/blog/?p=3403 |
@hcho3 Thanks for the link! That's useful. |
b2d766a
to
40c086f
Compare
@trivialfis Okay, I'll review. |
@trivialfis Can you look at my latest revision and see if it's good? |
Oops, I think it's wrong. Let me try again |
@hcho3 Do we need to calculate different |
I can try it again tomorrow unless you beat me to it. :) |
@trivialfis Notice that parallelization happens over columns, not rows |
@hcho3 I see. Sorry for the mistake. |
@trivialfis No problem. I should probably add another test case with multiple batches. |
@hcho3 That would be great. I wanted to add some tests for histogram, but it's quite weird since quantile is not well tested. I don't know how to derive the solution for tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can change these tomorrow, but the might not be able to add the desired tests myself.
@trivialfis It turns out that currently you cannot enable external memory for fast quantile updater. See #4093. I've added a test anyway, so that we can later add external memory support to fast quantile. |
@trivialfis Looks like I uncovered a bug in external memory. I'll remove the failing test and file an new issue. |
@hcho3 It bugs me to add such an overhead. In the context of using external memory, does the data come in as original order? If so we can avoid adding such search completely. |
@hcho3 I will keep looking at the external memory part later. But if you know the answer then please let me know. The data loading logic is quite confusing to me. Many thanks! |
@hcho3 I added the monitor for profiling anyway. I don't think the data will come in as original order. And since the search performs in O(log n) with log(2^32) == 32, I'm gonna ignore it for now. Sorry for the noise and ready for another review. :) |
@trivialfis For now, it is safe to assume that the external pages are out-of-order. We'll have to dive deeper into external memory when we attempt to fix #4130. |
@hcho3 Could you help taking another look? I don't think this PR has anything to do with failed R test. |
* Add checks for group size. * Simple docs. * Revise logic for locating group ID of first row of each batch Co-authored-by: Jiaming Yuan <jm.yuan@outlook.com> Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
2d2c53a
to
a181918
Compare
@hcho3 Mac tests in Travis is not looking good. Is there anything we can do about it? |
@trivialfis This time it’s a bug in Travis CI. Will file an issue today. |
@trivialfis Eventually, I hope we find a better alternative for Mac builds, but unfortunately, we cannot get Mac workers with Jenkins since Amazon EC2 doesn't support it. |
Filed a support ticket just now. Hope they get back to me soon. |
@trivialfis Wow, they got back to me and fixed the issue right away. Builds are working now. |
@hcho3 That's really nice! |
Address segfault in #4098, error in quantile is not resolved.