-
Notifications
You must be signed in to change notification settings - Fork 234
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Threshold must be a real number #322
Threshold must be a real number #322
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Others PTAL
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @mvargas33!
It would be nice to add a couple of tests to ensure we have the expected behavior
@terrytangyuan @perimosocordiae @wdevazelhes: I can't find the automatic CI / coverage reports - is it me or something is broken there?
From https://travis-ci.org/github/scikit-learn-contrib/metric-learn/builds, it says We should probably switch to use GitHub Actions now. Many projects have already migrated. |
Indeed it seems CI / codecov checks are no longer working (last ones are 4 months ago). Switching to GitHub Actions sounds good - do you happen to know how to go about it? I've never used it. |
It's pretty straightforward to set up. Just create a |
Created #325 to track that separately. Feel free to pick it up. |
metric_learn/base_metric.py
Outdated
@@ -410,6 +410,10 @@ def set_threshold(self, threshold): | |||
""" | |||
check_is_fitted(self, 'preprocessor_') | |||
|
|||
if threshold is None or not isinstance(threshold, (int, float)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to check is None
here, the isinstance check will handle None values as expected.
It may be even simpler to remove this checking and call self.threshold_ = float(threshold)
instead. This will raise a ValueError if given invalid input, and also handles a broader range of float-like types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with this simpler solution
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree as well. Perhaps writing a couple of test to make sure this raises the right error is still a good idea
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok! Will do
I made a test to check correct behavior. The |
metric_learn/base_metric.py
Outdated
if threshold is isinstance(threshold, (bool)): | ||
raise ValueError('Parameter threshold must be a real number. ' | ||
'Got {} instead.'.format(type(threshold))) | ||
self.threshold_ = float(threshold) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two comments:
- Not sure if it makes sense to make an exception for
bool
- Maybe it is good to throw the custom error in all cases, which may be more informative for users? what do you think @perimosocordiae?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, and the first approach was that, then I changed it to float(threshold)
. But given that it works for all except bool
, maybe its better to use the custom error for all cases as proposed initially.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, this PR fell off my radar for a while. I tend to lean toward being permissive about these things, so I don't mind threshold=True
resulting in a threshold of 1.0 here. If you do want to explicitly forbid it, though, that's fine by me. The check can be simpler:
if isinstance(threshold, bool):
raise ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am fine with being permissive as well. My concern is whether the generic ValueError will be explicit enough for the user, or whether we should throw the custom error message in all cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As of now the code is
if threshold is isinstance(threshold, (bool)):
raise ValueError('Parameter threshold must be a real number. '
'Got {} instead.'.format(type(threshold)))
self.threshold_ = float(threshold) # Will throw ValueError/TypeError
But if we would like to be permissive and display a custom message I would suggest the original solution:
if not isinstance(threshold, (int, float)):
raise ValueError('Parameter threshold must be a real number. '
'Got {} instead.'.format(type(threshold)))
self.threshold_ = float(threshold) # int -> float
It is used this way in _validate_calibration_params
static method in PairClassifiers
.
I've updated the test as well, now its parametrized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some last nitpicks...
(If I'm not mistaken, the current solution is not permissive for bool
: it would throw a ValueError, wouldn't it?)
metric_learn/base_metric.py
Outdated
if not isinstance(threshold, (int, float)): | ||
raise ValueError('Parameter threshold must be a real number. ' | ||
'Got {} instead.'.format(type(threshold))) | ||
self.threshold_ = float(threshold) # int -> float |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry to nitpick but can't you just call self.threshold_ = float(threshold)
at the beginning and throw the custom warning if a ValueError is raised ? This would avoid calling isinstance
; and would just let bool
go through (converting to 0/1), which is fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed it for this
try:
self.threshold_ = float(threshold)
except Exception:
raise ValueError('Parameter threshold must be a real number. '
'Got {} instead.'.format(type(threshold)))
I guess if a warning is shown instead, the code can fail somewhere else, because it won't stop the execution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And forgot to mention that isinstance(threshold, (int, float))
is permissive for bool. float(threshold)
is also permissive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - will let @perimosocordiae check that all is good for him and merge
metric_learn/base_metric.py
Outdated
self.threshold_ = threshold | ||
try: | ||
self.threshold_ = float(threshold) | ||
except Exception: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
except ValueError
(the bare Exception will catch too many unrelated errors)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had to catch ValueError
and TypeError
. Most cases fall into TypeError
, but String
fall into ValueError
with the funtion float()
:
ValueError("could not convert string to float: 'ABC'")
Considering both types of exceptions, the test passes.
Merged, thanks! |
Closes #219
As stated in the issue #219 , the threshold variable for
_PairsClassifierMixin
must be a real number (float or integer).A type check has been added in
set_threshold()
method, following the same format/guideline of type checking in other parts of the code.The threshold set from
calibrate_threshold()
method is guaranteed to be float, so no additional type checking is needed.