Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Threshold must be a real number #322

Merged
merged 11 commits into from
Nov 2, 2021

Conversation

mvargas33
Copy link
Contributor

@mvargas33 mvargas33 commented Sep 14, 2021

Closes #219

As stated in the issue #219 , the threshold variable for _PairsClassifierMixin must be a real number (float or integer).

A type check has been added in set_threshold() method, following the same format/guideline of type checking in other parts of the code.

The threshold set from calibrate_threshold() method is guaranteed to be float, so no additional type checking is needed.

Copy link
Member

@terrytangyuan terrytangyuan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Others PTAL

Copy link
Member

@bellet bellet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mvargas33!
It would be nice to add a couple of tests to ensure we have the expected behavior

@terrytangyuan @perimosocordiae @wdevazelhes: I can't find the automatic CI / coverage reports - is it me or something is broken there?

@terrytangyuan
Copy link
Member

terrytangyuan commented Sep 14, 2021

From https://travis-ci.org/github/scikit-learn-contrib/metric-learn/builds, it says Since June 15th, 2021, the building on travis-ci.org is ceased. Please use travis-ci.com from now on. (Travis is no longer free)

We should probably switch to use GitHub Actions now. Many projects have already migrated.

@bellet
Copy link
Member

bellet commented Sep 14, 2021

From https://travis-ci.org/github/scikit-learn-contrib/metric-learn/builds, it says Since June 15th, 2021, the building on travis-ci.org is ceased. Please use travis-ci.com from now on. (Travis is no longer free)

We should probably switch to use GitHub Actions now. Many projects have already migrated.

Indeed it seems CI / codecov checks are no longer working (last ones are 4 months ago). Switching to GitHub Actions sounds good - do you happen to know how to go about it? I've never used it.

@terrytangyuan
Copy link
Member

It's pretty straightforward to set up. Just create a .yml file in .github/workflows with the testing steps. A template can be found here.

@terrytangyuan
Copy link
Member

Created #325 to track that separately. Feel free to pick it up.

@@ -410,6 +410,10 @@ def set_threshold(self, threshold):
"""
check_is_fitted(self, 'preprocessor_')

if threshold is None or not isinstance(threshold, (int, float)):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to check is None here, the isinstance check will handle None values as expected.

It may be even simpler to remove this checking and call self.threshold_ = float(threshold) instead. This will raise a ValueError if given invalid input, and also handles a broader range of float-like types.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with this simpler solution

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree as well. Perhaps writing a couple of test to make sure this raises the right error is still a good idea

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok! Will do

@mvargas33
Copy link
Contributor Author

I made a test to check correct behavior. The float(threshold) works for almost all datatypes, but not for boolean as float(True)=1.0, so I had to add isinstance() just to check the boolean type.

Comment on lines 412 to 415
if threshold is isinstance(threshold, (bool)):
raise ValueError('Parameter threshold must be a real number. '
'Got {} instead.'.format(type(threshold)))
self.threshold_ = float(threshold)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two comments:

  • Not sure if it makes sense to make an exception for bool
  • Maybe it is good to throw the custom error in all cases, which may be more informative for users? what do you think @perimosocordiae?

Copy link
Contributor Author

@mvargas33 mvargas33 Sep 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, and the first approach was that, then I changed it to float(threshold). But given that it works for all except bool, maybe its better to use the custom error for all cases as proposed initially.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, this PR fell off my radar for a while. I tend to lean toward being permissive about these things, so I don't mind threshold=True resulting in a threshold of 1.0 here. If you do want to explicitly forbid it, though, that's fine by me. The check can be simpler:

if isinstance(threshold, bool):
   raise ...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am fine with being permissive as well. My concern is whether the generic ValueError will be explicit enough for the user, or whether we should throw the custom error message in all cases.

metric_learn/base_metric.py Outdated Show resolved Hide resolved
Copy link
Contributor Author

@mvargas33 mvargas33 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As of now the code is

  if threshold is isinstance(threshold, (bool)):
    raise ValueError('Parameter threshold must be a real number. '
                     'Got {} instead.'.format(type(threshold)))
  self.threshold_ = float(threshold)  # Will throw ValueError/TypeError

But if we would like to be permissive and display a custom message I would suggest the original solution:

  if not isinstance(threshold, (int, float)):
    raise ValueError('Parameter threshold must be a real number. '
                     'Got {} instead.'.format(type(threshold)))
  self.threshold_ = float(threshold)  # int -> float

It is used this way in _validate_calibration_params static method in PairClassifiers.

I've updated the test as well, now its parametrized.

metric_learn/base_metric.py Outdated Show resolved Hide resolved
Copy link
Member

@bellet bellet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some last nitpicks...

(If I'm not mistaken, the current solution is not permissive for bool: it would throw a ValueError, wouldn't it?)

Comment on lines 572 to 575
if not isinstance(threshold, (int, float)):
raise ValueError('Parameter threshold must be a real number. '
'Got {} instead.'.format(type(threshold)))
self.threshold_ = float(threshold) # int -> float
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry to nitpick but can't you just call self.threshold_ = float(threshold) at the beginning and throw the custom warning if a ValueError is raised ? This would avoid calling isinstance; and would just let bool go through (converting to 0/1), which is fine.

Copy link
Contributor Author

@mvargas33 mvargas33 Oct 28, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed it for this

try:
  self.threshold_ = float(threshold)
except Exception:
  raise ValueError('Parameter threshold must be a real number. '
                   'Got {} instead.'.format(type(threshold)))

I guess if a warning is shown instead, the code can fail somewhere else, because it won't stop the execution.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And forgot to mention that isinstance(threshold, (int, float)) is permissive for bool. float(threshold) is also permissive.

test/test_pairs_classifiers.py Outdated Show resolved Hide resolved
@mvargas33 mvargas33 requested a review from bellet October 29, 2021 08:27
Copy link
Member

@bellet bellet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - will let @perimosocordiae check that all is good for him and merge

self.threshold_ = threshold
try:
self.threshold_ = float(threshold)
except Exception:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

except ValueError

(the bare Exception will catch too many unrelated errors)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to catch ValueError and TypeError. Most cases fall into TypeError, but String fall into ValueError with the funtion float():

ValueError("could not convert string to float: 'ABC'")

Considering both types of exceptions, the test passes.

@perimosocordiae perimosocordiae merged commit 964f28d into scikit-learn-contrib:master Nov 2, 2021
@perimosocordiae
Copy link
Contributor

Merged, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

forbid any non-float value for the threshold
4 participants