Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Tests][Warnings] Cut all warnings from SCML using a minimal solution #341

Merged
merged 1 commit into from
Nov 2, 2021

Conversation

mvargas33
Copy link
Contributor

As suggested by @bellet in #339, it's better to cut SCML warnings from the root by specifying the n_basis value when its possible.

As all test run the iris dataset for SCML though build_triplets at test_utils.py, then we just need to scpecify n_basis for SCML and SCML_Supervised in the lists.

Going through the code of SCML, if n_basis is None, then the following default is assigned:

Default for SCML_Supervised: (lines 579-583)

n_basis = min(20*n_features, n_samples*2*min(n_classes-1, n_features) - 1)

Default for SCML: (line 234)

n_basis = n_features*80

Iris dataset has 150 samples, 3 classes, 4 features. So the value is 80 for SCML_Supervised and 320 for SCML.

I changed it, but got the following tests failing.

  • In test_components_is_2D at test_mahalanobis_mixin.py there is an error because of a border case. I got this warning from SCML:
The number of bases with nonzero weight is less than the number of features of the input, in consequence the learned transformation reduces the dimension to 0

And this error from the test:

>     assert model.components_.shape == (1, 1)  # the components must be 2D
E     assert (0, 1) == (1, 1)
E       At index 0 diff: 0 != 1
E       Use -v to get the full diff

So the test want to test fit for just one feature. Right now the feature selected from Iris is the 1st one, but if we use 2nd dimension instead, we can avoid this border case and make the test pass. The behavior expected is the same.

# trunc_data = input_data[..., :1]  # Old
trunc_data = input_data[..., 1:2]  # New
  • There is also an error in test_sklearn_compat.py, TestSklearnCompat class, test_scml, check_estimator function if n_basis is arbitrary, because under the hood it calls fit() wich calls _generate_bases_LDA, and if n_basis is wrong, it may fail. Thus, we need to let SCML calc n_basis for us as we don't know wich random X, y dataset is being used. Thus, we need to use pytest.warn to catch the warning.

  • Not an error, but an obervation in test_triplet_diffs and test_lda: These tests needs an explicitly n_basis=None to check if it is being set correctly internally, thus, in this case the warning needs to be caught as well.

Besides this observations, all warnings from SCML were removed. Only 300-ish warnings are shown now across all tests.

Also, this PR removes the isinstance() overkill made in test_triplets_clasifiers.py previously.

Best! 🎃

@terrytangyuan terrytangyuan merged commit bdfdb24 into scikit-learn-contrib:master Nov 2, 2021
@bellet
Copy link
Member

bellet commented Nov 4, 2021

Thanks a lot, @mvargas33!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants