[Tests][Warnings] Cut all warnings from SCML using a minimal solution #341

mvargas33 · 2021-10-28T11:39:56Z

As suggested by @bellet in #339, it's better to cut SCML warnings from the root by specifying the n_basis value when its possible.

As all test run the iris dataset for SCML though build_triplets at test_utils.py, then we just need to scpecify n_basis for SCML and SCML_Supervised in the lists.

Going through the code of SCML, if n_basis is None, then the following default is assigned:

Default for SCML_Supervised: (lines 579-583)

n_basis = min(20*n_features, n_samples*2*min(n_classes-1, n_features) - 1)

Default for SCML: (line 234)

n_basis = n_features*80

Iris dataset has 150 samples, 3 classes, 4 features. So the value is 80 for SCML_Supervised and 320 for SCML.

I changed it, but got the following tests failing.

In test_components_is_2D at test_mahalanobis_mixin.py there is an error because of a border case. I got this warning from SCML:

The number of bases with nonzero weight is less than the number of features of the input, in consequence the learned transformation reduces the dimension to 0

And this error from the test:

>     assert model.components_.shape == (1, 1)  # the components must be 2D
E     assert (0, 1) == (1, 1)
E       At index 0 diff: 0 != 1
E       Use -v to get the full diff

So the test want to test fit for just one feature. Right now the feature selected from Iris is the 1st one, but if we use 2nd dimension instead, we can avoid this border case and make the test pass. The behavior expected is the same.

# trunc_data = input_data[..., :1]  # Old
trunc_data = input_data[..., 1:2]  # New

There is also an error in test_sklearn_compat.py, TestSklearnCompat class, test_scml, check_estimator function if n_basis is arbitrary, because under the hood it calls fit() wich calls _generate_bases_LDA, and if n_basis is wrong, it may fail. Thus, we need to let SCML calc n_basis for us as we don't know wich random X, y dataset is being used. Thus, we need to use pytest.warn to catch the warning.
Not an error, but an obervation in test_triplet_diffs and test_lda: These tests needs an explicitly n_basis=None to check if it is being set correctly internally, thus, in this case the warning needs to be caught as well.

Besides this observations, all warnings from SCML were removed. Only 300-ish warnings are shown now across all tests.

Also, this PR removes the isinstance() overkill made in test_triplets_clasifiers.py previously.

Best! 🎃

…d as well.

bellet · 2021-11-04T12:45:30Z

Thanks a lot, @mvargas33!

Assigned n_basis to SCML when needed for tests. Catch warn when neede…

23a7b4d

…d as well.

perimosocordiae approved these changes Oct 31, 2021

View reviewed changes

terrytangyuan approved these changes Oct 31, 2021

View reviewed changes

terrytangyuan merged commit bdfdb24 into scikit-learn-contrib:master Nov 2, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Tests][Warnings] Cut all warnings from SCML using a minimal solution #341

[Tests][Warnings] Cut all warnings from SCML using a minimal solution #341

mvargas33 commented Oct 28, 2021

bellet commented Nov 4, 2021

[Tests][Warnings] Cut all warnings from SCML using a minimal solution #341

[Tests][Warnings] Cut all warnings from SCML using a minimal solution #341

Conversation

mvargas33 commented Oct 28, 2021

bellet commented Nov 4, 2021