
[fix] fix test_estimators[LogisticRegression()-check_estimators_unfitted] conformance for gpu support #2109

Merged
merged 2 commits into from
Oct 15, 2024

Conversation

icfaust
Contributor

@icfaust icfaust commented Oct 15, 2024

Description

Fixes GPU conformance failures on private CI. None of the methods of LogisticRegression in scikit-learn return or store sparse arrays, so the sparsity check in _onedal_gpu_predict_supported is unnecessary. Once the check requiring the fitted attributes coef_ or intercept_ is removed, the underlying check_is_fitted calls do what is necessary to pass this sklearn conformance test. Here is an example showing that fitting scikit-learn's LogisticRegression with sparse data still yields a dense numpy array:

from sklearn.linear_model import LogisticRegression
import numpy as np
import scipy.sparse as sp

X = sp.csr_matrix(np.eye(10))  # sparse input
y = np.arange(10) % 2
est = LogisticRegression()
est.fit(X, y)
print(type(est.coef_))  # fitted coefficients are dense even for sparse X

Will yield:
<class 'numpy.ndarray'>
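To illustrate why dropping the explicit coef_/intercept_ existence check is safe, here is a minimal sketch (the function name and placement are illustrative, not the actual sklearnex internals): check_is_fitted already raises NotFittedError on an unfitted estimator, which is exactly the behavior the check_estimators_unfitted conformance test expects.

```python
import numpy as np
from sklearn.exceptions import NotFittedError
from sklearn.linear_model import LogisticRegression
from sklearn.utils.validation import check_is_fitted


def gpu_predict_supported(est):
    # Illustrative guard: validate the fitted state first; any
    # device-specific conditions would follow this call.
    check_is_fitted(est)
    return True


est = LogisticRegression()
try:
    gpu_predict_supported(est)
    unfitted_rejected = False
except NotFittedError:
    unfitted_rejected = True
print(unfitted_rejected)  # True: the unfitted estimator is rejected

est.fit(np.eye(4), [0, 1, 0, 1])
print(gpu_predict_supported(est))  # True once fitted
```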


Checklist to comply with before moving PR from draft:

PR completeness and readability

  • I have reviewed my changes thoroughly before submitting this pull request.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have updated the documentation to reflect the changes or created a separate PR with update and provided its number in the description, if necessary.
  • Git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details).
  • I have added the respective label(s) to the PR if I have permission to do so.
  • I have resolved any merge conflicts that might occur with the base branch.

Testing

  • I have run it locally and tested the changes extensively.
  • All CI jobs are green or I have provided justification why they aren't.
  • I have extended testing suite if new functionality was introduced in this PR.

Performance

  • I have measured performance for affected algorithms using scikit-learn_bench and provided at least summary table with measured data, if performance change is expected.
  • I have provided justification why performance has changed or why changes are not expected.
  • I have provided justification why quality metrics have changed or why changes are not expected.
  • I have extended benchmarking suite and provided corresponding scikit-learn_bench PR if new measurable functionality was introduced in this PR.

@icfaust icfaust changed the title ]fox [fix] add check_is_fitted before results check in LogisticRegression._onedal_predict_supported Oct 15, 2024
@icfaust icfaust changed the title [fix] add check_is_fitted before results check in LogisticRegression._onedal_predict_supported [fix] add check_is_fitted before results check in LogisticRegression._onedal_gpu_predict_supported Oct 15, 2024
@icfaust icfaust changed the title [fix] add check_is_fitted before results check in LogisticRegression._onedal_gpu_predict_supported [fix] fix test_estimators[LogisticRegression()-check_estimators_unfitted] conformance for gpu support Oct 15, 2024
@icfaust
Contributor Author

icfaust commented Oct 15, 2024

/intelci: run

@icfaust icfaust marked this pull request as ready for review October 15, 2024 07:09
@icfaust icfaust added the bug Something isn't working label Oct 15, 2024
@icfaust icfaust requested a review from avolkov-intel October 15, 2024 07:36
Contributor

@ahuber21 ahuber21 left a comment


A preferred alternative may be to use issparse(getattr(self, "coef_")) and issparse(getattr(self, "intercept_")) instead.
I realize that right now there's no difference because there are no normal scenarios in which those attributes will be sparse. But we may want to come back to it later.

A second thought is that the order of operations

  • check if GPU is supported
  • check if model is fitted

could be swapped altogether. That would also avoid the AttributeError.

I'm approving, asking you to merge if all CIs are green. Let's keep the comment above in mind for future refactorings.
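The attribute-based guard suggested above could look like this minimal sketch (placement is illustrative; the fitted attribute is coef_, and issparse returns False for the dense ndarrays LogisticRegression actually produces):

```python
import numpy as np
from scipy.sparse import issparse
from sklearn.linear_model import LogisticRegression

est = LogisticRegression().fit(np.eye(4), [0, 1, 0, 1])

# Guard only against genuinely sparse fitted attributes; for
# LogisticRegression's dense ndarrays this is always False.
has_sparse_params = issparse(getattr(est, "coef_")) or issparse(
    getattr(est, "intercept_")
)
print(has_sparse_params)  # False
```

Note that getattr without a default still raises AttributeError on an unfitted estimator, which is why the review also suggests checking fitted-ness before any GPU-support check.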

@icfaust
Contributor Author

icfaust commented Oct 15, 2024

A preferred alternative may be to use issparse(getattr(self, "coef_")) and issparse(getattr(self, "intercept_")) instead. I realize that right now there's no difference because there are no normal scenarios in which those attributes will be sparse. But we may want to come back to it later.

A second thought is that the order of operations

  • check if GPU is supported
  • check if model is fitted

could be swapped altogether. That would also avoid the AttributeError.

I'm approving, asking you to merge if all CIs are green. Let's keep the comment above in mind for future refactorings.

Definitely agree. Checking that the model is fitted before any other step should be a rule.

@icfaust icfaust merged commit 64d71cf into uxlfoundation:main Oct 15, 2024
35 of 48 checks passed