Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

✨ Extend _l2norm for sparse input #114

Merged
merged 10 commits into from
Jul 3, 2023
Merged

Conversation

mbuttner
Copy link
Contributor

Is your feature request related to a problem? Please describe.
I tried to compute a WNN integration on a MuData object, i.e. I executed the following steps:

mu.pp.l2norm(mdata)
mu.pp.neighbors(mdata)
mu.tl.umap(mdata)

However, my Python session keeps crashing at the l2norm step. Omitting this step, computing neighbors and a UMAP works fine. I looked into the mu.pp.l2norm function and isolated the function np.linalg.norm() in the _l2norm() function as the culprit. It crashes reliably for sparse matrices. Converting the data matrix into dense format fixes the issue. This PR extends the _l2norm function to sparse matrices.

@Zethson Zethson requested a review from ilia-kats June 28, 2023 06:20
Copy link
Collaborator

@ilia-kats ilia-kats left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, looks good to me. I'll merge as soon as you make the CI pass (I think black is complaining about the trailing whitespace in line 159)

@mbuttner
Copy link
Contributor Author

Thanks! I did some more testing today and spotted a different behavior in my testing environment, where the norm matrix is still sparse after dividing with the l2norm vector, and add handling of infinite values for sparse matrices.

i = i[isfin]
j = j[isfin]
val = val[isfin]
norm = csr_matrix((val, (i, j)), shape=X.shape)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for csr_matrix, csc_matrix, and coo_matrix it would probably be more efficient to do something like norm.data[~np.isfinite(norm.data)] = 0. Sparse matrices in AnnData are usually CSR or CSC (in fact, IO only has support for CSR and CSC implemented), so it would probably make sense to also add that special case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thats a very good point! I simplified it in commit e7b7b12

X.astype(norm.dtype, copy=False)
if sparse_X and not issparse(norm):
norm[~np.isfinite(norm)] = 0
X = X.toarray()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that will do what you're expecting it to do. The line below (X[:] = norm) overwrites the matrix in the AnnData object with the normalized values. Now you're replacing the local X variable with something else, so the normalized values will be lost. The question is: Is there ever a situation where X is sparse and norm is not? At least if X is a csr_matrix or a csc_matrix norm should always be a COO matrix.

Then you could also get rid of the if clause in line 162.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback. I removed the lines in commit e7b7b12

X.astype(norm.dtype, copy=False)
if sparse_X and not issparse(norm):
norm[~np.isfinite(norm)] = 0
X = X.toarray()
X[:] = norm
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure at the moment, but can you test if this does what we expect it to do for sparse matrices? I.e. it doesn't change the sparsity structure? Otherwise it would perhaps make sense to do something like

if sparse_X and (isspmatrix_csc(X) or isspmatrix_csr(X) or issspmatrix_coo(X)):
    X.data[:] = norm.data[:]
else:
    X[:] = norm

(I don't know enough about the other sparse matrix types off the top of my head, but this should cover most usecases)

Copy link
Contributor Author

@mbuttner mbuttner Jun 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a weird case under Python 3.8 and scipy==1.10.0 when norm became dense while X was still sparse, and then X[:] = norm crashed my kernel. That's why I wanted to catch that error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll test your suggestion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested your suggestion and included it in the code with minimal adaptations (commit cd3f942).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following up on the part where norm became dense and X was still sparse. I added a clause to convert norm into a csr_matrix in that case.

@mbuttner
Copy link
Contributor Author

Please merge if you think it fits to muon. I tested my version with sparse matrices and did not spot any issues.

@ilia-kats ilia-kats merged commit d31ab16 into scverse:master Jul 3, 2023
@ilia-kats
Copy link
Collaborator

Thank you for the contribution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants