-
Notifications
You must be signed in to change notification settings - Fork 31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
✨ Extend _l2norm for sparse input #114
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, looks good to me. I'll merge as soon as you make the CI pass (I think black is complaining about the trailing whitespace in line 159)
Thanks! I did some more testing today and spotted a different behavior in my testing environment, where the |
muon/_core/preproc.py
Outdated
i = i[isfin] | ||
j = j[isfin] | ||
val = val[isfin] | ||
norm = csr_matrix((val, (i, j)), shape=X.shape) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think for csr_matrix
, csc_matrix
, and coo_matrix
it would probably be more efficient to do something like norm.data[~np.isfinite(norm.data)] = 0
. Sparse matrices in AnnData are usually CSR or CSC (in fact, IO only has support for CSR and CSC implemented), so it would probably make sense to also add that special case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thats a very good point! I simplified it in commit e7b7b12
muon/_core/preproc.py
Outdated
X.astype(norm.dtype, copy=False) | ||
if sparse_X and not issparse(norm): | ||
norm[~np.isfinite(norm)] = 0 | ||
X = X.toarray() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that will do what you're expecting it to do. The line below (X[:] = norm
) overwrites the matrix in the AnnData object with the normalized values. Now you're replacing the local X variable with something else, so the normalized values will be lost. The question is: Is there ever a situation where X
is sparse and norm
is not? At least if X
is a csr_matrix
or a csc_matrix
norm
should always be a COO matrix.
Then you could also get rid of the if clause in line 162.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the feedback. I removed the lines in commit e7b7b12
muon/_core/preproc.py
Outdated
X.astype(norm.dtype, copy=False) | ||
if sparse_X and not issparse(norm): | ||
norm[~np.isfinite(norm)] = 0 | ||
X = X.toarray() | ||
X[:] = norm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure at the moment, but can you test if this does what we expect it to do for sparse matrices? I.e. it doesn't change the sparsity structure? Otherwise it would perhaps make sense to do something like
if sparse_X and (isspmatrix_csc(X) or isspmatrix_csr(X) or issspmatrix_coo(X)):
X.data[:] = norm.data[:]
else:
X[:] = norm
(I don't know enough about the other sparse matrix types off the top of my head, but this should cover most usecases)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a weird case under Python 3.8 and scipy==1.10.0
when norm
became dense while X
was still sparse, and then X[:] = norm
crashed my kernel. That's why I wanted to catch that error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll test your suggestion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested your suggestion and included it in the code with minimal adaptations (commit cd3f942).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Following up on the part where norm
became dense and X
was still sparse. I added a clause to convert norm
into a csr_matrix
in that case.
Please merge if you think it fits to |
Thank you for the contribution. |
Is your feature request related to a problem? Please describe.
I tried to compute a WNN integration on a MuData object, i.e. I executed the following steps:
However, my Python session keeps crashing at the
l2norm
step. Omitting this step, computing neighbors and a UMAP works fine. I looked into themu.pp.l2norm
function and isolated the functionnp.linalg.norm()
in the_l2norm()
function as the culprit. It crashes reliably for sparse matrices. Converting the data matrix into dense format fixes the issue. This PR extends the_l2norm
function to sparse matrices.