Improve scanpy compatibility #774

jashapiro · 2024-07-26T20:38:47Z

Closes #773

I'm filing this as a draft because I have not yet really tested it, but I wanted to get something up while it was fresh in my mind.

The goal here is to improve the compatibility with scanpy as described in #773, so I have done 4 main things:

- Use lower case for X_pca and X_umap to match scanpy.
- Convert the PCA and UMAP matrices to numpy arrays (ndarrays).
- Add a highly_variable column to the var table.
- Add PCA metadata to uns.
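The first two items above can be sketched roughly as follows. This is an illustrative sketch only, not the code in this PR: the helper name `normalize_obsm` is hypothetical, and it assumes `adata.obsm` behaves like a mutable mapping (a plain `dict` stands in for it here).

```python
from collections.abc import MutableMapping

import numpy as np

# Old (upper-case) names mapped to the names scanpy expects.
OBSM_RENAMES = {"X_PCA": "X_pca", "X_UMAP": "X_umap"}


def normalize_obsm(obsm: MutableMapping) -> None:
    """Rename reduced-dimension entries and store them as ndarrays, in place.

    Hypothetical helper for illustration; `obsm` is assumed to be dict-like.
    """
    for old_name, new_name in OBSM_RENAMES.items():
        if old_name not in obsm:
            continue  # this embedding was never computed; nothing to do
        # np.asarray turns data frames, nested lists, etc. into a plain ndarray
        obsm[new_name] = np.asarray(obsm.pop(old_name))


# A plain dict stands in for adata.obsm.
obsm = {"X_PCA": [[1.0, 2.0], [3.0, 4.0]], "X_UMAP": [[0.1, 0.2], [0.3, 0.4]]}
normalize_obsm(obsm)
```

Skipping missing keys keeps the helper safe to call on objects that have no reduced dimensions at all.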

The last step involves exporting the variance explained data and then importing that separately, with the assumption that the PCA was centered and highly variable genes were used. I could (and probably should) make those assumptions into arguments for the script just to be safe and maybe a bit future-proof.
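As a rough sketch of the import side described above: the function name `build_pca_uns`, the TSV column names (`variance`, `variance_ratio`), and the two keyword arguments are assumptions for illustration, not the script's actual interface. The resulting dictionary follows the general shape scanpy uses for `uns["pca"]` (a `params` entry plus `variance` and `variance_ratio` arrays).

```python
import csv
import io

import numpy as np


def build_pca_uns(
    tsv_text: str,
    zero_center: bool = True,
    use_highly_variable: bool = True,
) -> dict:
    """Build a scanpy-style PCA metadata dictionary from exported TSV text.

    Hypothetical helper: the two flags make the "centered" and
    "highly variable genes" assumptions explicit arguments.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    # Lower-case the headers so "Variance" and "variance" are treated alike.
    rows = [{key.lower(): value for key, value in row.items()} for row in reader]
    return {
        "params": {
            "zero_center": zero_center,
            "use_highly_variable": use_highly_variable,
        },
        "variance": np.array([float(row["variance"]) for row in rows]),
        "variance_ratio": np.array([float(row["variance_ratio"]) for row in rows]),
    }


# Made-up values, purely to show the shape of the input and output.
example_tsv = "PC\tVariance\tVariance_Ratio\n1\t4.0\t0.5\n2\t2.0\t0.25\n"
pca_meta = build_pca_uns(example_tsv)
```

The result would then be assigned to `adata.uns["pca"]` for the individual processed objects.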

There are probably also a few places where I am making other assumptions that I should check more explicitly. As I said, a draft...

I also gave the script a more general name, but annoyingly there were apparently enough changes that GitHub is displaying it as a deletion and recreation rather than as a rename, probably because it was automatically run through a code formatter.

allyhawkins

Generally this all looks good; I just found a few places where we might want to add or update checks.

I also think it would be good to test this using the default run IDs that we have in the CCDL profile if you haven't already.

bin/reformat_anndata.py

allyhawkins · 2024-07-29T19:24:11Z

bin/reformat_anndata.py

+adata.var["highly_variable"] = adata.var.gene_ids.isin(
+    adata.uns["highly_variable_genes"]
+)


I think you might want to only do this if args.mask_var != "None"? And does this fail if adata.uns["highly_variable_genes"] does not exist? If so, we should add a check for that.

Yes, you are correct, as I realized when looking at the first comment!
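One way the guard discussed in this thread could look is sketched below. The function name `add_highly_variable` is hypothetical, and skipping silently when the `uns` entry is missing is an assumption (raising an error would be an equally reasonable choice). A `SimpleNamespace` stands in for the AnnData object so the sketch runs without anndata installed.

```python
from types import SimpleNamespace

import pandas as pd


def add_highly_variable(adata, mask_var: str) -> bool:
    """Add a boolean `highly_variable` column to `adata.var` when possible.

    Returns True if the column was added, False if the step was skipped.
    """
    if mask_var == "None":
        return False  # no variable-gene mask was requested
    if "highly_variable_genes" not in adata.uns:
        return False  # nothing to copy from; skip instead of raising KeyError
    adata.var["highly_variable"] = adata.var["gene_ids"].isin(
        adata.uns["highly_variable_genes"]
    )
    return True


# Stand-in object with only the attributes the helper touches.
adata = SimpleNamespace(
    var=pd.DataFrame({"gene_ids": ["ENSG1", "ENSG2", "ENSG3"]}),
    uns={"highly_variable_genes": ["ENSG1", "ENSG3"]},
)
added = add_highly_variable(adata, mask_var="highly_variable")
```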

bin/reformat_anndata.py

allyhawkins · 2024-07-29T19:30:20Z

merge.nf

-      move_counts_anndata.py --anndata_file ${rna_h5ad_file}
-      ${has_adt ? "move_counts_anndata.py --anndata_file ${feature_h5ad_file}" : ''}
+      reformat_anndata.py --anndata_file ${rna_h5ad_file}
+      ${has_adt ? "reformat_anndata.py --anndata_file ${feature_h5ad_file}" : ''}


Do you want to include the PCA information in these objects too like you do for the processed individual objects? I believe we do re-calculate PCs after merging.

I wasn't planning to change anything for the merged files. We don't move the reduced dims to X_PCA and X_UMAP for those, do we?

We do move them in those objects.

scpca-nf/bin/sce_to_anndata.R

Line 70 in f215046

reducedDimNames(sce) <- glue::glue("X_{reducedDimNames(sce)}")

I can see both arguments, but I think for consistency between objects we would want to do this for merged objects?

I agree that we should make the names match. However, I don't think I want to add the .uns["pca"] object for merged data, since we use a different function for the PCA calculation there, and I am not sure it would be equivalent to what scanpy would do. So it is probably better to just leave that off?

I think that makes sense. So for the merged object the only change should be the lower case naming.
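The outcome agreed on here could be checked with something like the sketch below. The function name `check_layout` and its arguments are hypothetical; it only inspects key names and does not touch real AnnData objects.

```python
def check_layout(obsm_keys, uns_keys, is_merged: bool) -> list:
    """Return a list of problems; an empty list means the layout looks as agreed."""
    problems = []
    for key in obsm_keys:
        # Reduced dimensions should be named like "X_pca", not "X_PCA".
        if key.startswith("X_") and key[2:] != key[2:].lower():
            problems.append(f"obsm key {key!r} should be lower case after 'X_'")
    # Merged objects get the lower-case names but no PCA metadata.
    if is_merged and "pca" in uns_keys:
        problems.append("merged objects should not carry uns['pca']")
    return problems


merged_ok = check_layout(["X_pca", "X_umap"], [], is_merged=True)
merged_bad = check_layout(["X_PCA"], ["pca"], is_merged=True)
```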

jashapiro · 2024-07-30T16:34:47Z

Test run here completed successfully, and at first examination the outputs are as expected.

allyhawkins

LGTM

allyhawkins and others added 6 commits July 16, 2024 09:44

Merge pull request #772 from AlexsLemonade/development

f215046

Sync changes from `development` into `main`

convert obsm to ndarrays

0c9bdf3

use lower case for obsm

f31126f

add PCA variance output

8b83978

add variable gene conversion and PCA metadata

8304519

change script name

e7d4758

jashapiro changed the title ~~Improve scanpy compatilibility~~ Improve scanpy compatibility Jul 29, 2024

jashapiro added 2 commits July 29, 2024 09:39

make tsv case-insensitive

588475b

add args for pca values

09904b5

jashapiro requested a review from allyhawkins July 29, 2024 18:21

allyhawkins reviewed Jul 29, 2024

allyhawkins mentioned this pull request Jul 29, 2024

Update docs to account for changes in AnnData objects for scanpy compatibility AlexsLemonade/scpca-docs#337

Closed

jashapiro and others added 3 commits July 29, 2024 16:51

Apply suggestions from code review

8dc9701

Co-authored-by: Ally Hawkins <54039191+allyhawkins@users.noreply.github.com>

Only do hvg copying if hvg is specified

734ee95

lower case the merged too

6a4a59f

jashapiro requested a review from allyhawkins July 30, 2024 16:34

allyhawkins approved these changes Jul 30, 2024

jashapiro merged commit 634e3e5 into development Jul 30, 2024
4 checks passed

allyhawkins mentioned this pull request Jul 31, 2024

Updates to AnnData contents based on new scanpy compatibility AlexsLemonade/scpca-docs#338

Merged

jashapiro mentioned this pull request Aug 8, 2024

Port AnnData format changes from scpca-nf AlexsLemonade/OpenScPCA-nf#87

Closed
