Error dec_celltype using cell2loc #11

Closed
ccruizm opened this issue Sep 26, 2022 · 17 comments
Labels
bug Something isn't working

Comments

@ccruizm

ccruizm commented Sep 26, 2022

Good day!

I have given this great tool a try with my data but have run into a problem when using cell2loc. The pipeline ran and generated the cell2loca_results matrix, but once it reaches the function generate_newmeta_cell it produces the error below:

New names:
• `` -> `...1`
Rows: 3967 Columns: 30
── Column specification ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr  (1): ...1
dbl (29): q05cell_abundance_w_sf_AC_like, q05cell_abundance_w_sf_Astrocyte, ...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Generating single-cell data for each spot 
Error in {: task 1 failed - "incompatible dimensions"
Traceback:

1. dec_celltype(object = obj, sc_data = as.matrix(GetAssayData(sc, 
 .     slot = "counts")), sc_celltype = as.character(sc@meta.data$celltype), 
 .     method = 7, env = "cell2loc_env")
2. .generate_newmeta_cell(newmeta, st_ndata, sc_ndata, sc_celltype, 
 .     iter_num, if_doParallel)
3. foreach::foreach(i = 1:length(newmeta_spotname), .combine = "rbind", 
 .     .packages = "Matrix", .export = ".generate_newmeta_spot") %dopar% 
 .     {
 .         spot_name <- newmeta_spotname[i]
 .         .generate_newmeta_spot(spot_name, newmeta, st_ndata, 
 .             sc_ndata, sc_celltype, iter_num)
 .     }
4. e$fun(obj, substitute(ex), parent.frame(), e$data)

Do you know where the problem might be?

My sessionInfo()

R version 4.0.3 (2020-10-10)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /hpc/pmc_stunnenberg/cruiz/miniconda3/envs/r_pHGG_project/lib/libopenblasp-r0.3.12.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] sceasy_0.0.7       reticulate_1.26    future_1.28.0      patchwork_1.1.2   
 [5] RColorBrewer_1.1-3 igraph_1.3.1       SpaTalk_1.0        doParallel_1.0.17 
 [9] iterators_1.0.14   foreach_1.5.2      ggalluvial_0.12.3  ggplot2_3.3.6     
[13] dplyr_1.0.10       sp_1.4-7           SeuratObject_4.1.0 Seurat_4.1.0      
[17] data.table_1.14.2  Matrix_1.3-3      

loaded via a namespace (and not attached):
  [1] backports_1.4.1       uuid_1.1-0            plyr_1.8.7           
  [4] repr_1.1.3            lazyeval_0.2.2        splines_4.0.3        
  [7] RcppHNSW_0.4.1        listenv_0.8.0         scattermore_0.8      
 [10] digest_0.6.29         htmltools_0.5.3       fansi_1.0.3          
 [13] magrittr_2.0.3        tensor_1.5            cluster_2.1.2        
 [16] ROCR_1.0-11           tzdb_0.3.0            globals_0.16.1       
 [19] readr_2.1.2           matrixStats_0.62.0    vroom_1.5.7          
 [22] spatstat.sparse_2.1-1 prettyunits_1.1.1     colorspace_2.0-3     
 [25] rappdirs_0.3.3        ggrepel_0.9.1         crayon_1.5.1         
 [28] jsonlite_1.8.0        scatterpie_0.1.8      progressr_0.11.0     
 [31] spatstat.data_2.2-0   survival_3.2-11       zoo_1.8-10           
 [34] glue_1.6.2            polyclip_1.10-0       gtable_0.3.1         
 [37] leiden_0.4.3          car_3.1-0             future.apply_1.9.1   
 [40] abind_1.4-5           scales_1.2.1          pheatmap_1.0.12      
 [43] DBI_1.1.2             rstatix_0.7.0         spatstat.random_2.2-0
 [46] miniUI_0.1.1.1        Rcpp_1.0.9            viridisLite_0.4.1    
 [49] xtable_1.8-4          progress_1.2.2        spatstat.core_2.4-2  
 [52] bit_4.0.4             NNLM_0.4.4            htmlwidgets_1.5.4    
 [55] httr_1.4.4            ellipsis_0.3.2        ica_1.0-3            
 [58] pkgconfig_2.0.3       farver_2.1.1          uwot_0.1.11          
 [61] deldir_1.0-6          here_1.0.1            utf8_1.2.2           
 [64] labeling_0.4.2        tidyselect_1.1.2      rlang_1.0.4          
 [67] reshape2_1.4.4        later_1.3.0           munsell_0.5.0        
 [70] tools_4.0.3           cli_3.3.0             generics_0.1.3       
 [73] broom_1.0.1           ggridges_0.5.3        evaluate_0.16        
 [76] stringr_1.4.1         fastmap_1.1.0         goftest_1.2-3        
 [79] bit64_4.0.5           fitdistrplus_1.1-8    purrr_0.3.4.9000     
 [82] RANN_2.6.1            pbapply_1.5-0         nlme_3.1-152         
 [85] mime_0.12             ggExtra_0.10.0        hdf5r_1.3.5          
 [88] compiler_4.0.3        plotly_4.10.0.9001    png_0.1-7            
 [91] ggsignif_0.6.3        spatstat.utils_2.3-1  tibble_3.1.8         
 [94] tweenr_2.0.2          stringi_1.7.8         RSpectra_0.16-1      
 [97] rgeos_0.5-9           lattice_0.20-44       IRdisplay_1.0        
[100] vctrs_0.4.1           pillar_1.8.1          lifecycle_1.0.1      
[103] spatstat.geom_2.4-0   lmtest_0.9-40         RcppAnnoy_0.0.19     
[106] cowplot_1.1.1         irlba_2.3.5           httpuv_1.6.6         
[109] R6_2.5.1              promises_1.2.0.9000   KernSmooth_2.23-20   
[112] gridExtra_2.3         parallelly_1.32.1     codetools_0.2-18     
[115] fastDummies_1.6.3     MASS_7.3-54           assertthat_0.2.1     
[118] rprojroot_2.0.3       withr_2.5.0           sctransform_0.3.3    
[121] mgcv_1.8-35           hms_1.1.2             grid_4.0.3           
[124] rpart_4.1-15          ggfun_0.0.7           IRkernel_1.1.1       
[127] tidyr_1.2.1           carData_3.0-5         Cairo_1.5-15         
[130] Rtsne_0.16            ggpubr_0.4.0          pbdZMQ_0.3-5         
[133] ggforce_0.3.4         shiny_1.7.2           base64enc_0.1-3  

Thanks in advance!

@ccruizm
Author

ccruizm commented Sep 27, 2022

I ran it using the example dataset, and it worked. Maybe I am not creating the SpaTalk object correctly. Could you please tell me how to import Visium data into the pipeline?

@multitalk
Collaborator

Thanks for your feedback. For Visium data, you just need to prepare st_data and st_meta as shown in the vignette for spot-based ST data.

@ccruizm
Author

ccruizm commented Sep 28, 2022

Yes, I followed it, but it still gives me an error. I built the inputs according to that vignette, createSpaTalk creates the SpaTalk object, and the whole cell2loc step runs with no problem. The issue appears after the st_coef matrix is returned, when the pipeline tries to generate synthetic single-cell data for each spot (generate_newmeta_cell).

Could you please share the code you used to read Visium data and analyze it with SpaTalk?

@multitalk
Collaborator

@ccruizm Here is my code used to read Visium data and analyze it with SpaTalk.

library(Seurat)
library(SpaTalk)

# 10X mouse kidney spatial data
rawdata <- Load10X_Spatial(data.dir = 'kidney/')
st_data <- rawdata@assays$Spatial@data
# Revise gene symbols against geneinfo; set species to match your data
st_data <- rev_gene(data = st_data, data_type = "count", species = "Human", geneinfo = geneinfo)

# Build st_meta with the columns spot, x, y
st_meta <- rawdata@images[["slice1"]]@coordinates
st_meta <- st_meta[, c("tissue", "imagerow", "imagecol")]
colnames(st_meta) <- c("spot", "x", "y")
st_meta$spot <- rownames(st_meta)  # replace the "tissue" values with the spot barcodes
rownames(st_meta) <- 1:nrow(st_meta)

obj <- createSpaTalk(st_data = st_data, st_meta = st_meta, species = "Human", if_st_is_sc = FALSE, spot_max_cell = 30)
# sc_data: scRNA-seq data
# sc_celltype: cell type for each cell
obj <- dec_celltype(object = obj, sc_data = sc_data, sc_celltype = sc_celltype)
obj <- find_lr_path(object = obj, lrpairs = lrpairs, pathways = pathways)
obj <- dec_cci_all(object = obj)
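
Before running the pipeline above, it may help to confirm that the inputs line up. The following is only a sketch in base R on toy matrices; `check_spatalk_inputs` is a hypothetical helper and not part of SpaTalk. It assumes st_meta carries the columns spot, x, y (as in the code above) and that genes are the row names of both matrices.

```r
# Hypothetical helper (not a SpaTalk function): basic consistency checks
# between the spatial matrix, its metadata, and the scRNA-seq reference.
check_spatalk_inputs <- function(st_data, st_meta, sc_data) {
  stopifnot(all(c("spot", "x", "y") %in% colnames(st_meta)))
  list(
    spots_match  = setequal(colnames(st_data), st_meta$spot),
    shared_genes = length(intersect(rownames(st_data), rownames(sc_data))),
    st_only      = length(setdiff(rownames(st_data), rownames(sc_data))),
    sc_only      = length(setdiff(rownames(sc_data), rownames(st_data)))
  )
}

# Toy data standing in for real Visium and scRNA-seq matrices.
st_data <- matrix(1:12, nrow = 4,
                  dimnames = list(paste0("Gene", 1:4), paste0("spot", 1:3)))
st_meta <- data.frame(spot = paste0("spot", 1:3), x = c(1, 2, 3), y = c(1, 1, 2))
sc_data <- matrix(1:10, nrow = 5,
                  dimnames = list(paste0("Gene", 2:6), paste0("cell", 1:2)))

res <- check_spatalk_inputs(st_data, st_meta, sc_data)
str(res)
```

A non-zero `st_only` or `sc_only` count means the two gene sets differ, which is worth knowing before deconvolution.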

@ccruizm
Author

ccruizm commented Oct 2, 2022

I tried using your code, and I got the same error. Is there a way I can share the files with you so we can troubleshoot where the issue is, please?

@multitalk
Collaborator

@ccruizm You can share the files by emailing them to me (xin_shao@zju.edu.cn).

@ccruizm
Author

ccruizm commented Oct 24, 2022

Hello @shaoxin0801,
Have you had the chance to check the files I sent you and see whether you can also reproduce the problem?
Thanks in advance!

@multitalk
Collaborator

@ccruizm Sorry, I have been very busy recently and forgot to check the files. I have now downloaded the scRNA-seq reference data and it is okay, but I can't download the ST data because the link is invalid. Could you please share the relevant link? Thank you.

@ccruizm
Author

ccruizm commented Oct 24, 2022

No worries! I understand ;) I have sent you an email with the link to download the ST data. Please let me know whether you can download it and whether it contains all the files needed for testing. Thanks for your help!

@multitalk
Collaborator

Good. I have downloaded ST.zip and it is okay. I am going to run the SpaTalk pipeline.

@ccruizm
Author

ccruizm commented Oct 24, 2022 via email

@multitalk
Collaborator

@ccruizm I have checked the code and found the cause of the error in generate_newmeta_cell. It comes from unmatched genes between sc_data and st_data when using other methods, including cell2location: the resulting vectors have different lengths when Pearson's correlation is calculated. I have fixed the bug and it works now. Below is the call I used: I first ran cell2location to generate st_coef and then performed dec_celltype with the data you provided. You can try it yourself now. I look forward to your reply. Thanks a lot.

> obj <- dec_celltype(object = obj,sc_data = sc_data,sc_celltype = sc_meta$celltype,method = 7,env = 'cell2location_env',dec_result = as.matrix(st_coef),if_doParallel = F)
Generating single-cell data for each spot 
***Done*** 
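
The cause described above can be reproduced in a few lines of base R. This is a minimal sketch on made-up toy vectors, not SpaTalk's internal code: `cor()` refuses vectors of different lengths, and restricting both profiles to their shared genes is one way to avoid that.

```r
# Toy expression profiles (hypothetical values), named by gene.
st_spot <- c(GeneA = 5, GeneB = 0, GeneC = 2, GeneD = 7)
sc_cell <- c(GeneA = 4, GeneC = 1, GeneD = 9)

# Vectors of different lengths trigger the error seen in the traceback.
msg <- tryCatch(cor(st_spot, sc_cell), error = function(e) conditionMessage(e))
print(msg)

# Restrict both profiles to the shared genes, in the same order, then correlate.
shared <- intersect(names(st_spot), names(sc_cell))
r <- cor(st_spot[shared], sc_cell[shared])
print(r)
```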

@ccruizm
Author

ccruizm commented Oct 25, 2022

That's wonderful! I am running the pipeline now. The cell2loc deconvolution is already done, but the run has been going for 12 h on several threads and has still not finished.
[Screenshot 2022-10-25 at 17 45 12]
I noticed that your script adds dec_result = as.matrix(st_coef) and if_doParallel = F. The default for if_doParallel is TRUE, and I am not sure whether changing it would (paradoxically) speed up the computation. I will wait and see whether the pipeline needs even more time.

Thanks for helping troubleshoot this issue :)

@multitalk
Collaborator

You can try if_doParallel = F. I have also fixed some bugs in the parallel functions, and when the genes in st_data and sc_data differ, the genes consistent with sc_data are now retained. Thanks for your timely feedback.

@ccruizm
Author

ccruizm commented Oct 27, 2022

That's good to know! I canceled that first multithreaded run and started a new one with if_doParallel = F. However, it has been running for 32 hours and still has not finished. Is that normal? How long did it take you with the data I shared?

Also, I could not set dec_result = as.matrix(st_coef). The st_coef variable was not found in my session, so I left that argument out. I am using:

obj <- dec_celltype(object = obj,
                    sc_data = as.matrix(GetAssayData(sc, slot = 'counts')),
                    sc_celltype = as.character(sc@meta.data$celltype),
                    method = 7, 
                    env = "cell2loc_env_2",
                    # dec_result = TRUE,
                    if_doParallel = F
                   )

Is there a way I can generate a log file to share so we can check why it is taking this long?
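
On the st_coef point above: st_coef has to exist in the R session before it can be passed as dec_result. The sketch below shows one way it might be built from a saved cell2location table. The file is a toy written to a temporary path, and its layout (an unnamed first column of spot names, then columns prefixed with q05cell_abundance_w_sf_) is inferred from the readr message in the first post. That dec_result expects spots in rows and cell types in columns, with column names matching the labels in sc_celltype, is an assumption to verify against the SpaTalk documentation.

```r
# Write a toy CSV shaped like the table described in the readr message:
# an unnamed first column of spot names, then one abundance column per cell type.
csv_path <- tempfile(fileext = ".csv")
toy <- data.frame(
  q05cell_abundance_w_sf_AC_like   = c(1.2, 0.1),
  q05cell_abundance_w_sf_Astrocyte = c(0.3, 2.4),
  row.names = c("spot1", "spot2")
)
write.csv(toy, csv_path)

# Read it back with spot names as row names and convert to a numeric matrix.
st_coef <- as.matrix(read.csv(csv_path, row.names = 1, check.names = FALSE))

# Strip the cell2location prefix so that only the cell type names remain.
colnames(st_coef) <- sub("^q05cell_abundance_w_sf_", "", colnames(st_coef))
print(st_coef)

# st_coef could then be passed as in the maintainer's call (not run here):
# obj <- dec_celltype(object = obj, sc_data = sc_data, sc_celltype = sc_celltype,
#                     method = 7, env = "cell2loc_env_2",
#                     dec_result = st_coef, if_doParallel = F)
```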

@multitalk
Collaborator

To test your data, I randomly sampled 50 cells per cell type as the reference and tested 50 spots; I didn't run SpaTalk on all spots with all cells in the reference. It can take a long time when you set if_doParallel = F. In addition, the more spots and the more genes in sc_data and st_data, the longer it takes (several days for large Visium data). You can wait and see, or use if_doParallel = T.
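
The trial setup described above (50 cells per cell type, 50 spots) could be sketched in base R as follows. The data are toy values and the variable names are illustrative; this is not the maintainer's actual script.

```r
set.seed(42)  # for a reproducible subsample

# Toy reference (hypothetical): 200 cells across three cell types.
sc_celltype <- rep(c("A", "B", "C"), times = c(120, 60, 20))
sc_data <- matrix(rpois(10 * 200, lambda = 2), nrow = 10,
                  dimnames = list(paste0("Gene", 1:10), paste0("cell", 1:200)))

# Keep at most n_per_type cells per cell type; smaller types are kept whole.
n_per_type <- 50
keep <- unlist(lapply(split(seq_along(sc_celltype), sc_celltype), function(idx) {
  if (length(idx) > n_per_type) sample(idx, n_per_type) else idx
}), use.names = FALSE)

sc_data_sub     <- sc_data[, keep, drop = FALSE]
sc_celltype_sub <- sc_celltype[keep]
print(table(sc_celltype_sub))

# Toy spot names; pick 50 spots for the trial run.
spots <- paste0("spot", 1:300)
test_spots <- sample(spots, 50)
```

Timing a trial run on such a subset gives a rough feel for how long the full data set might take.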

@ccruizm
Author

ccruizm commented Oct 27, 2022

Perfect! Then it is normal and I will need to be patient. Thanks for the info!

multitalk added the bug label on Nov 20, 2022