crisprViz: Error in TxDb2GRangesList (.getBiomartData(txdb, organism) : Organism "NA" not recognized in biomaRt) #36

Open
stefanusbernard opened this issue Mar 6, 2024 · 4 comments

@stefanusbernard

Many thanks to the crisprVerse team for this robust and versatile tool for visualizing and annotating sgRNAs. I tried crisprViz with the provided example datasets (gpr21GuideSet and gpr21GeneModel), and it works exactly as in the tutorial.

Now I would like to visualize my own sgRNAs targeting a particular gene of interest. To build a gene model, I retrieved a subset of txdb_human (a GRangesList) from crisprDesignData, following the documentation on how gpr21GeneModel was built. These are the steps I have tried:

  1. Import txdb_human (GRangesList) from crisprDesignData
  2. Unlist the GRangesList object
  3. Take a subset of txdb_human by selecting a single gene and its canonical transcript (using the subset function)
  4. Create a 'type' column in the metadata to match the input format required by makeTxDbFromGRanges
  5. Create a TxDb object with makeTxDbFromGRanges (this succeeds)
  6. Convert the TxDb object from step 5 into a GRangesList (the format required by plotGuideSet in crisprViz) using TxDb2GRangesList

My plan was to run plotGuideSet directly once the GRangesList object had been created (I already have the sgRNA GuideSet object). However, step 6 fails with an error:

> granges_list_gene_model <- TxDb2GRangesList(granges_gene_model_txdb, 
+                                             standardChromOnly = TRUE,
+                                             genome = 'hg38',
+                                             seqlevelsStyle = 'UCSC')
Error in .getBiomartData(txdb, organism) : 
Organism "NA" not recognized in biomaRt. You can use",
"organism=NULL as a solution.

I checked the genome information of the GRanges object for my gene model and compared it with gpr21GeneModel. Both report the same organism: Homo sapiens. I also noticed that TxDb2GRangesList has no parameter for specifying the organism.

Looking at the source code, this function appears to be linked to another function, getTxDb, which does allow the user to specify the organism (default: Homo sapiens). Because TxDb2GRangesList itself has no organism parameter, the user has no direct control over it, even though the error message suggests 'organism=NULL' as a solution.

Could anyone on the team help with this error, either with the steps I have taken so far or with another way around the issue? Looking forward to hearing from you.

@Jfortin1
Member

Hi @stefanusbernard, thank you for reporting this error.
@ltHobbes, could you look into this?

@ltHobbes
Member

Hi @stefanusbernard,

For the functions in crisprViz, you don't need to subset the gene model at all -- the function will only plot elements that are within the plotting window while ignoring everything outside that window. This allows you to use the same gene model (which often includes all genes for a given organism) when generating multiple plots across different genomic regions.

If you must subset the GRangesList object, it is easier to apply your subsetting criteria to each element of the list than to convert it into different data structures, where some information may be lost along the way. For example, to subset txdb_human by gene symbol:

gene_model_subset <- lapply(txdb_human, function(x) subset(x, x$gene_symbol %in% <GENES_GO_HERE>))
gene_model_subset <- as(gene_model_subset, "GRangesList")

That said, TxDb2GRangesList should be able to handle cases such as yours. I will move this issue to the crisprDesign package.

Thank you for bringing this issue to our attention.

@ltHobbes ltHobbes transferred this issue from crisprVerse/crisprViz Jul 18, 2024
@ltHobbes
Member

NA cases are explicitly handled, but not all genomes are. Reprex:

gffFile <- system.file("extdata", "GFF3_files", "a.gff3", package="txdbmaker")
txdb <- makeTxDbFromGFF(gffFile,
                        dataSource="partial gtf file for Tomatoes for testing",
                        organism="Solanum lycopersicum")
out <- TxDb2GRangesList(txdb, 
                        standardChromOnly = TRUE)

Error in .getBiomartData(txdb, organism) : 
  Organism 'Solanum lycopersicum' not recognized in biomaRt.

@Omdeno

Omdeno commented Nov 3, 2024

Hi @ltHobbes,
It seems that .getBiomartData in TxDb2GRangesList.R uses the default mart via biomaRt::useMart("ensembl"), which contains vertebrate genomes only. BioMart has separate interfaces for Protists, Plants, Metazoa and Fungi, which is why 'Solanum lycopersicum' is not recognized. Would it be possible to modify the code so that TxDb2GRangesList can optionally use these other BioMart datasets?

For example, if a 'plant' option were used, the code would be something along the lines of:
mart <- biomaRt::useEnsemblGenomes(biomart = "plants_mart")

It may also be worth replacing useMart() with useEnsembl(), as recommended here: https://bioconductor.org/packages/release/bioc/vignettes/biomaRt/inst/doc/accessing_ensembl.html
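
To make the suggestion a bit more concrete, here is a rough sketch of how the mart could be chosen from a user-supplied option. This is only an illustration: the helper names (.martNameForDivision, .getMartForDivision) and the `division` argument are hypothetical and do not exist in crisprDesign. The only real calls are biomaRt::useEnsembl() and biomaRt::useEnsemblGenomes(), and the sketch assumes the Ensembl Genomes marts are named "plants_mart", "fungi_mart", "metazoa_mart" and "protists_mart".

```r
# Hypothetical helper (not part of crisprDesign): maps a user-facing
# `division` option to the name of the BioMart mart to request.
.martNameForDivision <- function(division = c("vertebrates", "plants",
                                              "fungi", "metazoa",
                                              "protists")) {
    division <- match.arg(division)
    if (division == "vertebrates") {
        # "genes" is the name useEnsembl() accepts for the main Ensembl mart
        return("genes")
    }
    # Assumed naming scheme of the Ensembl Genomes marts, e.g. "plants_mart"
    paste0(division, "_mart")
}

# Hypothetical wrapper: connects to the main Ensembl mart for vertebrates
# and to the matching Ensembl Genomes mart otherwise.
# Calling it requires the biomaRt package and network access.
.getMartForDivision <- function(division = "vertebrates") {
    martName <- .martNameForDivision(division)
    if (martName == "genes") {
        biomaRt::useEnsembl(biomart = martName)
    } else {
        biomaRt::useEnsemblGenomes(biomart = martName)
    }
}
```

With something like this, a call such as .getMartForDivision("plants") would connect to the plants mart, while the default would keep the current vertebrate behaviour. The dataset for the organism would still have to be selected afterwards, which the sketch does not cover.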
