RSeQC: Add support for geneBody_coverage2.py #844

Closed
santiagorevale opened this issue Oct 16, 2018 · 9 comments
@santiagorevale

Description of bug:
After running the nf-core rnaseq v1.1 pipeline, I noticed that several plots from the RSeQC module were not being produced. Here is a list of them:

  • bam_stat
  • gene_body_coverage
  • inner_distance
  • junction_annotation
  • read_gc

I dug into the gene_body_coverage sub-module a bit and noticed that it looks for the word "Percentile" at the beginning of the file, while the file actually contains the word "percentile" in lower case. After manually fixing the output file, I ran the report again, but this time I got an error.
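
As an illustration of the case mismatch described above, a header check can be made tolerant of both spellings. This is only a minimal sketch and not the actual MultiQC code; the function name `is_coverage_header` is hypothetical.

```python
def is_coverage_header(line):
    """Return True if the line starts with 'percentile' in any letter case."""
    # Lower-casing before comparing accepts both "Percentile" and "percentile".
    return line.lstrip().lower().startswith("percentile")
```

With a check like this, both `"Percentile\t1\t2"` and `"percentile\tcount"` would be recognised as header lines, so no manual edit of the RSeQC output would be needed just to get past header detection.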

MultiQC Error log:

[INFO   ]         multiqc : This is MultiQC v1.6
[INFO   ]         multiqc : Template    : default
[INFO   ]         multiqc : Searching '.'
[INFO   ]         multiqc : Only using modules rseqc
[INFO   ]           rseqc : Found 13 read_distribution reports
[ERROR  ]         multiqc : Oops! The 'rseqc' MultiQC module broke... 
  Please copy the following traceback and report it at https://github.com/ewels/MultiQC/issues 
  If possible, please include a log file that triggers the error - the last file found was:
    ./rseqc/gene_body_coverage/Sample_S1Aligned.sortedByCoord.out.rseqc.txt.geneBodyCoverage.txt
============================================================
Module rseqc raised an exception: Traceback (most recent call last):
  File "~/miniconda3/envs/nf-core-rnaseq-1.1/bin/multiqc", line 440, in multiqc
    output = mod()
  File "~/miniconda3/envs/nf-core-rnaseq-1.1/lib/python2.7/site-packages/multiqc/modules/rseqc/rseqc.py", line 54, in __init__
    n[sm] = getattr(module, 'parse_reports')(self)
  File "~/miniconda3/envs/nf-core-rnaseq-1.1/lib/python2.7/site-packages/multiqc/modules/rseqc/gene_body_coverage.py", line 44, in parse_reports
    self.gene_body_cov_hist_counts[s_name][int(keys[k])] = float(var)
ValueError: invalid literal for int() with base 10: 'count'
============================================================
[WARNING]         multiqc : No analysis results found. Cleaning up..
[INFO   ]         multiqc : MultiQC complete
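
The final `ValueError` in the traceback can be reproduced in isolation. The sketch below assumes that, after the manual capitalisation fix, the header line reads `Percentile<TAB>count`; that layout is inferred from the error message rather than taken from the attached files. The parser appears to treat every header field after the first as an integer percentile, which fails for the literal word `count`.

```python
# Assumed header line after manually changing "percentile" to "Percentile".
header = "Percentile\tcount"

# Everything after the first column is treated as a percentile key.
keys = header.split("\t")[1:]

def to_percentile(value):
    """Convert a header field to an int, returning the error text on failure."""
    try:
        return int(value)
    except ValueError as err:
        return str(err)

result = to_percentile(keys[0])
print(result)
```

Fixing the capitalisation alone therefore only moves the failure one step further along: the header is recognised, but its columns do not have the layout the parser expects.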

File that triggers the error:
These are the original files, without the lower-case fix applied.
rseqc.tar.gz

MultiQC run details (please complete the following):

  • Command used to run MultiQC: multiqc . -f -m rseqc
  • MultiQC Version: MultiQC v1.6
  • Operating System: Ubuntu 17.10
  • Python Version: Python 2.7.15
  • Method of MultiQC installation: conda

Additional context
I'm using the nf-core-rnaseq-1.1 conda environment.

@ewels ewels added the "bug: core" (Bug in the main MultiQC code) and "module: change" labels Oct 16, 2018
@ewels
Member

ewels commented Oct 16, 2018

Thanks @santiagorevale - I wonder if there's been an update to RSeQC and the log file format has changed. I'll take a look when I can (I'm away currently for 3 weeks so it'll be a while sorry).

You can find the example data that I built the modules around here: https://github.com/ewels/MultiQC_TestData/tree/master/data/modules/rseqc

@santiagorevale
Author

Hi @ewels. I believe I've pinpointed the problem. The MultiQC module expects the output of the geneBody_coverage.py script rather than that of the geneBody_coverage2.py script, which is the one used by https://github.com/nf-core/rnaseq.

I believe it should be easy to fix the module to work with the output of both scripts, because the output format of geneBody_coverage2.py is similar to that of geneBody_coverage.py versions < 2.4. I made a few changes to get it working, but although the plot looks as expected, the Y-axis of the "Percentage" plot shows neither a percentage nor a proportion, and I couldn't work out why. I'll leave that to you for now.
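
To make the idea of supporting both scripts concrete, here is a minimal sketch of a parser that accepts two layouts. It is not the change that was eventually made in MultiQC, and both layouts are assumptions: a "wide" one with a `Percentile` header row listing the percentiles and one row per sample, and a "long" one with a `percentile<TAB>count` header and one row per percentile. The function name `parse_gene_body_coverage` and its `sample_name` argument are hypothetical.

```python
def parse_gene_body_coverage(text, sample_name):
    """Return {sample: {percentile: value}} for either assumed layout."""
    rows = [ln.split("\t") for ln in text.splitlines() if ln.strip()]

    # Find the header row, ignoring letter case.
    header_idx = None
    for i, fields in enumerate(rows):
        if fields[0].strip().lower() == "percentile":
            header_idx = i
            break
    if header_idx is None:
        return {}

    header = rows[header_idx]
    body = [r for r in rows[header_idx + 1:] if len(r) >= 2]
    data = {}

    if header[1].strip().lower() == "count":
        # Long layout (assumed for geneBody_coverage2.py): percentile, count.
        data[sample_name] = {int(r[0]): float(r[1]) for r in body}
    else:
        # Wide layout (assumed for geneBody_coverage.py): one row per sample.
        percentiles = [int(p) for p in header[1:]]
        for r in body:
            data[r[0]] = dict(zip(percentiles, (float(v) for v in r[1:])))
    return data
```

In the long layout the file itself carries no sample name, so the caller has to supply one (for example, derived from the file name); in the wide layout the name in the first column of each row is used instead.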

Thanks for looking into this.

Oh, by the way, Liguo Wang (RSeQC's developer) has released version 3.0 of RSeQC, with my changes on the Gene Body Coverage script.

@ewels
Member

ewels commented Nov 12, 2018

Aha, great spot - thanks. Will look into updating this to support both ASAP (hopefully will have some time for MultiQC in a few weeks).

If there have been changes to the output, it would be great if you could submit a pull-request to https://github.com/ewels/MultiQC_TestData with some examples of the new format please 😁 (or just attach here if in doubt).

Thanks,

Phil

@ewels ewels added the "waiting: example data" (Needs example data before we can proceed) label and removed the "bug: core" (Bug in the main MultiQC code) label Nov 12, 2018
@ewels ewels removed the "waiting: example data" (Needs example data before we can proceed) label Jul 4, 2019
@ewels
Member

ewels commented Jul 4, 2019

Note to self: original post already had example data:

"These are the original files, without the lower-case fix applied." rseqc.tar.gz

@ewels ewels added this to the MultiQC v1.8 milestone Nov 13, 2019
@ewels
Member

ewels commented Nov 15, 2019

Hi @santiagorevale,

This issue is super old now (sorry). Could I please confirm that the example data is relevant?

Phil

@ewels ewels changed the title from "RSeQC sub-modules not working" to "RSeQC: Add support for geneBody_coverage2.py" Nov 19, 2019
@ewels ewels modified the milestones: MultiQC v1.8, MultiQC v1.9 Nov 19, 2019
@CuriusScientist

CuriusScientist commented Dec 5, 2019

I am having the same problem: MultiQC (v1.8) is not able to compile results from Junction_annotation.py (version 3.0.1), which I am running as an individual tool and not as part of nf-core.

[Screenshots attached: Junction_annotation.py, Junction_annotation.py results, MultiQC]

@ewels
Member

ewels commented Dec 18, 2019

Thanks @CuriusScientist - please post example files though, so I can take a look (I need to compare them to my existing test data).

Phil

@ewels
Member

ewels commented Dec 18, 2019

Note to self: Example data for gene body coverage also in #1072

ewels added a commit to MultiQC/test-data that referenced this issue May 30, 2020
ewels added a commit to MultiQC/test-data that referenced this issue May 30, 2020
@ewels
Member

ewels commented May 30, 2020

Hi all,

I've just added support for geneBody_coverage2.py in 2cae7a4

It was fairly trivial in the end, I'm really sorry that it took me so long to make this work (1.5 years?!). Please let me know if you find any problems.

@CuriusScientist - please open a new issue about your problem with Junction_annotation.py and attach some example output files so that I can try to replicate the problem 👍

Thanks!

Phil
