Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Predict copy number variation (amplifications) #208

Open
jeffreybarrick opened this issue Sep 20, 2019 · 13 comments
Open

Predict copy number variation (amplifications) #208

jeffreybarrick opened this issue Sep 20, 2019 · 13 comments

Comments

@jeffreybarrick
Copy link
Contributor

jeffreybarrick commented Sep 20, 2019

Goal: Predict large duplications/amplifications.

How?
Update/complete existing circular binary segmentation code (C++)
and/or
Implement a function that compares across multiple output folders

Possible test data:

  1. Prophage region in LTEE
  2. CRAphi in Acinetobacter
@seric17
Copy link

seric17 commented Oct 4, 2019

Working on it.

@jeffreybarrick
Copy link
Contributor Author

@seric17: here is a download of test data. Beware: the files are very big!

http://barricklab.org/release/tmp/cnv/Ara-3_coverage.tgz

@jeffreybarrick
Copy link
Contributor Author

Investigate this R package, which does GC normalization and an HMM approach based on gene locations
https://cran.r-project.org/web/packages/CNOGpro/index.html

Need to do normal mapping via bowtie2 that assigns reads to one position in the genome, rather than using the breseq generated bam file.

@he-hai
Copy link

he-hai commented Sep 9, 2020

Hi, I was wondering if this has already been implemented in recent versions. I found this will be quite useful. Thank you~

@jeffreybarrick
Copy link
Contributor Author

@he-hai Not yet. We're hoping to make some progress on it though.

@miniluphy
Copy link

Is it available on version 0.37.1? I found it in breseq mannal by command "breseq -h". And it did work. However, I am a little confused about which file was the result of the CNV analysis.
The files located in 09_copy_number_variation are as follows:

  • 1.cn_evidence.gd
    1.history.tab
    1.ranges.tab
    1.tiled_for_edging.tab
    1.tiled.tab
    2.cn_evidence.gd
    2.history.tab
    2.ranges.tab
    2.tiled_for_edging.tab
    2.tiled.tab
    3.cn_evidence.gd
    3.history.tab
    3.ranges.tab
    3.tiled_for_edging.tab
    3.tiled.tab
    4.cn_evidence.gd
    4.history.tab
    4.ranges.tab
    4.tiled_for_edging.tab
    4.tiled.tab
    5.cn_evidence.gd
    5.history.tab
    5.ranges.tab
    5.tiled_for_edging.tab
    5.tiled.tab
    6.cn_evidence.gd
    6.history.tab
    6.ranges.tab
    6.tiled_for_edging.tab
    6.tiled.tab
    copy_number_variation.done

I really appreciate your efforts.

@jeffreybarrick
Copy link
Contributor Author

Do not use the --cnv option!!

It's not fully implemented or useful at this point. It's one of those things we hope to get to one day when the right researcher or collaborator shows up. We also don't know of other programs out there that do a great job at this (so share them here if you do).

If you only have a few genomes, you can manually screen for changes in copy number by using breseq BAM2COV in tiling mode. It will generate many graphics files with the genomic average coverage shown. Run breseq BAM2COV to show these advanced options.

We tend to use something like this (run from the main output directory of breseq results):

breseq BAM2COV -a -1 -s 3 --tilesize 20000 tile-overlap 2000

@miniluphy
Copy link

Thank you for your prompt reply and suggestion. Thanks a lot!

@miniluphy
Copy link

When I tried this command:
breseq BAM2COV -a -1 -s 3 --tile-size 20000 --tile-overlap 2000

Some problems occurred:
COMMAND: BAM2COV
Plotting coverage...
Region : 1:1-21000
File : 1:1-21000.png
Error in `$<-.data.frame`(`tmp`, unique_tot_cov, value = integer(0)) :
replacement has 0 rows, data has 600
Calls: $&lt;- -&gt; $&lt;-.data.frame
Execution halted
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!> FATAL ERROR <!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Error running command:
[system] R --vanilla < "/home/dingq/miniconda3/envs/breseq/bin/../share/breseq/plot_coverage.r" > "/tmp/25549.r.log" --args in_file="/tmp/25549_0.coverage.tab" out_file="1:1-21000.png" pdf_output=0 total_only=1 window_start=1 window_end=21000 avg_coverage=58.2 fixed_coverage_scale=174.645
Result code: 256
FILE: libbreseq/common.h LINE: 1411
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!> STACK TRACE <!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Backtrace with 7 stack frames.
breseq(+0x44cf4) [0x556a1da32cf4]
breseq(+0xb98df) [0x556a1daa78df]
breseq(+0x78fd2) [0x556a1da66fd2]
breseq(+0x36b59) [0x556a1da24b59]
/lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7f7556d8cd90]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80) [0x7f7556d8ce40]
breseq(+0x40559) [0x556a1da2e559]
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

This error is over my head. Thank you for your help~

@jeffreybarrick
Copy link
Contributor Author

Try running just this command and post the output:

R --vanilla < "/home/dingq/miniconda3/envs/breseq/bin/../share/breseq/plot_coverage.r" > "/tmp/25549.r.log" --args in_file="/tmp/25549_0.coverage.tab" out_file="1:1-21000.png" pdf_output=0 total_only=1 window_start=1 window_end=21000 avg_coverage=58.2 fixed_coverage_scale=174.645

@miniluphy
Copy link

Thank you for your prompt reply. I tried the command,then the output is as follows:

Error in file(file, "rt") : cannot open the connection
Calls: read.table -> file
In addition: Warning message:
In file(file, "rt") :
cannot open file '/tmp/25549_0.coverage.tab': No such file or directory
Execution halted

@jeffreybarrick
Copy link
Contributor Author

Try running it without the -t option. There's a bug with that, which is fixed on GitHub 592dd7c but not in the current release yet.

@miniluphy
Copy link

I got it. Thank you for your attention to this matter!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants