nanorepeat "freezing" #18

fansalon · 2024-12-13T08:47:06Z

Hi there,

I am running nanorepeat on some ONT bam files with the following command:

nanoRepeat.py -i $bam -t bam -d ont_sup -r $genome -b $bed -c 20 -o $output

where $bed contains 6,841 entries and $bam is a ~100G file.

The issue is that after some hours of running, nanorepeat seems to freeze. At the beginning of the run I could see temporary subfolders in the output folder, such as ${sample}.NanoRepeat_temp.chr8-20707547-20707710-AAGA; now the output folder contains only a subfolder named ${sample}.details. The program is still running, but the last operation appears to have happened more than 6 hours ago and no *tsv file has been generated.

I have tried repeating the analysis on an SR BED file containing 10 entries and everything ran smoothly. Could this be linked to the large size of the BAM (~100G) and BED (7k entries) files?

I have increased the memory to 500 GB, but nothing changed.

Do you have any idea what is going on?

Thanks in advance,
Federico

fangli80 · 2024-12-13T17:32:33Z

Hi Federico,
Thanks for your feedback. NanoRepeat might be stuck in a specific repeat region. Can you locate this region? You can find it in the stderr output.

By the way, please try to use -c 8 instead of -c 20. When using more than 8 threads, the speed improvement is minimal, but the I/O load is very high.

fansalon · 2024-12-14T14:49:08Z

Hi,

Thanks a lot for the feedback and suggestion; I am now running with -c 8. It does seem that nanorepeat gets stuck at a specific repeat. It has been stuck at the repeat chrY-87448334-87448375-A for 16 h, see below:

[12/14/2024 00:40:07] NOTICE: [Process 04] Quantifying repeat: chrY-87448334-87448375-A
[12/14/2024 00:40:07] NOTICE: [Process 04] Step 1: finding anchor location in reads
[12/14/2024 00:40:07] NOTICE: [Process 04] Step 2: round 1 and round 2 estimation
[12/14/2024 00:40:08] NOTICE: [Process 04] Step 3: round 3 estimation
[12/14/2024 00:40:14] NOTICE: [Process 04] Step 4: phasing reads using GMM

I could certainly try to remove this repeat from my input file and re-run it. I have no problem doing that, but I was wondering whether you have a guess as to why this happens. Is this random, or do you think this particular repeat might be problematic?

Thanks a lot,
Federico

fangli80 · 2024-12-14T15:11:23Z

I'm not sure, because I haven't seen the reads in this repeat. It might be due to a high sequencing depth, or these reads might contain very long repeats. Would it be convenient for you to check the region in IGV? (You can use samtools view to extract the reads of a specific region so that you don't need to download the whole BAM file from the server.)
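A minimal sketch of that extraction, assuming samtools is on the PATH, the BAM is coordinate-sorted and indexed, and the repeat ID has the form chrom-start-end-motif as in the log lines quoted in this thread. The helper names and file names are hypothetical, and the ID coordinates are passed to samtools as-is, so the window may be off by one base relative to the BED entry:

```python
import subprocess


def repeat_id_to_region(repeat_id):
    """Turn 'chrY-87448334-87448375-A' into the samtools region 'chrY:87448334-87448375'.

    Assumption: the ID is chrom-start-end-motif. Splitting from the right
    keeps chromosome names that themselves contain '-' intact.
    """
    chrom, start, end, _motif = repeat_id.rsplit("-", 3)
    return f"{chrom}:{start}-{end}"


def samtools_view_command(bam_path, repeat_id, out_bam):
    """Build the samtools command that writes reads overlapping the repeat to out_bam."""
    region = repeat_id_to_region(repeat_id)
    # -b: write BAM output, -o: output file; the region argument needs an indexed BAM
    return ["samtools", "view", "-b", "-o", out_bam, bam_path, region]


def extract_region(bam_path, repeat_id, out_bam):
    """Run the extraction, then index the small BAM so IGV can load it."""
    subprocess.run(samtools_view_command(bam_path, repeat_id, out_bam), check=True)
    subprocess.run(["samtools", "index", out_bam], check=True)


# Show the command without running it (file names are placeholders).
print(" ".join(samtools_view_command(
    "sample.bam", "chrY-87448334-87448375-A", "chrY_repeat.bam")))
```

The resulting file is small enough to copy off the server and open in IGV together with its index.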

By the way, it looks like chrY-87448334-87448375-A has finished, so the problematic repeat might be another one. It is a little difficult to locate because there are 20 parallel processes and their stderr output is interleaved, but you can separate the lines using the process ID (e.g. [Process 04]).
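One way to do that separation, sketched in Python. It assumes every stderr line looks like the ones quoted earlier in this thread ([date time] LEVEL: [Process NN] message); the function name and the log file name are hypothetical. For each process it keeps the most recent repeat and the last message seen:

```python
import re
from datetime import datetime

# Assumed line layout, taken from the log excerpt in this thread:
# [12/14/2024 00:40:07] NOTICE: [Process 04] Quantifying repeat: chrY-...
LINE_RE = re.compile(
    r"^\[(?P<ts>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})\] \w+: "
    r"\[Process (?P<pid>\d+)\] (?P<msg>.*)$"
)
REPEAT_PREFIX = "Quantifying repeat: "


def last_activity_per_process(lines):
    """Map each process ID to its current repeat, last message and last timestamp."""
    state = {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # skip lines that do not follow the assumed layout
        entry = state.setdefault(
            m["pid"], {"repeat": None, "last_msg": None, "last_ts": None}
        )
        msg = m["msg"]
        if msg.startswith(REPEAT_PREFIX):
            entry["repeat"] = msg[len(REPEAT_PREFIX):]
        entry["last_msg"] = msg
        entry["last_ts"] = datetime.strptime(m["ts"], "%m/%d/%Y %H:%M:%S")
    return state


# Demo on the lines quoted in this thread. For a real run, pass an open file:
#   with open("nanorepeat.stderr.log") as fh:   # hypothetical file name
#       state = last_activity_per_process(fh)
sample = [
    "[12/14/2024 00:40:07] NOTICE: [Process 04] Quantifying repeat: chrY-87448334-87448375-A",
    "[12/14/2024 00:40:07] NOTICE: [Process 04] Step 1: finding anchor location in reads",
    "[12/14/2024 00:40:14] NOTICE: [Process 04] Step 4: phasing reads using GMM",
]
for pid, entry in sorted(last_activity_per_process(sample).items()):
    print(pid, entry["last_ts"], entry["repeat"], "|", entry["last_msg"])
```

A process whose last timestamp is hours behind the others is a candidate for the stuck one, and its repeat is the region worth inspecting.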

Thanks,
Li
