terminate called after throwing an instance of 'std::bad_array_new_length' #15

Open
HLHsieh opened this issue Jul 1, 2024 · 5 comments

Comments

@HLHsieh

HLHsieh commented Jul 1, 2024

Hi Helia,

Thank you again for your suggestions. I applied LongTR to twenty sets of data, and only one set encountered the following issue:

Detected 1 BAM/CRAM files
User-specified read groups for 1 unique samples
Reading region file /scratch/kinfai_root/kinfai0/hsinlun/reference/myDefinedRepeat_LongTR.bed
Region file contains 1 regions

Processing region chr4 190066140 190092504
121 reads overlapped region, of which
	0 were hard clipped
	0 had an 'N' base call
	97 had low MAPQ
	0 had low base quality scores
	20 did not span the STR
	0 did not have a unique mapping
	4 PASSED ALL FILTERS
Phased SNPs add info for 0 out of 4 reads and 0 out of 1 samples
Trimming reads
Generating candidate haplotypes
	TTCCTGGGCATCCCGGGGATCCCAGAGCCGGCCCA GGTACCAGCAGGTGGGCCGCCTACTGCGCACGCGCGGGTTTGCGGGCAGC...ACTGCCATTCTTTCCTGGGCATCCCGGGGATCCCAGAGCCGGCCCAG GTACCAGCAGGTGGGCCGCCTACTGCGCACGCGCG
	                                    GGTACCAGCAGGTGGGCCGCCTACTGCGCACGCGCGGGTTTGCGGGCAGC...ACTGCCATTCTTTCCTGGGCATCCCGGGGATCCCAGAGCCGGCCCAG
	                                    GGTACCAGCAGGTGGGCCGCCTACTGCGCACGCGCGGGTTTGCGGGCAGC...ACTGCCATTCTTTCCTGGGCATCCCGGGGATCCCAGAGCCGGCCCAG
	                                    GGTACCAGCAGGTGGGCCGCCTACTGCGCACGCGCGGGTTTGCGGGCAGC...GAACTGCCATTCCCTAGCCATTCGCGGGTCCAGAGCCGGCGCGTTAA
	                                    GGTACCAGCAGGTGGGCCGCCTACTGCGCACGCGCGGGTTTGCGGGCAGC...ACTGCCATTCTTTCCTGGGCATCCCGGGGATCCCAGAGCCGGCCCAG
Added 0 inexact haplotypes generated by POA
Aligning reads to each candidate haplotype
terminate called after throwing an instance of 'std::bad_array_new_length'
  what():  std::bad_array_new_length
/var/spool/slurmd.spool/job10211381/slurm_script: line 63: 3269819 Aborted                 (core dumped) $script --bams ${input_dir}/${myseq}.sorted.bam --fasta ${genome} --regions ${predefined} --tr-vcf ${myseq}.vcf.gz --bam-samps ${myseq} --bam-libs ${myseq} --min-mean-qual -1 --min-reads 1 --max-tr-len 500000 --skip-assembly

I have no idea how to fix the error. Any suggestions would be appreciated.
P.S. I am still using the previous version.

Best,
Hsin

@heliziii
Collaborator

heliziii commented Jul 2, 2024

Hi Hsin,

It is difficult to say exactly what happened from the log alone, but I see that the repeat is very long (~25k bp), and the error points to a problem with the size of an array. Could you please share the repeat information? I'll try to genotype a sample at this locus.

Best,
Helia

@HLHsieh
Author

HLHsieh commented Jul 2, 2024

Hi Helia,

I was trying to analyze the same repeat across several datasets, but only one dataset encountered an issue. Here is the repeat information:

chr4	190066141 190092504	3300	8	D4Z4

I have also attached the file that is causing the issue for your reference (https://buckeyemailosu-my.sharepoint.com/:f:/g/personal/hsieh_332_buckeyemail_osu_edu/EuD_CyvayNxNlKdmZGlDQdsBgQoBni_UxuZqVm93mzSIzg?e=D1YGAA).

Thank you for your assistance.

Many thanks,
Hsin-Lun

@heliziii
Collaborator

Hi Hsin,

I apologize for the late reply. Would it be possible for you to upload a BAM file containing only the reads aligned to this region? The current BAM files are a bit large to download.

Best,
Helia
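
One possible way to produce such a region-only BAM is `samtools view`, wrapped here in a small Python sketch. The file names, the helper name `region_subset_cmd`, and the 5 kb padding are illustrative assumptions, not values from this thread; only the coordinates come from the region line posted above. It also assumes the input BAM is coordinate-sorted and indexed.

```python
import subprocess

def region_subset_cmd(in_bam, out_bam, chrom, start, end, pad=0):
    """Build a samtools command that keeps only reads overlapping chrom:start-end."""
    lo = max(1, start - pad)   # samtools region strings are 1-based and inclusive
    hi = end + pad
    region = f"{chrom}:{lo}-{hi}"
    return ["samtools", "view", "-b", "-o", out_bam, in_bam, region]

if __name__ == "__main__":
    # Hypothetical file names; pad=5000 is an arbitrary choice to keep flanking reads.
    cmd = region_subset_cmd("sample.sorted.bam", "sample.D4Z4.bam",
                            "chr4", 190066141, 190092504, pad=5000)
    print(" ".join(cmd))
    # Uncomment to run (needs samtools on PATH and an index next to the input BAM):
    # subprocess.run(cmd, check=True)
    # subprocess.run(["samtools", "index", "sample.D4Z4.bam"], check=True)
```

The resulting file contains only the reads overlapping the padded region, so it should be far smaller than the whole-genome BAM.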

@HLHsieh
Author

HLHsieh commented Aug 7, 2024

Hi Helia,

I apologize for missing your reply. I have uploaded the requested file: https://buckeyemailosu-my.sharepoint.com/:f:/g/personal/hsieh_332_buckeyemail_osu_edu/EuD_CyvayNxNlKdmZGlDQdsBgQoBni_UxuZqVm93mzSIzg?e=KJbvMZ

Additionally, I have a few questions about the TR region BED file. Should NUM_COPIES be an integer? Also, how should I define the TR motif? For instance, I want to test a VNTR with 4 copies, but the copies are not all the same length in the reference genome; they range from 46 to 50 bp due to variants.

Thank you,
Hsin

@heliziii
Collaborator

heliziii commented Sep 5, 2024

Hi Hsin,

Sorry for the delayed reply. I will look into the files asap.

For the region BED, NUM_COPIES doesn't need to be an integer. The TR motif sequence doesn't affect the final output in the normal setting.

Best,
Helia
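
To illustrate why a fractional NUM_COPIES is natural, here is a minimal sketch that derives the copy number from a region line as repeat length divided by period. The helper name `implied_copies` is hypothetical, and treating the coordinates as 1-based inclusive is an assumption (the log above prints the start as one less than the value in the region file). This is not LongTR's own parsing code.

```python
def implied_copies(region_line):
    """Return (repeat length in bp, length / period) for one region line."""
    chrom, start, end, period = region_line.split()[:4]
    length = int(end) - int(start) + 1   # assumes 1-based, inclusive coordinates
    return length, length / float(period)

# Region line for D4Z4 as posted earlier in this thread.
length, copies = implied_copies("chr4 190066141 190092504 3300 8 D4Z4")
print(length, round(copies, 3))   # prints: 26364 7.989
```

For the D4Z4 line the span is 26,364 bp, which is about 7.99 copies of a 3,300 bp unit rather than exactly 8. The same division gives a non-integer value for a VNTR whose copies vary in length.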
