RuntimeError: append_alleles_batch #267
Comments
Hi @cindywen96, First, thank you so much for reporting this issue! A few thoughts: Regarding your issue about memory: Internally, haptools must store a genotype matrix (numpy array) of size
Regarding your issue with
|
Thank you so much for your prompt reply. It's very helpful. I was using 0.4.2. I've now updated to 0.5.0 and added
|
Ah! I'm sorry about that. Hmm... Can you help me replicate the error? Can you share your model file and the version of the 1000 Genomes reference that you're using? Also, can you share the command that you're running? |
Sure, here is the command. The 1KG reference is a subset of variants on chr1 (hg38).
And here is the test data. |
Ah, ok. In that case, can you also share the pvar and psam files that correspond with the pgen file? Those are also used by haptools whenever a pgen file is provided to it. |
Sure, please see additional files attached. |
Ok, thanks! I ran it on my machine and got a different error.
Your PVAR file doesn't seem to correspond with your PGEN file. Can you double-check it?
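One quick way to check for this kind of mismatch is to count the variant records in the PVAR file and compare that against the variant count reported for the PGEN file (for example, the count plink2 prints in its log). The sketch below is only an illustration: the function names are hypothetical and not part of haptools, and it assumes that every non-empty PVAR line not starting with "#" is one variant record.

```python
import io


def count_pvar_variants(lines):
    """Count variant records in a PVAR file.

    Assumption: header and meta lines start with '#', and every other
    non-empty line describes exactly one variant.
    """
    return sum(1 for line in lines if line.strip() and not line.startswith("#"))


def pvar_matches_pgen(pvar_lines, pgen_variant_ct):
    """Return True if the PVAR record count equals the PGEN variant count.

    pgen_variant_ct is supplied by the caller (hypothetical input here),
    e.g. read from the plink2 log for the same file set.
    """
    return count_pvar_variants(pvar_lines) == pgen_variant_ct


# Tiny in-memory example standing in for a real .pvar file
example_pvar = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\n"
    "1\t10177\trs1\tA\tC\n"
    "1\t10352\trs2\tT\tA\n"
    "1\t11008\trs3\tC\tG\n"
)
example_count = count_pvar_variants(example_pvar)
print(example_count)
```

For a real file, the same function can be called on an open file handle, e.g. `count_pvar_variants(open("ref.pvar"))`, where the file name is a placeholder.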
|
My bad, please use the attached pgen file. The plink file contains 166930 variants. |
@cindywen96, thank you so much for reporting this issue! I was able to replicate it. It seems like there are actually multiple problems:
I'll continue to debug issue 2 and get back to you ASAP once I've figured it out. @mlamkin7, in case I can't figure it out on my own, here are some steps for us to more quickly replicate issue 2 (bypassing the simulation step):
|
@cindywen96, can you download the newest version of the code and try it again?
Last night, @mlamkin7 pushed some code in e3db851 which should fix issue 2. What we noticed is that there are variants in your reference file with coordinates past the last coordinate in the map file. Our code had previously been unable to handle that. The new code fixes this by writing the coordinate of the last breakpoint in the
haptools/haptools/sim_genotype.py, lines 369 to 372 in e3db851
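To see whether a reference file is affected by the situation described above, one can list the variant positions that lie beyond the last coordinate in the map file. This is a minimal sketch with hypothetical names and toy data. It is not the code from e3db851 and does not reproduce the fix, it only detects the condition.

```python
def variants_past_map_end(variant_positions, map_positions):
    """Return the variant positions greater than the last map coordinate.

    Both arguments are iterables of integer base-pair positions on the
    same chromosome (hypothetical inputs, e.g. parsed from the PVAR file
    and from the genetic map file).
    """
    last_map_position = max(map_positions)
    return [pos for pos in variant_positions if pos > last_map_position]


# Toy example: the map ends at 5000, but two variants lie past it
toy_map = [1000, 2500, 5000]
toy_variants = [900, 2500, 5000, 5001, 7200]
past_end = variants_past_map_end(toy_variants, toy_map)
print(past_end)
```

An empty result means every variant falls within the range covered by the map.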
|
Hi, I'm using simgenotype to generate N=10k admix samples from the 1KG reference. The breakpoint simulation completes immediately, but writing the admix.pgen file takes lots of memory and I keep getting the following error. Also, it seems the number of written variants changes with the number of reference variants. Could you advise on how much memory it takes, and whether there is any limit on the number of variants that pgen.append_alleles_batch can handle? Thank you!
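For a rough sense of scale on the memory question, the size of an in-memory genotype matrix can be estimated from its dimensions. The exact size expression in the maintainer's reply is cut off in this thread, so the layout used below (samples x variants x 2 alleles, one byte per entry) is an assumption made for illustration, not a statement of what haptools actually allocates.

```python
def genotype_matrix_bytes(n_samples, n_variants, ploidy=2, bytes_per_entry=1):
    """Estimate the memory footprint of a dense genotype matrix.

    Assumption: shape (n_samples, n_variants, ploidy) with a fixed-width
    dtype of bytes_per_entry bytes (1 for uint8 or bool).
    """
    return n_samples * n_variants * ploidy * bytes_per_entry


# Numbers taken from this thread: N = 10k samples, 166930 variants
size_in_bytes = genotype_matrix_bytes(10_000, 166_930)
print(f"{size_in_bytes / 1e9:.2f} GB")
```

Under these assumptions the matrix alone is a few gigabytes, and any temporary copies made while writing would add to that. The estimate scales linearly with both the number of samples and the number of reference variants.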