
command "paladin prepare -r2" throwing a memory-related error #40

jaidevjoshi83 opened this issue Mar 14, 2019 · 7 comments

jaidevjoshi83 commented Mar 14, 2019

Hi,
Is there any way (or hack) to run this step without hitting the error below on a machine with limited memory (256 GB in my case)? Alternatively, can I use a pre-prepared or pre-indexed database?

Error: "Constructing BWT for the packed sequence... [is_bwt] Failed to allocate 482146077304 bytes at is.c line 212: Cannot allocate memory"

Kindly suggest.

@davidfbibby

Hello,
I get the same error when attempting to index the RVDB protein database. The protein FASTAs are renamed to protein.faa.gz, and when I run paladin index -r3 protein.faa.gz, I get the following:

[M::command_index] Translating protein sequence...0.00 sec
[M::command_index] Packing protein sequence... 93.97 sec
[M::command_index] Constructing BWT for the packed sequence... [is_bwt] Failed to allocate 134975224056 bytes at is.c line 212: Cannot allocate memory

Although protein.faa.gz is large (461 MB), I am working on a large cluster, so I was surprised to encounter this problem.
Many thanks for any help you can provide.

Dave

@ToniWestbrook
Owner

Hi @davidfbibby - to double-check, I just indexed the latest revision of the clustered RVDB (around 3.1 GB of amino acids uncompressed) while profiling memory usage. The maximum resident size during indexing for this reference is 56 GB, but as you can see above, it actually allocates a larger working buffer (roughly 128 GB), so you'll need at least that much memory (in terms of system RAM and/or job constraints) to complete the indexing process. Does your system have that much memory?
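For anyone wanting a quick sanity check before submitting an indexing job: the two data points in this thread (a ~3.1 GB uncompressed protein reference triggering a 134,975,224,056-byte is_bwt request) work out to roughly 43-44 bytes allocated per byte of uncompressed reference. The snippet below extrapolates from that ratio; it is a back-of-the-envelope heuristic derived from this thread, not a documented paladin constant.

```shell
# Rough estimate of the is_bwt allocation paladin will request, using the
# ~43.5 bytes-per-reference-byte ratio observed in this thread.
ref_bytes=3100000000   # uncompressed reference size in bytes (example: ~3.1 GB)
awk -v n="$ref_bytes" 'BEGIN { printf "estimated BWT allocation: ~%.0f GB\n", n * 43.5 / 1e9 }'
```

For the 3.1 GB clustered reference this prints an estimate of ~135 GB, in line with the allocation shown in the log above; compare the estimate against your available RAM or scheduler memory limit before starting.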

@davidfbibby

Hi,
I was using the unclustered dataset, which is over 8 GB! Maybe I should try the clustered version...
I'm not sure about my available memory tbh, but if the clustered version fails, I'll enquire.

Thanks for the quick response,

Dave

@davidfbibby

Another question - on https://rvdb-prot.pasteur.fr/, I can only find the unclustered dataset.

@ToniWestbrook
Owner

ToniWestbrook commented Apr 12, 2022

Here's the link to the group that maintains the clustered RVDB: https://rvdb.dbi.udel.edu/ (both the clustered and unclustered references are available there). Indexing the unclustered DB would need significantly more memory, so it would be best to try the clustered version first if it works for your purposes. Hope that helps.

@ToniWestbrook
Owner

ToniWestbrook commented Apr 12, 2022

Apologies, that's a nucleotide version of the reference at that link! I totally missed that when I downloaded it yesterday. I'll look around for a clustered version of the protein database - if there isn't one, you may have to cluster it yourself to fit into memory. Sorry again.
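If it does come to clustering the protein set yourself, one common route (a hypothetical sketch, not advice from the paladin docs) is CD-HIT; the flags below are standard CD-HIT options, and the filenames follow the protein.faa.gz naming used earlier in this thread.

```shell
# Hypothetical sketch: shrink the unclustered RVDB protein FASTA by
# clustering at 90% identity with CD-HIT before indexing.
# Assumes cd-hit is installed and protein.faa.gz is the reference.
gunzip -k protein.faa.gz                  # CD-HIT reads uncompressed FASTA; -k keeps the .gz
cd-hit -i protein.faa -o protein90.faa \
       -c 0.9 -n 5 -M 64000 -T 8         # 90% identity, word size 5, 64 GB cap, 8 threads
paladin index -r3 protein90.faa           # index the smaller, clustered set
```

The identity threshold (-c) trades sensitivity against index size and memory, so pick it to suit your analysis rather than copying 0.9 blindly.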

@davidfbibby

Ooof. I don't fancy clustering them. I'll see if I can get more memory to allocate...
Thanks again for the quick responses.

Dave
