Error when creating database #41

Open
LankyCyril opened this issue Apr 23, 2020 · 2 comments

Hi. I ran downloadRefSeq.pl as follows: downloadRefSeq.pl --seqencesOutDirectory data/metamaps-db/refseq --taxonomyOutDirectory data/metamaps-db/taxonomy. After about two days of processing data and printing progress output, it failed with "Cannot change working directory into assembly path na na: No such file or directory" and no other explanation. It had successfully processed all bacterial genomes but got through only 5 of 323 fungal genomes. In data/metamaps-db/refseq/fungi I see only six subdirectories, for six species, while assembly_summary.txt lists many more. I have about 20TB of free disk space, so that can't be the cause.

Does this mean that an earlier data retrieval step failed? Is there a way to safeguard against this, or to fix it and resume from where it left off?


JanMoat commented Jul 29, 2022

I fixed the error by changing ftp to https in one line of downloadRefSeq.pl.
Original: (my $assembly_path_FTP = $assembly_path_fullURL) =~ s/ftp:\/\/ftp.ncbi.nlm.nih.gov//g;
New: (my $assembly_path_FTP = $assembly_path_fullURL) =~ s/https:\/\/ftp.ncbi.nlm.nih.gov//g;

There's a similar known problem and fix with Kraken2.
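To illustrate why the one-word change matters, here is a minimal shell sketch of the same prefix-stripping idea, written so that it accepts either scheme. It assumes the failure comes from the URLs now starting with https:// rather than ftp://, which the fix above suggests but the thread does not confirm. The helper name strip_ncbi_host and the example path are made up for illustration and are not part of MetaMaps.

```shell
# Hypothetical helper (not part of MetaMaps): strip the NCBI host prefix
# from an assembly URL, accepting either the ftp:// or the https:// scheme.
strip_ncbi_host() {
    printf '%s\n' "$1" | sed -E 's#^(ftp|https)://ftp\.ncbi\.nlm\.nih\.gov##'
}

# The path below is a placeholder, not a real assembly.
old_style=$(strip_ncbi_host 'ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/000/000/EXAMPLE')
new_style=$(strip_ncbi_host 'https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/000/000/EXAMPLE')

echo "$old_style"
echo "$new_style"
```

Both calls should print the same server-side path. The equivalent change in the Perl script would be to match either scheme in the substitution instead of hard-coding one, but that variant is untested here.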


srusher commented May 8, 2024

I added a conditional that skips to the next species when $assembly_path_fullURL eq "na", which is the condition that caused the error to be thrown. I used the following sed command to insert the logic:

sed -i 's|# last SPECIES if($downloaded_assemblies > 100);|if($assembly_path_fullURL eq "na"){\n\t\t\t\tnext SPECIES; \n\t\t\t}\n|g' ./downloadRefSeq.pl

This replaces the comment line # last SPECIES if($downloaded_assemblies > 100); with the following if statement:

if($assembly_path_fullURL eq "na"){ next SPECIES; }

Keep in mind that if a MetaMaps update removes the # last SPECIES if($downloaded_assemblies > 100); comment, this sed command will no longer match and will leave the file unchanged without reporting an error.
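Because sed exits successfully even when nothing matches, it can help to check for the marker comment before patching. The following is a minimal sketch, assuming GNU sed (for -i and for \n and \t in the replacement). The wrapper name apply_na_guard is hypothetical, and the demo runs against a throwaway file rather than the real downloadRefSeq.pl.

```shell
# Hypothetical wrapper around the sed command above: patch only if the
# marker comment is present, and report a failure otherwise.
apply_na_guard() {
    script=$1
    marker='# last SPECIES if($downloaded_assemblies > 100);'
    if ! grep -qF "$marker" "$script"; then
        echo "marker comment not found in $script; nothing patched" >&2
        return 1
    fi
    sed -i 's|# last SPECIES if($downloaded_assemblies > 100);|if($assembly_path_fullURL eq "na"){\n\t\t\t\tnext SPECIES; \n\t\t\t}\n|g' "$script"
}

# Demo on a temporary stand-in file containing only the marker line.
demo=$(mktemp)
printf '%s\n' '    # last SPECIES if($downloaded_assemblies > 100);' > "$demo"

# First run finds the marker and patches the file.
if apply_na_guard "$demo"; then first=patched; else first=skipped; fi
# Second run finds no marker, because the first run replaced it.
if apply_na_guard "$demo" 2>/dev/null; then second=patched; else second=skipped; fi

echo "first run: $first, second run: $second"
```

For the real script, the call would be apply_na_guard ./downloadRefSeq.pl, ideally after making a backup copy.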
