Question: any precomputed hmmer msa database? #56

Open
sky1ove opened this issue Nov 15, 2024 · 5 comments
Labels
question (Further information is requested), third party tool (Issue with a third party tool)

Comments

sky1ove commented Nov 15, 2024

Generating the MSA (the data pipeline) takes an extremely long time (more than 15-20 min) for a single protein of about 190 amino acids on my 32-CPU machine (so n_cpu is 8). Do you know if this is common? Is there any precomputed HMMER MSA database? Can I use MMseqs2 to precompute the MSA?

smg3d commented Nov 16, 2024

I have used the ColabFold pipeline to generate the MSA and used that MSA in my input.json for AF3. On quick visual inspection, the predictions look similar to those obtained from the AF3 MSA pipeline. But I have yet to do an in-depth comparison of both MSAs and their effect on predictions.

@James-lin9

> I have used the ColabFold pipeline to generate the MSA and used that MSA in my input.json for AF3. On quick visual inspection, the predictions look similar to those obtained from the AF3 MSA pipeline. But I have yet to do an in-depth comparison of both MSAs and their effect on predictions.

Hi! Did you specify templates in your input.json (with the ColabFold MSAs)?

smg3d commented Nov 17, 2024

> Hi! Did you specify templates in your input.json (with the ColabFold MSAs)?

I have not used templates so far with an external MSA, so I just specified an empty template list (otherwise it will search for templates):

    "unpairedMsa": ">seq1\nMYSEQ\n>seq2\nMSASEQ\n>seq3\nMSASEQ ...",
    "pairedMsa": "",
    "templates": []

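For readers who want to script the setup above, here is a minimal Python sketch that wraps an external A3M alignment into an AF3-style input with an empty template list. The helper names (`clean_a3m`, `first_sequence`, `build_af3_input`) are hypothetical. The top-level fields (`name`, `modelSeeds`, `sequences`, `dialect`, `version`) and the chain id `"A"` are assumptions based on the AF3 input format documentation, so check them against the version you run. The sketch also assumes that an externally generated A3M may begin with a `#` header line and that the first alignment entry should equal the query sequence.

```python
import json


def clean_a3m(a3m_text):
    """Drop '#' header/comment lines and blank lines from A3M text."""
    lines = [ln.strip() for ln in a3m_text.splitlines()]
    kept = [ln for ln in lines if ln and not ln.startswith("#")]
    return "\n".join(kept) + "\n"


def first_sequence(a3m_text):
    """Return the first sequence in the alignment (expected to be the query)."""
    seq_lines = []
    seen_header = False
    for ln in a3m_text.splitlines():
        if ln.startswith(">"):
            if seen_header:
                break  # reached the second entry
            seen_header = True
        elif seen_header:
            seq_lines.append(ln)
    return "".join(seq_lines)


def build_af3_input(name, query, a3m_text, seed=1):
    """Build an input dict that uses an external unpaired MSA and no templates."""
    msa = clean_a3m(a3m_text)
    if first_sequence(msa) != query:
        raise ValueError("first MSA entry must equal the query sequence")
    return {
        "name": name,
        "modelSeeds": [seed],
        "sequences": [{
            "protein": {
                "id": "A",              # assumed single-chain id
                "sequence": query,
                "unpairedMsa": msa,
                "pairedMsa": "",
                "templates": [],        # empty list: no template search
            }
        }],
        "dialect": "alphafold3",        # assumed from the AF3 input docs
        "version": 1,                   # assumed; adjust to your AF3 release
    }
```

Calling `json.dumps(build_af3_input("my_protein", query, a3m_text), indent=2)` gives text that can be saved as input.json. Whether predictions from such an MSA match those from the built-in pipeline has not been validated in this thread.
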
@James-lin9

> > Hi! Did you specify templates in your input.json (with the ColabFold MSAs)?
>
> I have not used templates so far with an external MSA, so I just specified an empty template list (otherwise it will search for templates):
>
>     "unpairedMsa": ">seq1\nMYSEQ\n>seq2\nMSASEQ\n>seq3\nMSASEQ ...",
>     "pairedMsa": "",
>     "templates": []

Thanks for the reply! AlphaFold 3 doesn't seem to support template search with an external MSA, and I'm not sure how that will affect prediction results. I am running 100 predictions with both methods (ColabFold and jackhmmer) for comparison and hope to get similar results.

Augustin-Zidek added the question and third party tool labels on Nov 18, 2024
@Augustin-Zidek
Collaborator

> Generating the MSA (the data pipeline) takes an extremely long time (more than 15-20 min) for a single protein of about 190 amino acids on my 32-CPU machine (so n_cpu is 8).
>
> Do you know if this is common?

Are your databases on a fast disk, either an SSD or, even better, a RAM disk?
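
One rough way to check whether disk speed is the bottleneck is to time a sequential read of one of the database files. The sketch below is only illustrative: the function name and the path in the usage note are hypothetical, and the number will be inflated if the file is already in the operating system's page cache.

```python
import time


def read_throughput_mb_s(path, chunk_size=8 * 1024 * 1024, max_bytes=1024 ** 3):
    """Sequentially read up to max_bytes from path and return the rate in MB/s."""
    total = 0
    start = time.perf_counter()
    # buffering=0 disables Python-level buffering; the OS page cache still applies.
    with open(path, "rb", buffering=0) as fh:
        while total < max_bytes:
            chunk = fh.read(min(chunk_size, max_bytes - total))
            if not chunk:
                break  # end of file
            total += len(chunk)
    elapsed = time.perf_counter() - start
    return (total / 1e6) / max(elapsed, 1e-9)
```

For example, `read_throughput_mb_s("/path/to/databases/some_database.fasta")` reads the first gigabyte of that (hypothetical) file and reports the rate. Comparing the result across storage locations gives a first impression of how much a faster disk could help.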

> Is there any precomputed HMMER MSA database?

Sorry, we don't provide precomputed HMMER MSA databases, as jackhmmer/nhmmer don't support these. You could set up an hmmpgmd server, which allows a precomputed MSA database.

> Can I use MMseqs2 to precompute the MSA?

Yes, you can, but we have not validated this setup so we can't make any accuracy guarantees about it. I agree with the advice given in the other comments.

> AlphaFold 3 doesn't seem to support template search with an external MSA, and I'm not sure how that will affect prediction results. I am running 100 predictions with both methods (ColabFold and jackhmmer) for comparison and hope to get similar results.

Yes, you have to provide templates yourself in such a case. In most cases, especially with a deep MSA, it should not matter.

4 participants