MSG: Seq [contigxxxx]: Terminator codon inside CDS! #203

Closed
blancaverag opened this issue Nov 1, 2016 · 20 comments

@blancaverag

I am trying to annotate metagenomic contigs with prokka. For that, I have joined bacteria, archaea and virus databases and run the prokka command. The computation runs fine until:

[05:38:20] Found 71830 CDS
[05:38:20] Connecting features back to sequences
[05:38:20] Not using genus-specific database. Try --usegenus to enable it.
[05:38:20] Annotating CDS, please be patient.
[05:38:20] Will use 8 CPUs for similarity searching.

--------------------- WARNING ---------------------
MSG: Seq [contig01018]: Terminator codon inside CDS!
...

What could be wrong?

@tseemann
Owner

tseemann commented Nov 1, 2016

@blancaverag what version of Bioperl are you using? send me the whole .log file if possible.

tseemann self-assigned this Nov 1, 2016
@blancaverag
Author

Hi,

I have BioPerl 1.007000.
The problem arises when I add the --metagenome option. Without it, it just runs perfectly.
I am attaching the log file and the screen message, as the error does not appear in the log file.
PROKKA_11012016_log.txt
err_file.txt

@regelka

regelka commented Nov 14, 2016

Same problem here...

Prokka_log.txt

@mherold1

mherold1 commented Nov 14, 2016

I have noticed before that prodigal sometimes predicts CDS that contain stop codons, e.g. :

>PROKKA_00425 hypothetical protein
MTREKGYAYNRRFCASGAGRNNNEQFSKSQQWFGQDKQHSTLNHNFNKIFSIGSGCGQTE
FFFPA*HIALAVSGNCKPDNQNNQTLPKHEKFQKIHS*VFALSDCNSTVRSRVLEYLFWS
*CKSYFLSKSSCSYQFDLAVFITISTYSYRKETKRKS
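
To see how many predicted proteins are affected, one could scan the protein FASTA that Prokka writes for `*` characters inside a sequence. The following is a minimal sketch in plain Python, not part of Prokka; the function names (`read_fasta`, `internal_stops`, `report`) are hypothetical, and the example record is the one quoted above.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def internal_stops(protein):
    """Count '*' characters, ignoring a single trailing stop if present."""
    body = protein[:-1] if protein.endswith("*") else protein
    return body.count("*")


def report(lines):
    """Return {header: count} for records with at least one internal stop."""
    result = {}
    for header, seq in read_fasta(lines):
        n = internal_stops(seq)
        if n:
            result[header] = n
    return result


# The record quoted in this comment, used as test input.
example = """>PROKKA_00425 hypothetical protein
MTREKGYAYNRRFCASGAGRNNNEQFSKSQQWFGQDKQHSTLNHNFNKIFSIGSGCGQTE
FFFPA*HIALAVSGNCKPDNQNNQTLPKHEKFQKIHS*VFALSDCNSTVRSRVLEYLFWS
*CKSYFLSKSSCSYQFDLAVFITISTYSYRKETKRKS
""".splitlines()

print(report(example))  # {'PROKKA_00425 hypothetical protein': 3}
```

For a real run, pass an open file handle for the `.faa` file instead of `example`.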

@alneberg

I've seen the same as @mherold1 when running in --metagenome mode.
I believe in our case it's because prodigal disregards the -g flag when metagenome mode is used.
Prodigal thus predicts some genes using a different genetic code, and this information is lost since prokka only picks up the nucleotide sequence from prodigal.

Does that make sense?

@spock

spock commented Nov 18, 2016

@alneberg , prokka does get only start/end/strand information from prodigal for each CDS.

@alneberg

@spock, exactly, so if prodigal has created a gene that extends past a regular stop codon, there will be a stop codon in the amino acid sequence when prokka translates the nucleotide sequence with the originally intended genetic code.
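
The effect described here can be illustrated with a small self-contained sketch. It assumes the two codes involved are NCBI translation tables 11 and 4, which differ in amino acid assignment only at TGA (stop in table 11, tryptophan in table 4). The table is built from the standard NCBI amino acid string; `build_table`, `translate` and the toy sequence are hypothetical and are not taken from Prokka or Prodigal.

```python
from itertools import product

BASES = "TCAG"
# Amino acids of the standard code, with codons ordered TTT, TTC, TTA, TTG, TCT, ...
STANDARD_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def build_table(aas):
    """Map each of the 64 codons to its one-letter amino acid."""
    codons = ("".join(c) for c in product(BASES, repeat=3))
    return dict(zip(codons, aas))


# Table 11 shares its amino acid assignments with the standard code.
TABLE_11 = build_table(STANDARD_AAS)
# Table 4 reads TGA as tryptophan instead of stop.
TABLE_4 = dict(TABLE_11, TGA="W")


def translate(nt, table):
    """Translate complete codons in frame 0; unknown codons become 'X'."""
    nt = nt.upper().replace("U", "T")
    usable = len(nt) - len(nt) % 3
    return "".join(table.get(nt[i:i + 3], "X") for i in range(0, usable, 3))


cds = "ATGAAATGAGGCTAA"  # toy CDS with an in-frame TGA
print(translate(cds, TABLE_11))  # MK*G*
print(translate(cds, TABLE_4))   # MKWG*
```

A gene called under table 4 and then translated under table 11 shows exactly this kind of internal `*`.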

@arielamadio

Could this be related to the issue I've added about selenocysteine?

@maureenbug

I'm getting the same warning, and some of my samples have ~40% of the detected genes with internal stop codons... this seems quite high to me.

@willnotburn

willnotburn commented Jun 28, 2018

1.5 years late to this party. I understand what @alneberg is saying. OK, but then what? How does prokka resolve this inconsistency with prodigal? Should I still use the --metagenome mode? Using prokka 1.13

@tseemann
Owner

tseemann commented Jul 1, 2018

Prokka was never intended to be used for metagenomes.

I did not know that Prodigal reported genes from different genetic codes!
That would be the source of the problem.

What exact version of prodigal are you using?
I think Homebrew and maybe Conda recently updated.

@willnotburn

willnotburn commented Jul 1, 2018

prokka log file says Prodigal version 2.6. Should it be a different one? I used the conda install just last week.

@tseemann
Owner

tseemann commented Sep 1, 2018

@willnotburn Can you find the exact version?

% prodigal

-------------------------------------
PRODIGAL v2.6.3 [February, 2016]
Univ of Tenn / Oak Ridge National Lab
Doug Hyatt, Loren Hauser, et al.
-------------------------------------

Usage:  prodigal [-a trans_file] [-c] [-d nuc_file] [-f output_type]
                 [-g tr_table] [-h] [-i input_file] [-m] [-n] [-o output_file]
                 [-p mode] [-q] [-s start_file] [-t training_file] [-v]
         -g:  Specify a translation table to use (default 11).

@willnotburn

@tseemann It is indeed PRODIGAL v2.6.3 [February, 2016]

@tseemann
Owner

tseemann commented Sep 8, 2018

So it turns out that prodigal in metagenome mode will try translation tables 4 and 11 and see which does 'better'. So it might choose table 4 for some contigs, but I don't know which ones. So when I translate proteins that were predicted with table 4 while I am using table 11 (from --gcode), I will sometimes encounter stop codons.

Does prodigal tag which genetic code it uses for each prediction, in any of the 3 output modes?
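
Without such a tag, one rough workaround would be to test each predicted CDS against both candidate tables and keep those that give no internal stop. This is only a heuristic sketch under the assumption that tables 11 and 4 are the only candidates; it cannot recover what Prodigal actually chose, and a CDS with no in-frame TGA is compatible with both. The names `has_internal_stop` and `compatible_tables` are hypothetical.

```python
from itertools import product

# Standard-code amino acids, codons ordered TTT, TTC, TTA, TTG, TCT, ...
_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
_CODONS = ["".join(c) for c in product("TCAG", repeat=3)]

TABLES = {11: dict(zip(_CODONS, _AAS))}   # same assignments as the standard code
TABLES[4] = dict(TABLES[11], TGA="W")     # table 4: TGA codes for tryptophan


def has_internal_stop(cds, table):
    """True if any codon other than the last translates to a stop."""
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3].upper() for i in range(0, usable, 3)]
    return any(table.get(c, "X") == "*" for c in codons[:-1])


def compatible_tables(cds, candidates=(11, 4)):
    """Return the candidate tables under which the CDS has no internal stop.

    The CDS must already be in coding orientation (reverse-complemented
    beforehand if it was predicted on the minus strand).
    """
    return [t for t in candidates if not has_internal_stop(cds, TABLES[t])]


print(compatible_tables("ATGAAATGAGGCTAA"))  # [4]
print(compatible_tables("ATGAAAGGCTAA"))     # [11, 4]
```

An empty list means neither table explains the CDS, which would point to a different cause than the table 4 versus table 11 mismatch.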

@JinqunHuang

JinqunHuang commented Jun 21, 2019

@tseemann
Hi, Torsten,
The WARNING message "Terminator codon inside CDS" is reported when I set "--metagenome" in Prokka 1.13.

I tried different parameter combinations, as below, but the WARNING messages still appear.

"prokka --outdir AS100-2 --prefix AS100 --addmrna --metagenome --gcode 11 contigs.fasta.500"

or

"prokka --outdir AS100-3 --prefix AS100 --kingdom Bacteria --metagenome contigs.fasta.500"

When I do not set "--metagenome", the WARNING messages do not appear.
My data are metagenomic. Do you have any advice?

Prodigal V2.6.3
Prokka 1.13

Thanks a lot!
JinQun

@tseemann
Owner

tseemann commented Oct 3, 2019

@JinqunHuang ok it seems prodigal ignores genetic code in metagenome mode, because it knows it is a mixture of different things. i can't tell what table prodigal used (?) so i don't know how to translate it correctly.

tseemann added the bug label Oct 3, 2019
@cssulliv

Hi,

New user here. I just installed Prokka via conda and I too am getting the "Terminator codon inside CDS" message but only for a subset of my MAGs when I include the --metagenome flag. My question is have other folks annotated their MAGs using a default prokka run and if so did it positively or negatively affect any downstream analysis? Any suggestions on this would be helpful :)

@tseemann
Owner

tseemann commented Nov 28, 2019

The problem is that --metagenome invokes a special mode of prodigal (the gene predictor tool) which produces predictions that do NOT match --gcode 11. So you get stop codons (relative to table 11), but the genes were predicted assuming some other code. Unfortunately, prodigal does not record or indicate which code it used.

If you have nice contigs and they are all bacterial, don't use --metagenome.

@ScienceAdvances

ScienceAdvances commented Apr 19, 2024

Using the --force option produces a correct *.faa file, but the warning message "Terminator codon inside CDS" still appears.
prokka Bin.01.fna --force --outdir ./ --prefix Bin.01 --metagenome --cpus 20 --kingdom Bacteria
