-
Notifications
You must be signed in to change notification settings - Fork 5
wltrimbl/FGS
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
Installation ============= To install FragGeneScan, please follow the steps below: 1. Untar the downloaded file "FragGeneScan.tar.gz". This will automatically generate the directory "FragGeneScan". 2. Make sure that you also have a C compiler such as "gcc" and perl interpreter. 3. Run "makefile" to compile and build excutable "FragGeneScan" make clean make fgs Running the program ==================== 1. To run FragGeneScan, ./run_FragGeneScan.pl -genome=[seq_file_name] -out=[output_file_name] -complete=[1 or 0] -train=[train_file_name] [seq_file_name]: sequence file name including the full path [output_file_name]: output file name including the full path [whole_genome]: 1 if the sequence file has complete genomic sequences 0 if the sequence file has short sequence reads [train_file_name]: file name that contains model parameters; this file should be in the "train" directory. Note that four files containing model parameters already exist in the "train" directory. [complete] for complete genomic sequences or short sequence reads without sequencing error [sanger_5] for Sanger sequencing reads with about 0.5% error rate [sanger_10] for Sanger sequencing reads with about 1% error rate [454_5] for 454 pyrosequencing reads with about 0.5% error rate [454_10] for 454 pyrosequencing reads with about 1% error rate [454_30] for 454 pyrosequencing reads with about 3% error rate [illumina_5] for Illumina sequencing reads with about 0.5% error rate [illumina_10] for Illumina sequencing reads with about 1% error rate 2. To test FragGeneScan with a complete genomic sequence, ./run_FragGeneScan.pl -genome=./example/NC_000913.fna -out=./example/NC_000913.test -complete=1 -train=complete [NC_000913.fna]: this sequence file has the complete genomic sequence of E.coli 3. To test FragGeneScan with sequencing reads, ./run_FragGeneScan.pl -genome=./example/NC_000913-454.fna -out=./example/NC_000913-454.test -complete=0 -train=454_10 [NC_000913-454.fna]: this sequence file has simulated reads (pyrosequencing, average length = 400 bp and sequencing error = 1%) generated using Metasim Output ============ Upon completion, FragGeneScan generates three files. 1. The first file is "[output_file_name].out", which lists the coordinates of putative genes. This file consists of five columns (start position, end position, strand, frame, score). For example, >gi|49175990|ref|NC_000913.2| Escherichia coli str. K-12 substr. MG1655, complete genome 108 440 - 3 1.378688 337 2799 + 1 1.303498 2801 3733 + 2 1.317386 3734 5020 + 2 1.293573 5234 5530 + 2 1.354725 5683 6459 - 1 1.290816 6529 7959 - 1 1.326412 8238 9191 + 3 1.286832 9306 9893 + 3 1.317067 2. The second file is '[output_file_name].ffn", which lists nucleotide sequences corresponding to the putative genes in "[output_file_name].out". For example, >gi|49175990|ref|NC_000913.2| Escherichia coli str. K-12 substr. MG1655, complete genome start=108 e nd=338 strand=- GTTGTTACCTCGTTACCTTTGGTCGAAAAAAAAAGCCCGCACTGTCAGGTGCGGGCTTTTTTCTGTGTTTCCTGTACGCGTCAGCCCGCACCGTTACCTG TGGTAATGGTGATGGTGGTGGTAATGGTGGTGCTAATGCGTTTCATGGATGTTGTGTACTCTGTAATTTTTATCTGTCTGTGCGCTATGCCTATATTGGT TAAAGTATTTAGTGACCTAAGTCAA >gi|49175990|ref|NC_000913.2| Escherichia coli str. K-12 substr. MG1655, complete genome start=343 e nd=2799 strand=+ TTGAAGTTCGGCGGTACATCAGTGGCAAATGCAGAACGTTTTCTGCGTGTTGCCGATATTCTGGAAAGCAATGCCAGGCAGGGGCAGGTGGCCACCGTCC TCTCTGCCCCCGCCAAAATCACCAACCACCTGGTGGCGATGATTGAAAAAACCATTAGCGGCCAGGATGCTTTACCCAATATCAGCGATGCCGAACGTAT TTTTGCCGAACTTTTGACGGGACTCGCCGCCGCCCAGCCGGGGTTCCCGCTGGCGCAATTGAAAACTTTCGTCGATCAGGAATTTGCCCAAATAAAACAT GTCCTGCATGGCATTAGTTTGTTGGGGCAGTGCCCGGATAGCATCAACGCTGCGCTGATTTGCCGTGGCGAGAAAATGTCGATCGCCATTATGGCCGGCG 3. The third file is '[output_file_name].faa", which lists amino acid sequences corresponding to the putative genes in "[output_file_name].out". For example, >gi|49175990|ref|NC_000913.2| Escherichia coli str. K-12 substr. MG1655, complete genome start=108 e nd=338 strand=- VVTSLPLVEKKSPHCQVRAFFCVSCTRQPAPLPVVMVMVVVMVVLMRFMDVVYSVIFICLCAMPILVKVFSDLSQ >gi|49175990|ref|NC_000913.2| Escherichia coli str. K-12 substr. MG1655, complete genome start=343 e nd=2799 strand=+ LKFGGTSVANAERFLRVADILESNARQGQVATVLSAPAKITNHLVAMIEKTISGQDALPNISDAERIFAELLTGLAAAQPGFPLAQLKTFVDQEFAQIKH VLHGISLLGQCPDSINAALICRGEKMSIAIMAGVLEARGHNVTVIDPVEKLLAVGHYLESTVDIAESTRRIAASRIPADHMVLMAGFTAGNEKGELVVLG RNGSDYSAAVLAACLRADCCEIWTDVDGVYTCDPRQVPDARLLKSMSYQEAMELSYFGAKVLHPRTITPIAQFQIPCLIKNTGNPQAPGTLIGASRDEDE LPVKGISNLNNMAMFSVSGPGMKGMVGMAARVFAAMSRARISVVLITQSSSEYSISFCVPQSDCVRAERAMQEEFYLELKEGLLEPLAVTERLAIISVVG DGMRTLRGISAKFFAALARANINIVAIAQGSSERSISVVVNNDDATTGVRVTHQMLFNTDQVIEVFVIGVGGVGGALLEQLKRQQSWLKNKHIDLRVCGV ANSKALLTNVHGLNLENWQEELAQAKEPFNLGRLIRLVKEYHLLNPVIVDCTSSQAVADQYADFLREGFHVVTPNKKANTSSMDYYHQLRYAAEKSRRKF LYDTNVGAGLPVIENLQNLLNAGDELMKFSGILSGSLSYIFGKLDEGMSFSEATTLAREMGYTEPDPRDDLSGMDVARKLLILARETGRELELADIEIEP VLPAEFNAEGDVAAFMANLSQLDDLFAARVAKARDEGKVLRYVGNIDEDGVCRVKIAEVDGNDPLFKVKNGENALAFYSHYYQPLPLVLRGYGAGNDVTA AGVFADLLRTLSWKLGV >gi|49175990|ref|NC_000913.2| Escherichia coli str. K-12 substr. MG1655, complete genome start=2801 end=3733 strand=+ VKVYAPASSANMSVGFDVLGAAVTPVDGALLGDVVTVEAAETFSLNNLGRFADKLPSEPRENIVYQCWERFCQELGKQIPVAMTLEKNMPIGSGLGSSAC SVVAALMAMNEHCGKPLNDTRLLALMGELEGRISGSIHYDNVAPCFLGGMQLMIEENDIISQQVPGFDEWLWVLAYPGIKVSTAEARAILPAQYRRQDCI AHGRHLAGFIHACYSRQPELAAKLMKDVIAEPYRERLLPGFRQARQAVAEIGAVASGISGSGPTLFALCDKPETAQRVADWLGKNYLQNQEGFVHICRLD TAGARVLEN License ============ Copyright (C) 2010 Mina Rho, Yuzhen Ye and Haixu Tang. You may redistribute this software under the terms of the GNU General Public License.
About
FragGeneScan -- hidden Markov model for predicting prokaryotic coding regions
Resources
Stars
Watchers
Forks
Packages 0
No packages published