de novo Generate HMMs

Simonas Marcišauskas edited this page Apr 12, 2021 · 2 revisions

de novo Generate Hidden Markov Models

Users with their own KEGG FTP subscription can use KEGG FTP dump files as input to getKEGGModelForOrganism. This makes it possible to reconstruct a model from the latest KEGG release available at the time, while tuning the parameter settings of getKEGGModelForOrganism for the best result. To do so, first delete all kegg***.mat files from RAVENdir/external/kegg, then place the following files in that same directory:

  • reaction. Can be retrieved from /kegg/ligand/reaction.tar.gz.
  • reaction.lst. Can be retrieved from /kegg/ligand/reaction.tar.gz.
  • reaction_mapformula.lst. Can be retrieved from /kegg/ligand/reaction.tar.gz.
  • compound. This file must be created by concatenating two source files: compound, which can be retrieved from /kegg/ligand/compound.tar.gz, and glycan, which can be retrieved from /kegg/ligand/glycan.tar.gz.
  • compound.inchi. Can be retrieved from /kegg/ligand/compound.tar.gz.
  • ko. Can be retrieved from /kegg/genes/ko.tar.gz.
  • genes.pep. This file must be created by concatenating two source files: eukaryotes.pep, which can be retrieved from /kegg/genes/fasta/eukaryotes.pep.gz, and prokaryotes.pep, which can be retrieved from /kegg/genes/fasta/prokaryotes.pep.gz.
  • taxonomy. Can be retrieved from /kegg/genes/misc/taxonomy.
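
The two concatenated files in the list above (compound and genes.pep) can be assembled with any tool that appends files byte for byte. As one possible approach in MATLAB, the sketch below defines a helper named concatFiles; this name is hypothetical and the function is not part of RAVEN. It copies in chunks because the .pep files can be very large, and it assumes that each source file already ends with a newline so that records do not run together.

```matlab
function concatFiles(sources, target)
% concatFiles  Append the files listed in SOURCES, in order, into TARGET.
%   Hypothetical helper, not part of RAVEN. SOURCES is a cell array of
%   file paths and TARGET is the path of the file to create. TARGET must
%   not be one of the SOURCES, because it is truncated when opened.
chunkSize = 64 * 1024 * 1024;             % bytes read per iteration
out = fopen(target, 'w');
if out == -1
    error('concatFiles:cannotOpen', 'Cannot open %s for writing.', target);
end
closeOut = onCleanup(@() fclose(out));    % close the target even on error
for i = 1:numel(sources)
    in = fopen(sources{i}, 'r');
    if in == -1
        error('concatFiles:cannotOpen', 'Cannot open %s for reading.', sources{i});
    end
    while true
        data = fread(in, chunkSize, '*uint8');   % raw bytes, no conversion
        if isempty(data)
            break                                % end of this source file
        end
        fwrite(out, data, 'uint8');
    end
    fclose(in);
end
end
```

With dumpDir as a placeholder for wherever the archives were extracted and keggDir pointing at RAVENdir/external/kegg, a call such as concatFiles({fullfile(dumpDir,'compound'), fullfile(dumpDir,'glycan')}, fullfile(keggDir,'compound')) would write the merged compound file. The same pattern with eukaryotes.pep and prokaryotes.pep would produce genes.pep.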

Once all the files are in place, the user can immediately run the reconstruction, e.g.:

model=getKEGGModelForOrganism('hsa','inputFasta.fa','inputDirectory','outputDirectory',true,true,true,true,10^-50,0.8,0.3,-1,inf,1);
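
Because a reconstruction from the dump files can take a long time, it may be worth confirming beforehand that all eight files listed above are actually present. The following is a minimal sketch of such a check; missingKeggFiles is a hypothetical helper, not a RAVEN function, and it only tests for the files' existence, not their contents.

```matlab
function missing = missingKeggFiles(keggDir)
% missingKeggFiles  List the required KEGG dump files absent from KEGGDIR.
%   Hypothetical helper, not part of RAVEN. Returns a cell array of file
%   names; an empty cell array means all eight files were found.
required = {'reaction', 'reaction.lst', 'reaction_mapformula.lst', ...
            'compound', 'compound.inchi', 'ko', 'genes.pep', 'taxonomy'};
% exist(...,'file') returns 2 for a regular file and 7 for a folder
present = cellfun(@(f) exist(fullfile(keggDir, f), 'file') == 2, required);
missing = required(~present);
end
```

Calling missingKeggFiles with the path to RAVENdir/external/kegg and checking isempty on the result gives a quick yes/no answer before starting the run.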