MetaProteomeAnalyzerCLI
The command line interface (CLI) for the MetaProteomeAnalyzer (MPA), referred to as MetaProteomeAnalyzerCLI, runs the MPA software from the command line, for example on a Linux server or as part of another software pipeline or workflow.
MetaProteomeAnalyzerCLI takes one or multiple spectrum files as input and uses the X!Tandem, Comet and MS-GF+ search algorithms to perform database searching according to the given search parameters. In addition, MPA features several post-processing steps, such as FDR estimation, protein hit grouping (so-called meta-protein generation), and taxonomic and functional analysis.
Please note that MS/MS spectra must be provided in Mascot Generic Format (MGF) and protein sequence databases in FASTA format.
Java command line execution:
java -cp mpa-portable-X.Y.Z.jar de.mpa.cli.CmdLineInterface [parameters]
Mandatory parameters:
-spectrum_files Spectrum files (MGF format), comma separated list for multiple files. Example: "file1.mgf, file2.mgf".
-database The FASTA database file to search against.
-missed_cleav The maximum number of allowed missed cleavages.
-prec_tol The precursor tolerance (in Dalton, e.g. 0.5Da, or in ppm, e.g. 10ppm).
-frag_tol The fragment ion tolerance (in Dalton, e.g. 0.5Da).
-output_folder The output folder for exporting the processed results.
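The two tolerance units are related through the mass being measured: a relative tolerance of p ppm at mass m corresponds to an absolute tolerance of m × p / 1,000,000 Da. The following sketch is not part of MPA; it parses tolerance strings written as in the examples above and converts ppm to Dalton for a given mass. The function names and the accepted spellings are assumptions made for illustration, and MPA's own parser may behave differently.

```python
import re

# Hypothetical helper, not part of MPA: accepts strings such as "10ppm" or "0.5Da".
_TOLERANCE_RE = re.compile(r"^\s*([0-9]*\.?[0-9]+)\s*(da|ppm)\s*$", re.IGNORECASE)


def parse_tolerance(text):
    """Return (value, unit) with the unit normalised to 'Da' or 'ppm'."""
    match = _TOLERANCE_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognised tolerance: {text!r}")
    value = float(match.group(1))
    unit = "ppm" if match.group(2).lower() == "ppm" else "Da"
    return value, unit


def tolerance_in_dalton(text, mass):
    """Absolute tolerance in Dalton at the given mass (or m/z) value."""
    value, unit = parse_tolerance(text)
    if unit == "ppm":
        # ppm is a relative unit: parts per million of the measured mass.
        return mass * value / 1e6
    return value


print(parse_tolerance("10ppm"))
print(tolerance_in_dalton("10ppm", 1000.0))
```

At a mass of 1000 Da, 10 ppm corresponds to 0.01 Da, which shows how much narrower a typical ppm setting is than the 0.5Da example.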
Optional parameters:
-xtandem Turn database search algorithm X!Tandem on or off (1: on, 0: off, default is '1').
-comet Turn database search algorithm Comet on or off (1: on, 0: off, default is '0').
-msgf Turn database search algorithm MS-GF+ on or off (1: on, 0: off, default is '0').
-generate_metaproteins Turn meta-protein generation (also known as protein grouping) on or off (1: on, 0: off, default is '1').
-peptide_rule The peptide rule chosen for meta-protein generation (-1: off, 0: share-one-peptide, 1: shared-peptide-subset, default is '0').
-cluster_rule The sequence cluster rule chosen for meta-protein generation (-1: off, 0: UniRef100, 1: UniRef90, 2: UniRef50, default is '-1').
-taxonomy_rule The taxonomy rule chosen for meta-protein generation (-1: off, 0: on superkingdom or lower, 1: on kingdom or lower, 2: on phylum or lower, 3: on class or lower, 4: on order or lower, 5: on family or lower, 6: on genus or lower, 7: on species or lower, 8: on subspecies, default is '-1').
-iterative_search Turn iterative (also known as two-step) searching on or off (-1: off, 0: protein-based, 1: taxon-based, default is '-1').
-fragment_method The fragmentation method chosen for the MS instrument (1: CID, 2: HCD, 3: ETD, default is '1' - CID).
-peptide_index Turn peptide indexing (of FASTA database) on or off (1: on, 0: off, default is '1').
-fdr_threshold The FDR threshold applied for filtering the results (default is 0.05, i.e. 5% FDR).
-threads The number of threads to use for the processing (default is the number of cores available).
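When MPA is called from a pipeline, it can help to check option values before starting a long search. The sketch below transcribes the allowed values and defaults of the integer-coded options from the list above into a dictionary and validates user overrides against them. The dictionary layout and the function name are illustrative assumptions, not part of MPA; -fdr_threshold and -threads are left out because they are not enumerated choices.

```python
# Allowed values and defaults transcribed from the optional-parameter list above.
# The structure and names are illustrative only and not part of MPA.
OPTIONAL_PARAMS = {
    "xtandem": ({0, 1}, 1),
    "comet": ({0, 1}, 0),
    "msgf": ({0, 1}, 0),
    "generate_metaproteins": ({0, 1}, 1),
    "peptide_rule": ({-1, 0, 1}, 0),
    "cluster_rule": ({-1, 0, 1, 2}, -1),
    "taxonomy_rule": (set(range(-1, 9)), -1),  # -1 (off) up to 8 (subspecies)
    "iterative_search": ({-1, 0, 1}, -1),
    "fragment_method": ({1, 2, 3}, 1),
    "peptide_index": ({0, 1}, 1),
}


def resolve_options(overrides):
    """Merge user overrides with the documented defaults and check each value."""
    resolved = {name: default for name, (_, default) in OPTIONAL_PARAMS.items()}
    for name, value in overrides.items():
        if name not in OPTIONAL_PARAMS:
            raise KeyError(f"unknown option: -{name}")
        allowed, _ = OPTIONAL_PARAMS[name]
        if value not in allowed:
            raise ValueError(
                f"-{name} must be one of {sorted(allowed)}, got {value}"
            )
        resolved[name] = value
    return resolved


options = resolve_options({"comet": 1, "taxonomy_rule": 6})
print(options["comet"], options["taxonomy_rule"], options["xtandem"])
```

Options that are not overridden keep the documented default, so the call above enables Comet, groups on genus or lower, and leaves X!Tandem switched on.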
Conda command line execution:
mpa-portable de.mpa.cli.CmdLineInterface [parameters]
Please note that MPA Portable must first be installed as a conda package from the bioconda channel:
conda install mpa-portable -c bioconda
When passing multiple MGF files as a comma-separated list, pay attention to the required quotes. Surround the full value of the option with quotes, not the individual items:
-spectrum_files "C:\..\file_1.mgf, C:\..\file_2.mgf" [other parameters]
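A script that collects the MGF files can build this value programmatically. The sketch below joins paths into the single comma-separated value and shows how it would be quoted for a POSIX shell; the function name and the file paths are hypothetical, and `shlex.quote` follows POSIX shell rules, so the quoting it prints does not apply to the Windows command prompt.

```python
import shlex


def spectrum_files_argument(paths):
    """Join MGF paths into the single comma-separated value for -spectrum_files."""
    if not paths:
        raise ValueError("at least one spectrum file is required")
    for path in paths:
        if "," in path:
            # A comma inside a path would be indistinguishable from the separator.
            raise ValueError(f"path contains a comma: {path!r}")
    return ", ".join(paths)


# Hypothetical file locations, used only for illustration.
value = spectrum_files_argument(["/data/file_1.mgf", "/data/file_2.mgf"])

# As one element of an argument list the value needs no extra quoting ...
argv_fragment = ["-spectrum_files", value]

# ... while a POSIX shell needs the whole value quoted because of the space.
print("-spectrum_files " + shlex.quote(value))
```

Passing the value as a single list element, for example to `subprocess.run`, avoids shell quoting altogether.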
Here is a minimal working example for the Windows operating system. X, Y and Z have to be replaced by the actual version of the MetaProteomeAnalyzer (portable) software, and my_folder by the folder containing the desired files:
java -cp mpa-portable-X.Y.Z.jar de.mpa.cli.CmdLineInterface
-spectrum_files C:\my_folder\spectrum_file.mgf
-database C:\my_folder\uniprot_sprot.fasta
-missed_cleav 1
-prec_tol 10ppm
-frag_tol 0.5Da
-output_folder C:\my_folder\output
For the sake of readability, the input parameters are split over multiple lines. On the command line, however, all parameters must be entered on a single line.
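If the multi-line form is kept in a script or notes file, it can be collapsed into the required single line mechanically. The following is a minimal sketch with a hypothetical helper name; it only joins lines with spaces and does not add or check any quoting.

```python
def to_single_line(command_text):
    """Collapse a command that was split over several lines into one line."""
    parts = [line.strip() for line in command_text.splitlines()]
    return " ".join(part for part in parts if part)


# Raw string so that the Windows backslashes are kept literally.
example = r"""
java -cp mpa-portable-X.Y.Z.jar de.mpa.cli.CmdLineInterface
-spectrum_files C:\my_folder\spectrum_file.mgf
-database C:\my_folder\uniprot_sprot.fasta
-missed_cleav 1
-prec_tol 10ppm
-frag_tol 0.5Da
-output_folder C:\my_folder\output
"""

print(to_single_line(example))
```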
Here is an extended example for the Linux operating system that sets most optional parameters explicitly. In this setup, X!Tandem, Comet and MS-GF+ are employed (using 8 threads) for protein identification, iterative searching is turned off, an FDR threshold of 1% is applied and proteins are grouped based on the meta-protein rule of requiring a single shared peptide. Both the taxonomy rule and the cluster rule are turned off.
java -cp mpa-portable-X.Y.Z.jar de.mpa.cli.CmdLineInterface
-spectrum_files /home/my_folder/spectrum_file.mgf
-database /home/my_folder/uniprot_sprot.fasta
-missed_cleav 1
-prec_tol 10ppm
-frag_tol 0.5Da
-xtandem 1
-comet 1
-msgf 1
-iterative_search -1
-fdr_threshold 0.01
-generate_metaproteins 1
-peptide_rule 0
-cluster_rule -1
-taxonomy_rule -1
-threads 8
-output_folder /home/my_folder/output/
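For use inside another pipeline or workflow, the same call can be assembled as an argument list and handed to `subprocess.run`, which sidesteps the single-line and quoting concerns. The sketch below is an assumption-laden illustration, not an official wrapper: `build_command` is a hypothetical helper, the paths are those of the Linux example above, and only a subset of the optional parameters is set.

```python
import shlex


def build_command(jar, mandatory, optional=None):
    """Assemble the argument list for subprocess.run from parameter dictionaries."""
    required = ["spectrum_files", "database", "missed_cleav",
                "prec_tol", "frag_tol", "output_folder"]
    missing = [name for name in required if name not in mandatory]
    if missing:
        raise ValueError(f"missing mandatory parameters: {missing}")
    argv = ["java", "-cp", jar, "de.mpa.cli.CmdLineInterface"]
    for name, value in {**mandatory, **(optional or {})}.items():
        argv += [f"-{name}", str(value)]
    return argv


argv = build_command(
    "mpa-portable-X.Y.Z.jar",  # replace X.Y.Z with the installed version
    {
        "spectrum_files": "/home/my_folder/spectrum_file.mgf",
        "database": "/home/my_folder/uniprot_sprot.fasta",
        "missed_cleav": 1,
        "prec_tol": "10ppm",
        "frag_tol": "0.5Da",
        "output_folder": "/home/my_folder/output/",
    },
    {"xtandem": 1, "comet": 1, "msgf": 1, "fdr_threshold": 0.01, "threads": 8},
)

# Dry run: print the command instead of executing it.
print(shlex.join(argv))
# To execute (requires Java and the MPA jar): subprocess.run(argv, check=True)
```

Printing the command first makes it easy to compare the generated call with the documented example before any search is started.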