.. _cmdline:

ModPipe command line tools
**************************

ModPipe is usually run from the command line by running the `modpipe` command
line tool, found in the `bin` directory. For example, a
:ref:`typical run ` consists of first running the :ref:`addseqmp_cmd` command
to populate the ModPipe filesystem with input FASTA sequences, then running
:ref:`modpipe_cmd` for each sequence to find hits and build models. Finally,
the :ref:`gathermodmp_cmd` command can be used to select the best model(s)
from the run.

Other commands are provided to accomplish other tasks, such as manipulating
alignments or benchmarking the ModPipe system.

Common options
--------------

To get help on a given command, use `modpipe help <command>`. For example,
`modpipe help add` shows the options supported by the 'add' command.

.. _addseqmp_cmd:

modpipe add
-----------

This command will, given a FASTA file containing one or more sequences,
populate the :ref:`ModPipe filesystem ` ready for running :ref:`modpipe_cmd`.
It will also create a mapping file that maps FASTA identifiers to ModPipe
sequence IDs (see :ref:`unqfile`).

.. _modpipe_cmd:

modpipe build
-------------

**Required arguments**

:option:`--conf_file`
    The name of the configuration file (typically, :file:`modpipe.conf`)
    must be provided.

:option:`--sequence_id`
    The identifier of the sequence to be modelled must be provided, as
    created by :ref:`addseqmp_cmd`.

**Optional arguments**

.. _hits_mode:

:option:`--hits_mode`
    Option that specifies the search methods used to find templates ("hits")
    matching the input ("target") sequence. See
    :ref:`fold_assignment_methods` for full details on each method. The
    following methods can be requested:

    * Seq-Seq: Sequence-Sequence search.
    * Prf-Seq: Profile-Sequence search using MODELLER :meth:`Profile.build`
      profiles.
    * PSI-Blast-Prf-Seq: Profile-Sequence search using PSI-BLAST profiles.
    * Prf-Prf: Profile-Profile search using MODELLER :meth:`Profile.build`
      profiles.
    * PSI-Blast-Prf-Prf: Profile-Profile search using PSI-BLAST profiles.
    * Seq-Prf: Sequence-Profile search.
    * Max-PSSM-Seq-Prf: Sequence-Profile search with Max-PSSM scoring.
    * Max-Freq-Seq-Prf: Sequence-Profile search with Max-frequency scoring.

    Multiple methods can be requested by giving :option:`--hits_mode`
    several times, or by giving a comma-separated list. "Seq-Seq" is always
    added, regardless of user input. The default for :option:`--hits_mode`
    is "Seq-Seq,Seq-Prf". Each search method specified will be used
    independently. See also the *CLUSTERALI* variable in the
    :ref:`configuration file ` if you use multiple methods.

.. _template:

:option:`--template`
    Typically used for template-based calculations. If present, models are
    created only if this PDB ID and chain (e.g. 4abcA) is among the hits
    from the template databases. It also ensures that the input template
    remains in the hits pool after clustering.

.. _template_option:

:option:`--template_option`
    There are two different types of use:

    1. When ModPipe.pl is called in a regular sequence-based calculation:

       * ALL: all available templates are considered.
       * TOP: only the top hits are considered for each sequence region,
         starting with the template with the highest sequence identity and
         including all templates with no more than 30% lower sequence
         identity than the highest, if the sequence identity is higher
         than 20%.

    2. When ModPipe.pl is called in a template-based calculation:

       * ALL: all available templates are considered.
       * TEMPLATE: only the template given with the :option:`--template`
         option is considered.
       * TEMPLATE_FAST: only the profile of the input template is used,
         thus effectively disabling meaningful value statistics and MPQS.
         For this option, the shortened PDB95 files should be specified in
         the configuration file. The option only disables the computation
         of statistics and is used for speed purposes.

.. _gathermodmp_cmd:

modpipe gather
--------------

This command is designed to be run after the main :ref:`modpipe_cmd` command.
It parses all of the generated models in the :ref:`models file ` and
generates a final models file that contains the 'best' models (in the same
YAML format as the input models file). This can be done either for a single
sequence (:option:`--seq_id` option) or for all sequences in the
:ref:`unique mapping (unq) file ` (:option:`--unq_file` option).

One or more selection methods (e.g. pick the model generated from the highest
identity template, or the model that has the best DOPE score) can be specified
with the :option:`--final_models_by` option. The final models file will
contain a single model per method, or potentially fewer if multiple methods
select the same model (e.g. the model with the highest identity template also
happens to have the best DOPE score).

The `gather` command will generally select only models with good scores. The
following criteria are used:

* GA341: score>=0.7
* z-DOPE: score<0
* MPQS: score>1.0
* SEQID: no threshold, model with highest sequence identity will be selected
* LONGEST_GA341: longest model with GA341 score>=0.7
* LONGEST_DOPE: longest model with z-DOPE<0
* TSVMOD: predicted native overlap (3.5) >0.4
* INPUT_TEMPLATE: model with highest sequence identity using input template,
  used in template-based calculations.

It is often the case that no template spans the whole query sequence. For
example, in a two-domain system, one set of templates may yield models for
the first domain and another set may yield models for the second domain,
while no template covers both domains. Model selection may then pick a good
model for one domain, discarding the models for the other domain. In this
case it may make sense to turn on :option:`--select_by_region`. This will
cluster the models by region and then apply the selection criteria to each
region individually rather than only to the whole sequence. Thus, at most one
model per selection method per region will be returned.

.. _benchmark_cmd:

modpipe benchmark
-----------------

This command is used to benchmark the performance of the ModPipe system by
comparing generated models for a sequence to a known structure (usually a PDB
crystal structure) of the same sequence.

The command, given a sequence identifier in the ModPipe filesystem, parses
the :ref:`models file ` and compares each model in that file to the native
structure. It then writes out a new YAML file that is similar to the original
models file but contains an extra 'native_benchmark' field for each model,
holding the benchmark data.

The sequence is identified using the :option:`--conf_file` and
:option:`--sequence_id` options. The PDB code and chain ID of the known
structure are specified using the :option:`--native` and
:option:`--native_chain` options (use the :option:`--pdb_repository` option
to specify a list of directories to search for PDB files). Finally, the
:option:`--output_filename` option is used to name the output file containing
model and benchmark data.

The benchmark data in the output file looks similar to::

    native_benchmark:
      code: 1abc
      chain: A
      length: 116
      region: ['1', '116']
      cutoff_rms:
      - {cutoff: 1.0, num_equiv_pos: 115, rms: 0.037}
      - {cutoff: 2.0, num_equiv_pos: 115, rms: 0.037}
      - {cutoff: 3.0, num_equiv_pos: 116, rms: 0.229}
      - {cutoff: 4.0, num_equiv_pos: 116, rms: 0.229}
      - {cutoff: 5.0, num_equiv_pos: 116, rms: 0.229}
      mean_cutoff_rms: 0.152
      mean_num_equiv_pos: 115
      cutoff_rms_35: 0.229
      num_equiv_pos_35: 116
      global_rms: 0.229

The benchmarking data is created using Modeller's
:meth:`Selection.superpose` method; a simple 1:1 alignment (no gaps) is
created between the model sequence and the native structure, using C-alpha
atoms to define each residue's spatial position. The fields reported are:

code
    The PDB code of the native structure.

chain
    The PDB chain ID of the native structure.

length
    The number of residues in the model.

region
    The starting and ending PDB residue numbers in the native structure that
    correspond to the model. (The model may not cover the entire native
    chain.)

cutoff_rms
    A list of comparisons between the model and native structure. Each row
    is a superposition using a different cutoff to
    :meth:`Selection.superpose`. For each cutoff, the number of equivalent
    positions (number of model residues within the cutoff distance from the
    same residue in the superposed native structure) is reported. The
    root-mean-square deviation of model residue positions from the native
    positions is also reported, for all residues that are within the cutoff
    distance.

mean_cutoff_rms, mean_num_equiv_pos
    The average of the rms and num_equiv_pos values for all rows in
    cutoff_rms.

cutoff_rms_35, num_equiv_pos_35
    The equivalent of the rms and num_equiv_pos values from the cutoff_rms
    table, for a 3.5 angstrom cutoff.

global_rms
    The root-mean-square deviation of model residue positions from native
    positions, for all residues in the structure.

.. _convertseq_cmd:

modpipe convert
---------------

This tool will convert an input alignment file to a different format. Since
ModPipe uses exclusively FASTA format for the input sequence (see
:ref:`addseqmp_cmd`) this is a useful tool if your sequence is in another
format, such as PIR.
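
The options of the convert tool are not listed in this section, so the
following is only a rough illustration of what a PIR-to-FASTA conversion
involves, not the tool's own implementation. It is a minimal Python sketch
that assumes each PIR record consists of a ``>XX;code`` header, one
description line, and sequence lines ending in ``*``. The function name
``pir_to_fasta`` and its ``width`` parameter are hypothetical.

```python
def pir_to_fasta(pir_text, width=60):
    """Convert PIR-formatted text to FASTA-formatted text (illustrative only).

    Assumes each record is a '>XX;code' header line, one description line
    (possibly empty), and sequence lines terminated by '*'.
    """
    records = []
    lines = iter(pir_text.splitlines())
    for line in lines:
        if not line.startswith(">"):
            continue  # skip anything between records
        # Header looks like '>P1;code'; keep only the part after ';'
        code = line[1:].split(";", 1)[-1].strip()
        # The line after the header is the free-text description
        description = next(lines, "").strip()
        residues = []
        for seq_line in lines:
            chunk = seq_line.strip().replace(" ", "")
            residues.append(chunk.rstrip("*"))
            if chunk.endswith("*"):
                break  # '*' terminates the PIR sequence
        records.append((code, description, "".join(residues)))

    out = []
    for code, description, seq in records:
        header = ">" + code
        if description:
            header += " " + description
        out.append(header)
        # Wrap the sequence at a fixed line width
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "".join(item + "\n" for item in out)
```

The resulting text could be written to a file and then passed to
:ref:`addseqmp_cmd`. In practice the `modpipe convert` tool itself should be
preferred, since it handles alignments rather than only single sequences.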