.. _cmdline:

ModPipe command line tools
**************************

ModPipe is usually run from the command line using the `modpipe` tool,
found in the `bin` directory. For example, a
:ref:`typical run <running_overview>` consists of first
running the :ref:`addseqmp_cmd` command to populate the ModPipe
filesystem with input FASTA sequences, then running
:ref:`modpipe_cmd` for each sequence to find hits and build
models. Finally, the :ref:`gathermodmp_cmd` command can be
used to select the best model(s) from the run. Other commands are provided to
accomplish other tasks, such as manipulating alignments or benchmarking the
ModPipe system.

Common options
--------------

To get help on a given command, use `modpipe help <command>`. For example,
`modpipe help add` shows the options supported by the 'add' command.

.. _addseqmp_cmd:

modpipe add
-----------

This command will, given a FASTA file containing one or more sequences, populate
the :ref:`ModPipe filesystem <filesystem>` ready for running :ref:`modpipe_cmd`.
It will also create a mapping file that maps FASTA identifiers to ModPipe
sequence IDs (see :ref:`unqfile`).

.. _modpipe_cmd:

modpipe build
-------------

**Required arguments**

    :option:`--conf_file` The name of the configuration file (typically,
    :file:`modpipe.conf`) must be provided.

    :option:`--sequence_id` The identifier of the sequence to be modelled
    must be provided, as created by :ref:`addseqmp_cmd`.

**Optional arguments**

.. _hits_mode:

    :option:`--hits_mode` Option that specifies the search methods
    used to find templates ("hits") matching the input ("target") sequence.  
    See :ref:`fold_assignment_methods` for full details on each method.
    The following methods can be requested:

       * Seq-Seq:               Sequence-Sequence search.
       * Prf-Seq:               Profile-Sequence search using MODELLER 
                                :meth:`Profile.build` profiles.
       * PSI-Blast-Prf-Seq:     Profile-Sequence search using PSI-BLAST profiles.
       * Prf-Prf:               Profile-Profile search using MODELLER 
                                :meth:`Profile.build` profiles.
       * PSI-Blast-Prf-Prf:     Profile-Profile search using PSI-BLAST profiles.
       * Seq-Prf:               Sequence-Profile search.
       * Max-PSSM-Seq-Prf:      Sequence-Profile search with Max-PSSM scoring.
       * Max-Freq-Seq-Prf:      Sequence-Profile search with Max-frequency scoring.

    Multiple methods can be requested by giving :option:`--hits_mode` several
    times, or as a comma-separated list. "Seq-Seq" is always added, regardless
    of user input.

    The default for :option:`--hits_mode` is "Seq-Seq,Seq-Prf".

    Each search method specified will be used independently. See also the
    *CLUSTERALI* variable in the :ref:`configuration file <conf>` if you use
    multiple methods.
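
    As a rough illustration of the rules above, the following Python sketch
    merges repeated and comma-separated values, falls back to the documented
    default, and always includes "Seq-Seq". The function name
    ``normalize_hits_modes`` is hypothetical and is not part of ModPipe;
    placing "Seq-Seq" first and rejecting unknown names are assumptions made
    for the example.

    .. code-block:: python

        # Method names copied from the list above.
        KNOWN_METHODS = (
            "Seq-Seq", "Prf-Seq", "PSI-Blast-Prf-Seq", "Prf-Prf",
            "PSI-Blast-Prf-Prf", "Seq-Prf", "Max-PSSM-Seq-Prf",
            "Max-Freq-Seq-Prf",
        )
        DEFAULT_HITS_MODE = "Seq-Seq,Seq-Prf"  # documented default


        def normalize_hits_modes(values=None):
            """Hypothetical helper: flatten --hits_mode values into one list."""
            if not values:
                values = [DEFAULT_HITS_MODE]
            modes = []
            for value in values:
                for name in value.split(","):
                    name = name.strip()
                    if not name:
                        continue
                    if name not in KNOWN_METHODS:
                        # Assumption: unknown names are treated as errors.
                        raise ValueError("unknown search method: " + name)
                    if name not in modes:
                        modes.append(name)
            # "Seq-Seq" is always added, regardless of user input.
            if "Seq-Seq" not in modes:
                modes.insert(0, "Seq-Seq")
            return modes

    For example, ``normalize_hits_modes(["Prf-Prf", "Seq-Prf,Prf-Seq"])``
    yields four methods, with "Seq-Seq" added to the three that were requested.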

.. _template:

    :option:`--template` Typically used for template-based calculations.
    If present, models are created only if this PDB ID and chain (e.g. 4abcA)
    is among the hits from the template databases. This option also ensures
    that the input template remains in the hits pool after clustering.

.. _template_option:

    :option:`--template_option` This option has two different uses.

    When ModPipe.pl is called in a regular sequence-based calculation:

       * ALL: all available templates are considered.
       * TOP: only the top hits are considered, starting with the highest
         sequence identity template, and including all templates with no more
         than 30% lower sequence identity than the highest, if the sequence
         identity is higher than 20%, for each sequence region.

    When ModPipe.pl is called in a template-based calculation:

       * ALL: all available templates are considered.
       * TEMPLATE: only the template given with the :option:`--template`
         option is considered.
       * TEMPLATE_FAST: only the profile of the input template is used, which
         effectively disables meaningful value statistics and MPQS. For this
         option, the shortened PDB95 files should be specified in the
         configuration file. The option just disables the computation of
         statistics, and is used for speed.
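
    The TOP rule can be read in more than one way. The Python sketch below
    shows one possible reading for a single sequence region, and is not
    ModPipe's actual implementation. It assumes that "30% lower" means
    30 percentage points below the best hit, and that the 20% limit applies
    to each template individually. The function name ``select_top_templates``
    and the template codes in the usage note are hypothetical.

    .. code-block:: python

        def select_top_templates(hits, window=30.0, floor=20.0):
            """Sketch of the TOP rule for the hits of one sequence region.

            hits is a list of (template_code, percent_sequence_identity).
            """
            if not hits:
                return []
            best = max(identity for _, identity in hits)
            selected = [
                (code, identity)
                for code, identity in hits
                # Assumption: identity must exceed the 20% floor and lie
                # within 30 percentage points of the best hit.
                if identity > floor and best - identity <= window
            ]
            # Highest sequence identity first.
            return sorted(selected, key=lambda hit: hit[1], reverse=True)

    With hits of 80%, 55%, 45% and 18% identity, this reading keeps only the
    80% and 55% templates: 45% is more than 30 points below the best hit, and
    18% is below the floor.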

.. _gathermodmp_cmd:

modpipe gather
--------------

This command is designed to be run after the main :ref:`modpipe_cmd` command.
It parses all of the generated models in the
:ref:`models file <models_file_format>` and generates a final models file
that contains the 'best' models (in the same YAML format as the input models
file). This can be done either for a single
sequence (:option:`--seq_id` option) or for all sequences in the
:ref:`unique mapping (unq) file <unqfile>` (:option:`--unq_file` option).

One or more selection methods (e.g. pick the model generated from the highest
identity template, or that has the best DOPE score) can be specified with the
:option:`--final_models_by` option. The final models file will contain a single
model per method, or potentially fewer if multiple methods select the same
model (e.g. the model with the highest identity template also happens to have
the best DOPE score). The `gather` command will generally select only models
with good scores. The following criteria are used:

   * GA341: score>=0.7
   * z-DOPE: score<0
   * MPQS: score>1.0
   * SEQID: no threshold, model with highest sequence identity will be selected
   * LONGEST_GA341: longest models with score>=0.7
   * LONGEST_DOPE: longest model with z-DOPE<0
   * TSVMOD: predicted native overlap (3.5) >0.4
   * INPUT_TEMPLATE: model with highest sequence identity using input template, 
     used in template based calculations. 

It is often the case that no template spans the whole query sequence - for
example, in a two-domain system, one set of templates may yield models for the
first domain and another set may yield models for the second domain, while no
template covers both domains. Model selection may then pick a good model for
one domain, discarding the other domain models. In this case, therefore,
it may make sense to turn on :option:`--select_by_region`. This will cluster
the models by region and then apply the selection criteria to each region
individually rather than only to the whole sequence. Thus, at most one model
per selection method per region will be returned.

.. _benchmark_cmd:

modpipe benchmark
-----------------

This command is used to benchmark the performance of the ModPipe system by
comparing generated models for a sequence to a known structure (usually a
PDB crystal structure) of the same sequence.

The command, given a sequence identifier in the ModPipe filesystem, parses the
:ref:`models file <models_file_format>` and, for each model in that file,
compares it to the native structure and writes out a new YAML file that is
similar to the original models file but which contains an extra
'native_benchmark' field for each model, containing the benchmark data.

The sequence is identified using the :option:`--conf_file` and
:option:`--sequence_id` options. The PDB code and chain ID of the known
structure are specified using the :option:`--native` and :option:`--native_chain`
options (use the :option:`--pdb_repository` option to specify a list of
directories to search for PDB files). Finally, the :option:`--output_filename`
option is used to name the output file containing model and benchmark data.
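
To show how these options fit together, the Python sketch below assembles a
benchmark command line without running it. The option names are those
described above; everything else is an assumption, including that each option
takes its value as a separate argument. The helper ``benchmark_command`` and
the values in the usage note are hypothetical.

.. code-block:: python

    import shlex


    def benchmark_command(conf_file, sequence_id, native, native_chain,
                          output_filename):
        """Build (but do not run) an argument list for 'modpipe benchmark'."""
        return [
            "modpipe", "benchmark",
            "--conf_file", conf_file,
            "--sequence_id", sequence_id,
            "--native", native,
            "--native_chain", native_chain,
            "--output_filename", output_filename,
        ]


    def as_shell(args):
        """Quote the argument list so it can be pasted into a shell."""
        return " ".join(shlex.quote(arg) for arg in args)

Calling ``benchmark_command("modpipe.conf", "SEQUENCE_ID", "1abc", "A",
"benchmark.yaml")`` returns a list that could be passed to
``subprocess.run``, where ``SEQUENCE_ID`` stands for an identifier created by
:ref:`addseqmp_cmd`.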

The benchmark data in the output file looks similar to::

  native_benchmark:
    code: 1abc
    chain: A
    length: 116
    region: ['1', '116']
    cutoff_rms:
    - {cutoff: 1.0, num_equiv_pos: 115, rms: 0.037}
    - {cutoff: 2.0, num_equiv_pos: 115, rms: 0.037}
    - {cutoff: 3.0, num_equiv_pos: 116, rms: 0.229}
    - {cutoff: 4.0, num_equiv_pos: 116, rms: 0.229}
    - {cutoff: 5.0, num_equiv_pos: 116, rms: 0.229}
    mean_cutoff_rms: 0.152
    mean_num_equiv_pos: 115
    cutoff_rms_35: 0.229
    num_equiv_pos_35: 116
    global_rms: 0.229

The benchmarking data is created using Modeller's :meth:`Selection.superpose`
method; a simple 1:1 alignment (no gaps) is created between the model sequence
and the native structure, using C-alpha atoms to define each residue's spatial
position. The fields reported are:

code
    The PDB code of the native structure.

chain
    The PDB chain ID of the native structure.

length
    The number of residues in the model.

region
    The starting and ending PDB residue numbers in the native structure that
    correspond to the model. (The model may not cover the entire native chain.)

cutoff_rms
    A list of comparisons between the model and native structure. Each row
    is a superposition using a different cutoff to :meth:`Selection.superpose`.
    For each cutoff, the number of equivalent positions (number of model
    residues within the cutoff distance from the same residue in the superposed
    native structure) is reported. The root-mean-square deviation of model
    residue positions from the native positions is also reported, for all
    residues that are within the cutoff distance.

mean_cutoff_rms, mean_num_equiv_pos
    The average of the rms and num_equiv_pos values for all rows in cutoff_rms.

cutoff_rms_35, num_equiv_pos_35
    The equivalent of the rms and num_equiv_pos values from the cutoff_rms
    table, for a 3.5 angstrom cutoff.

global_rms
    The root-mean-square deviation of model residue positions from native
    positions, for all residues in the structure.
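
As a check on how the summary fields relate to the ``cutoff_rms`` rows, the
Python sketch below recomputes the two means from the sample output shown
earlier. Rounding the mean rms to three decimals and truncating the mean
number of equivalent positions to an integer are assumptions inferred from
that sample (the raw mean is 115.6, reported as 115), not confirmed ModPipe
behavior. The function name ``summarize_cutoffs`` is hypothetical.

.. code-block:: python

    # Rows copied from the sample native_benchmark output.
    CUTOFF_RMS = [
        {"cutoff": 1.0, "num_equiv_pos": 115, "rms": 0.037},
        {"cutoff": 2.0, "num_equiv_pos": 115, "rms": 0.037},
        {"cutoff": 3.0, "num_equiv_pos": 116, "rms": 0.229},
        {"cutoff": 4.0, "num_equiv_pos": 116, "rms": 0.229},
        {"cutoff": 5.0, "num_equiv_pos": 116, "rms": 0.229},
    ]


    def summarize_cutoffs(rows):
        """Return (mean_cutoff_rms, mean_num_equiv_pos) for cutoff_rms rows."""
        mean_rms = sum(row["rms"] for row in rows) / len(rows)
        mean_pos = sum(row["num_equiv_pos"] for row in rows) / len(rows)
        # Assumption: 3-decimal rounding and integer truncation, chosen to
        # match the sample output.
        return round(mean_rms, 3), int(mean_pos)

Applied to the sample rows, this reproduces the ``mean_cutoff_rms`` and
``mean_num_equiv_pos`` values shown in the example output.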

.. _convertseq_cmd:

modpipe convert
---------------

This tool will convert an input alignment file to a different format. Since
ModPipe uses exclusively FASTA format for the input sequence (see
:ref:`addseqmp_cmd`) this is a useful tool if your sequence is in another
format, such as PIR.