.. _running:

Running ModPipe
***************

.. _running_overview:

Overview
========

In summary, for a given sequence, ModPipe:

* Finds potential templates and creates alignments between the given sequence
  and each potential template;
* Runs Modeller for each template; and
* Accumulates results for all templates in a directory identified by
  a unique identifier for the given sequence.

A run has two steps. First, an initialization step sets up the ModPipe
filesystem and assigns an identifier to the given sequence, or to each of a
set of given sequences. ModPipe can then be run for each sequence.

**Initialization**
   :ref:`addseqmp_cmd` takes a file of protein sequences in FASTA
   format and creates an identifier and a subdirectory for each unique
   sequence. It also creates a
   :file:`{sequence_file_name}.unq` file that maps the
   identifiers to the names of the sequences in the FASTA file (align codes).
   See :ref:`unqfile` for more information.
   If your sequence file is not in FASTA format, first convert it to FASTA
   format using :ref:`convertseq_cmd`.

**ModPipe**
   For each sequence identifier listed in the
   :file:`{sequence_file_name}.unq` file, run ModPipe by calling
   :ref:`modpipe_cmd`.
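
The per-sequence step above can be scripted. The following is a minimal
sketch, not part of ModPipe itself. It assumes that each line of the
:file:`.unq` file begins with the sequence identifier as its first
whitespace-separated field (check :ref:`unqfile` for the actual layout), and
the helper name ``build_all`` and its two arguments are hypothetical. Only
the ``--conf_file`` and ``--sequence_id`` options of ``modpipe build`` are
used::

   # Hypothetical helper: run "modpipe build" once per identifier in a .unq file.
   # Usage: build_all CONF_FILE UNQ_FILE
   build_all() {
       conf_file=$1
       unq_file=$2
       # ASSUMPTION: the identifier is the first whitespace-separated field.
       while read -r seq_id _rest || [ -n "$seq_id" ]; do
           # Skip blank lines.
           [ -n "$seq_id" ] || continue
           # Redirect stdin so the command cannot consume the rest of the list.
           modpipe build --conf_file "$conf_file" --sequence_id "$seq_id" </dev/null \
               || echo "modpipe build failed for $seq_id" >&2
       done < "$unq_file"
   }

A failure for one sequence is reported on standard error and the loop moves
on to the next identifier, so one bad sequence does not stop the whole batch.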

Run requirements and options
----------------------------

Although ModPipe accepts a number of command-line arguments,
it relies primarily on the :ref:`configuration file <conf>`
to specify input and output file locations and run parameters.
The command-line options are :ref:`described separately <modpipe_cmd>`.

.. _examples:

Examples
========

There is an example file included in the ModPipe distribution in the
:file:`demo` subdirectory.

To run the demo, change into the :file:`demo` directory and then set the
:envvar:`DEMO` environment variable to the current directory, e.g. with
something like::

   export DEMO=`pwd`

Next, run the following command (this assumes that ModPipe's :file:`bin`
directory is in your :envvar:`PATH`) to populate the ModPipe filesystem::

   modpipe add --conf_file modpipe.conf --sequence_file test.fsa

This command will add the sequences to the ModPipe filesystem in the
:file:`data` directory and create a file with unique sequence identifiers,
:file:`test.unq`. (The file should contain two such identifiers.) Next, run::

   modpipe build --conf_file modpipe.conf \
                 --sequence_id 97e075794f588a59e8a0fb8a945814b1MLGIKIKP \
                 --score_by_tsvmod OFF \
                 --hits_mode "Seq-Seq,Prf-Seq" 

This command will run ModPipe on one of the two sequences in the filesystem
and produce 15 fold assignments and alignments
(7 from Sequence-Sequence search and 8 from Profile-Sequence search). The
fold assignments (hits) are then filtered to remove redundancy, resulting
in 8 selected hits. Finally, one model is built for each selected hit.

.. note:: For the sake of speed, the example does not calculate
          profile-profile alignments, and only builds a single model per
          alignment.

Data about generated models will be stored in :file:`*.mod`
files in each sequence directory. For instance, if you look in 
:file:`data/97e/97e075794f588a59e8a0fb8a945814b1MLGIKIKP/sequence/` there
will be a file called :file:`97e075794f588a59e8a0fb8a945814b1MLGIKIKP.mod`.
There will also be a :file:`.hit` file that contains all the
fold assignments. See :ref:`file_formats` for full information on the contents
of these files.
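
The location of these files can be derived from the sequence identifier. The
sketch below assumes, based only on the example path above, that the
intermediate directory is named after the first three characters of the
identifier. The function name ``mod_file_path`` and its ``data_dir`` argument
are hypothetical and not part of ModPipe::

   # Hypothetical helper: print the expected path of the .mod file.
   # Usage: mod_file_path DATA_DIR SEQUENCE_ID
   mod_file_path() {
       data_dir=$1
       seq_id=$2
       # ASSUMPTION: the subdirectory is the first three characters of the ID.
       prefix=$(printf '%s' "$seq_id" | cut -c1-3)
       printf '%s/%s/%s/sequence/%s.mod\n' \
           "$data_dir" "$prefix" "$seq_id" "$seq_id"
   }

For the demo sequence,
``mod_file_path data 97e075794f588a59e8a0fb8a945814b1MLGIKIKP`` prints the
path of the :file:`.mod` file mentioned above.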