Running ModPipe

Overview

In summary, for a given sequence, ModPipe:

  • Finds potential templates and creates alignments between the given sequence and each potential template;

  • Runs Modeller for each template; and

  • Accumulates results for all templates in a directory identified by a unique identifier for the given sequence.

A ModPipe run has two steps. First, an initialization step sets up the ModPipe filesystem and assigns an identifier to the given sequence, or to each of a set of given sequences. ModPipe can then be run for each sequence.

Initialization

modpipe add takes a file of protein sequences in FASTA format and creates an identifier and a subdirectory for each unique sequence. It also creates a sequence_file_name.unq file that maps the identifiers to the names of the sequences in the FASTA file (align codes). See Unique mapping file for more information. If your sequence file is not in FASTA format, first convert it to FASTA format using modpipe convert.
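Because modpipe add creates one identifier per unique sequence, it can help to know in advance how many distinct sequences a FASTA file holds. The following is a minimal sketch, not part of ModPipe: the function name count_unique_sequences is hypothetical, and it assumes that two records are "the same sequence" when their residues match exactly after line breaks and whitespace are removed. ModPipe's own notion of uniqueness may differ.

```shell
# Sketch (not part of ModPipe): count distinct sequences in a FASTA file.
# ASSUMPTION: sequences are compared as exact, case-sensitive strings after
# joining wrapped lines and stripping whitespace.
count_unique_sequences() {
    # $1 = FASTA file
    awk '
        # A header line ends the previous record; remember its sequence.
        /^>/ { if (seq != "") seen[seq] = 1; seq = ""; next }
        # Sequence line: strip whitespace and append to the current record.
        { gsub(/[ \t\r]/, ""); seq = seq $0 }
        END {
            if (seq != "") seen[seq] = 1
            n = 0
            for (s in seen) n++
            print n
        }
    ' "$1"
}
```

Running count_unique_sequences on the input file gives a number that can be compared against the number of identifiers in the resulting .unq file.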

ModPipe

For each sequence_id in the sequence_file_name.unq file, run ModPipe by calling modpipe build.
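The per-sequence step above can be sketched as a small shell helper that prints one modpipe build command per identifier. This is an illustration only: the function names list_sequence_ids and build_commands are hypothetical, and the sketch assumes that each non-empty line of the .unq file begins with the sequence identifier followed by whitespace. Check Unique mapping file for the actual layout before relying on it. Only the --conf_file and --sequence_id options, which appear elsewhere in this document, are used.

```shell
# Sketch: emit one "modpipe build" command per sequence ID in a .unq file.
# ASSUMPTION: the sequence ID is the first whitespace-delimited field of
# every non-empty line in the .unq file.
list_sequence_ids() {
    # $1 = .unq file
    awk 'NF > 0 { print $1 }' "$1"
}

build_commands() {
    # $1 = configuration file, $2 = .unq file
    list_sequence_ids "$2" | while read -r seq_id; do
        printf 'modpipe build --conf_file %s --sequence_id %s\n' "$1" "$seq_id"
    done
}

# Usage: print the commands first and review them ...
#   build_commands modpipe.conf test.unq
# ... then, if they look right, pipe them to a shell to run them:
#   build_commands modpipe.conf test.unq | sh
```

Printing the commands rather than running them directly makes it easy to inspect the identifiers, or to hand the list to a batch system, before any models are built.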

Run requirements and options

ModPipe accepts a number of command-line arguments, but it relies primarily on the configuration file to specify input and output file locations and run parameters. The command-line options are also documented.

Examples

The ModPipe distribution includes an example in its demo subdirectory.

To run the demo, change into the demo directory and set the DEMO environment variable to the current directory, for example:

export DEMO=`pwd`

Next, run the following command (this assumes that ModPipe’s bin directory is in your PATH) to populate the ModPipe filesystem:

modpipe add --conf_file modpipe.conf --sequence_file test.fsa

This command will add the sequences to the ModPipe filesystem in the data directory and create a file with unique sequence identifiers, test.unq. (The file should contain two such identifiers.) Next, run:

modpipe build --conf_file modpipe.conf \
              --sequence_id 97e075794f588a59e8a0fb8a945814b1MLGIKIKP \
              --score_by_tsvmod OFF \
              --hits_mode "Seq-Seq,Prf-Seq"

This command will run ModPipe on one of the two sequences in the filesystem and produce 15 fold assignments and alignments (7 from Sequence-Sequence search and 8 from Profile-Sequence search). The fold assignments (hits) are then filtered to remove redundancy, resulting in 8 selected hits. Finally, one model is built for each selected hit.

Note

For the sake of speed, the example does not calculate profile-profile alignments, and only builds a single model per alignment.

Data about generated models will be stored in *.mod files in each sequence directory. For instance, if you look in data/97e/97e075794f588a59e8a0fb8a945814b1MLGIKIKP/sequence/ there will be a file called 97e075794f588a59e8a0fb8a945814b1MLGIKIKP.mod. There will also be a .hit file that contains all the fold assignments. See File formats for full information on the contents of these files.
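The location of a .mod file can be derived from the sequence identifier. The sketch below is inferred solely from the single example path above, so the layout (a subdirectory named after the first three characters of the identifier, then the identifier, then sequence/) is an assumption rather than a documented rule. The function name mod_file_path is hypothetical.

```shell
# Sketch: build the expected .mod file path for a sequence ID.
# ASSUMPTION (inferred from one example path, not from a specification):
#   <data dir>/<first 3 chars of ID>/<ID>/sequence/<ID>.mod
mod_file_path() {
    # $1 = data directory, $2 = sequence ID
    prefix=$(printf '%s' "$2" | cut -c1-3)
    printf '%s/%s/%s/sequence/%s.mod\n' "$1" "$prefix" "$2" "$2"
}

# Usage: check whether the demo run produced its model data file.
#   f=$(mod_file_path data 97e075794f588a59e8a0fb8a945814b1MLGIKIKP)
#   [ -f "$f" ] && echo "found $f"
```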