Running ModPipe
Overview
In summary, for a given sequence, ModPipe:
- Finds potential templates and creates alignments between the given sequence and each potential template;
- Runs Modeller for each template; and
- Accumulates results for all templates in a directory identified by a unique identifier for the given sequence.
There are two steps. An initialization step sets up the ModPipe filesystem and the identifier for the given sequence, or for each of a set of given sequences. Then ModPipe can be run for each sequence.
- Initialization
  modpipe add takes a file of protein sequences in FASTA format and creates an identifier and a subdirectory for each unique sequence. It also creates a sequence_file_name.unq file that contains a mapping from the identifiers to the names of the sequences in the FASTA file (align codes). See Unique mapping file for more information. If your sequence file is not in FASTA format, first convert it to FASTA format using modpipe convert.
- ModPipe
  For each sequence_id in the sequence_file_name.unq file, run ModPipe by calling modpipe build.
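The second step can be sketched as a small shell function. This is only a minimal sketch and not part of ModPipe: it assumes that each non-empty line of the .unq file begins with the sequence identifier followed by whitespace (see Unique mapping file for the actual layout), and the names build_all and MODPIPE are hypothetical. The --conf_file and --sequence_id options are the ones used in the demo on this page.

```shell
# Hypothetical helper, not part of ModPipe: run "modpipe build" once for
# every identifier listed in a .unq file.
# ASSUMPTION: each non-empty line of the .unq file starts with the sequence
# identifier, followed by whitespace. Adjust the parsing if your file differs.
MODPIPE="${MODPIPE:-modpipe}"   # override to point at a different executable

build_all() {
    conf_file="$1"
    unq_file="$2"
    # "read" stores the first field in seq_id and the rest of the line in _rest.
    # The "|| [ -n ... ]" part also handles a last line without a trailing newline.
    while read -r seq_id _rest || [ -n "$seq_id" ]; do
        [ -n "$seq_id" ] || continue          # skip blank lines
        # Redirect stdin so the command cannot consume the rest of the .unq file.
        "$MODPIPE" build --conf_file "$conf_file" --sequence_id "$seq_id" \
            < /dev/null || return 1           # stop at the first failure
    done < "$unq_file"
}
```

With the function defined, a call such as build_all modpipe.conf test.unq would build models for every listed sequence in turn, stopping at the first failure.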
Run requirements and options
Although ModPipe accepts a number of command-line arguments, it relies primarily on the configuration file to specify input and output file locations and run parameters. The command-line options are also described in this documentation.
Examples
An example file is included in the ModPipe distribution, in the demo subdirectory.
To run the demo, change into the demo directory and then set the DEMO environment variable to the current directory, e.g. with something like:
export DEMO=`pwd`
Next, run the following command (this assumes that ModPipe’s bin directory is in your PATH) to populate the ModPipe filesystem:
modpipe add --conf_file modpipe.conf --sequence_file test.fsa
This command will add the sequences to the ModPipe filesystem in the data directory and create a file with unique sequence identifiers, test.unq. (The file should contain two such identifiers.) Next, run:
modpipe build --conf_file modpipe.conf \
--sequence_id 97e075794f588a59e8a0fb8a945814b1MLGIKIKP \
--score_by_tsvmod OFF \
--hits_mode "Seq-Seq,Prf-Seq"
This command will run ModPipe on one of the two sequences in the filesystem and produce 15 fold assignments and alignments (7 from Sequence-Sequence search and 8 from Profile-Sequence search). The fold assignments (hits) are then filtered to remove redundancy, resulting in 8 selected hits. Finally, one model is built for each selected hit.
Note
For the sake of speed, the example does not calculate profile-profile alignments, and only builds a single model per alignment.
Data about generated models will be stored in *.mod files in each sequence directory. For instance, if you look in data/97e/97e075794f588a59e8a0fb8a945814b1MLGIKIKP/sequence/ there will be a file called 97e075794f588a59e8a0fb8a945814b1MLGIKIKP.mod.
There will also be a .hit file that contains all the fold assignments. See File formats for full information on the contents of these files.
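Judging from the demo path, each sequence directory sits under a subdirectory named after the first three characters of its identifier. Assuming that layout holds for every sequence (this page shows it only for the one demo sequence), a hypothetical helper for locating a sequence's .mod file might look like the following. The function name mod_file_path is an invention for illustration, not a ModPipe command.

```shell
# Hypothetical helper, not part of ModPipe: print the expected location of
# the .mod file for a given sequence identifier.
# ASSUMPTION: the layout is
#   <data_dir>/<first 3 chars of id>/<id>/sequence/<id>.mod
# as inferred from the single demo path in this section.
mod_file_path() {
    data_dir="$1"
    seq_id="$2"
    # First three characters of the identifier, e.g. "97e".
    prefix=$(printf '%s' "$seq_id" | cut -c1-3)
    printf '%s/%s/%s/sequence/%s.mod\n' "$data_dir" "$prefix" "$seq_id" "$seq_id"
}
```

For the demo sequence, mod_file_path data 97e075794f588a59e8a0fb8a945814b1MLGIKIKP prints data/97e/97e075794f588a59e8a0fb8a945814b1MLGIKIKP/sequence/97e075794f588a59e8a0fb8a945814b1MLGIKIKP.mod.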