ModPipe command line tools
ModPipe is usually run through the modpipe command line tool, found in the bin directory. A typical run first uses the modpipe add command to populate the ModPipe filesystem with input FASTA sequences, then runs modpipe build for each sequence to find hits and build models. Finally, the modpipe gather command selects the best model(s) from the run. Other commands handle other tasks, such as manipulating alignments or benchmarking the ModPipe system.
Common options
To get help on a given command, use modpipe help <command>. For example, modpipe help add shows the options supported by the ‘add’ command.
modpipe add
This command will, given a FASTA file containing one or more sequences, populate the ModPipe filesystem ready for running modpipe build. It will also create a mapping file that maps FASTA identifiers to ModPipe sequence IDs (see Unique mapping file).
modpipe build
Required arguments
--conf_file
The name of the configuration file (typically, modpipe.conf) must be provided.
--sequence_id
The identifier of the sequence to be modelled must be provided, as created by modpipe add.
Optional arguments
--hits_mode
Option that specifies the search methods used to find templates (“hits”) matching the input (“target”) sequence. See Fold assignment methods for full details on each method. The following methods can be requested:
- Seq-Seq: Sequence-Sequence search.
- Prf-Seq: Profile-Sequence search using MODELLER Profile.build() profiles.
- PSI-Blast-Prf-Seq: Profile-Sequence search using PSI-BLAST profiles.
- Prf-Prf: Profile-Profile search using MODELLER Profile.build() profiles.
- PSI-Blast-Prf-Prf: Profile-Profile search using PSI-BLAST profiles.
- Seq-Prf: Sequence-Profile search.
- Max-PSSM-Seq-Prf: Sequence-Profile search with Max-PSSM scoring.
- Max-Freq-Seq-Prf: Sequence-Profile search with Max-frequency scoring.
Multiple methods can be requested with several --hits_mode options, or with a comma-separated list. “Seq-Seq” is always added, regardless of user input.
The default for --hits_mode is “Seq-Seq,Seq-Prf”. Each search method specified will be used independently. See also the CLUSTERALI variable in the configuration file if you use multiple methods.
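The rules above for combining --hits_mode values can be sketched in Python. This is only an illustration of the documented behaviour, not ModPipe's own code: the function name normalize_hits_modes is hypothetical, and treating method names as case-sensitive and rejecting unknown names are assumptions.

```python
# Hypothetical helper; only the method names and the default come from the text above.
DEFAULT_HITS_MODE = "Seq-Seq,Seq-Prf"
KNOWN_METHODS = (
    "Seq-Seq", "Prf-Seq", "PSI-Blast-Prf-Seq", "Prf-Prf",
    "PSI-Blast-Prf-Prf", "Seq-Prf", "Max-PSSM-Seq-Prf", "Max-Freq-Seq-Prf",
)


def normalize_hits_modes(values=None):
    """Flatten repeated and/or comma-separated --hits_mode values.

    Returns an ordered list of unique method names, with "Seq-Seq"
    always present, mirroring the behaviour described in the text.
    """
    if not values:
        values = [DEFAULT_HITS_MODE]
    methods = []
    for value in values:
        for name in value.split(","):
            name = name.strip()
            if not name:
                continue
            if name not in KNOWN_METHODS:  # assumption: unknown names are errors
                raise ValueError(f"unknown hits mode: {name}")
            if name not in methods:
                methods.append(name)
    if "Seq-Seq" not in methods:
        methods.insert(0, "Seq-Seq")
    return methods


print(normalize_hits_modes(["Prf-Prf", "Seq-Prf,Prf-Seq"]))
```

With no values given, the sketch falls back to the documented default of Seq-Seq plus Seq-Prf.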
modpipe gather
This command is designed to be run after the main modpipe build command.
It parses all of the generated models in the models file and generates a
final models file that contains the ‘best’ models (in the same YAML format
as the input models file). This can be done either for a single sequence
(--seq_id option) or for all sequences in the unique mapping (unq) file
(--unq_file option).
One or more selection methods (e.g. pick the model generated from the highest
identity template, or that has the best DOPE score) can be specified with the
--final_models_by option. The final models file will contain a single
model per method, or potentially fewer if multiple methods select the same
model (e.g. the model with the highest identity template also happens to have
the best DOPE score). The gather command will generally select only models
with good scores. The following criteria are used:
GA341: score>=0.7
z-DOPE: score<0
MPQS: score>1.0
SEQID: no threshold, model with highest sequence identity will be selected
LONGEST_GA341: longest model with GA341 score>=0.7
LONGEST_DOPE: longest model with z-DOPE<0
TSVMOD: predicted native overlap (3.5 angstrom cutoff) >0.4
INPUT_TEMPLATE: model with the highest sequence identity built from the input template; used in template-based calculations.
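The score thresholds listed above can be expressed as a small lookup table. The following Python sketch is illustrative only: the dictionary keys (ga341, zdope, mpqs, tsvmod_no35, length) are hypothetical field names, not necessarily those used in the ModPipe models file.

```python
import operator

# Criterion name -> (hypothetical score key, comparison, threshold),
# following the thresholds listed in the text.
THRESHOLDS = {
    "GA341": ("ga341", operator.ge, 0.7),
    "z-DOPE": ("zdope", operator.lt, 0.0),
    "MPQS": ("mpqs", operator.gt, 1.0),
    "TSVMOD": ("tsvmod_no35", operator.gt, 0.4),
}


def passing_criteria(model):
    """Return the names of the threshold criteria that a model satisfies."""
    passed = []
    for name, (key, compare, threshold) in THRESHOLDS.items():
        score = model.get(key)
        if score is not None and compare(score, threshold):
            passed.append(name)
    return passed


def pick_longest_ga341(models):
    """LONGEST_GA341: the longest model among those with GA341 >= 0.7, or None."""
    good = [m for m in models if m.get("ga341", 0.0) >= 0.7]
    return max(good, key=lambda m: m["length"], default=None)


example = {"ga341": 0.7, "zdope": 0.0, "mpqs": 1.2, "tsvmod_no35": 0.4, "length": 120}
print(passing_criteria(example))
```

Note the boundary behaviour: GA341 uses an inclusive threshold, while z-DOPE, MPQS and TSVMOD are strict inequalities, so the example model passes only GA341 and MPQS.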
It is often the case that no template spans the whole query sequence - for
example, in a two-domain system, one set of templates may yield models for the
first domain and another set may yield models for the second domain, while no
template covers both domains. Model selection may then pick a good model for
one domain, discarding the other domain models. In this case,
it may make sense to turn on --select_by_region. This will cluster
the models by region and then apply the selection criteria to each region
individually rather than only to the whole sequence. Thus, at most one model
per selection method per region will be returned.
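To make the per-region idea concrete, here is a minimal Python sketch for a single selection method (z-DOPE). It assumes that models are grouped whenever their residue ranges overlap; ModPipe's actual clustering procedure is not described here and may differ. The field names start, end and zdope are hypothetical.

```python
def cluster_by_region(models):
    """Greedily group models whose (start, end) residue ranges overlap.

    Assumption: simple interval merging stands in for ModPipe's clustering.
    """
    clusters = []
    for model in sorted(models, key=lambda m: (m["start"], m["end"])):
        if clusters and model["start"] <= clusters[-1]["end"]:
            # Overlaps the current region: add it and extend the region.
            clusters[-1]["models"].append(model)
            clusters[-1]["end"] = max(clusters[-1]["end"], model["end"])
        else:
            clusters.append(
                {"start": model["start"], "end": model["end"], "models": [model]}
            )
    return clusters


def best_zdope_per_region(models):
    """Return at most one model per region: the lowest z-DOPE below zero."""
    selected = []
    for cluster in cluster_by_region(models):
        candidates = [m for m in cluster["models"] if m["zdope"] < 0]
        if candidates:
            selected.append(min(candidates, key=lambda m: m["zdope"]))
    return selected


models = [
    {"id": "a", "start": 1, "end": 100, "zdope": -0.5},
    {"id": "b", "start": 5, "end": 110, "zdope": -1.2},
    {"id": "c", "start": 150, "end": 260, "zdope": -0.3},
]
print([m["id"] for m in best_zdope_per_region(models)])
```

In this two-domain example, selecting over the whole sequence would keep only one model, whereas selecting per region keeps one model for each domain.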
modpipe benchmark
This command is used to benchmark the performance of the ModPipe system by comparing generated models for a sequence to a known structure (usually a PDB crystal structure) of the same sequence.
The command, given a sequence identifier in the ModPipe filesystem, parses the models file and, for each model in that file, compares it to the native structure and writes out a new YAML file that is similar to the original models file but which contains an extra ‘native_benchmark’ field for each model, containing the benchmark data.
The sequence is identified using the --conf_file and --sequence_id
options. The PDB code and chain ID of the known structure are specified
using the --native and --native_chain options (use the --pdb_repository
option to specify a list of directories to search for PDB files). Finally,
the --output_filename option names the output file containing model and
benchmark data.
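As an illustration, the options above can be assembled into an argument list from Python. This is a sketch under assumptions: the helper name benchmark_command, the placeholder values, and the "--option value" spelling (rather than "--option=value") are not taken from the ModPipe documentation. The --pdb_repository option is left out because the format of its directory list is not specified here.

```python
import shlex


def benchmark_command(conf_file, sequence_id, native, native_chain, output_filename):
    """Assemble an argument list for `modpipe benchmark` from the documented options."""
    return [
        "modpipe", "benchmark",
        "--conf_file", conf_file,
        "--sequence_id", sequence_id,
        "--native", native,
        "--native_chain", native_chain,
        "--output_filename", output_filename,
    ]


# Placeholder values; SEQUENCE_ID stands for an ID created by modpipe add.
argv = benchmark_command("modpipe.conf", "SEQUENCE_ID", "1abc", "A", "benchmark.yaml")
print(shlex.join(argv))
```

The resulting list could be passed to subprocess.run(argv, check=True) on a system where the modpipe tool is on the PATH.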
The benchmark data in the output file looks similar to:
native_benchmark:
  code: 1abc
  chain: A
  length: 116
  region: ['1', '116']
  cutoff_rms:
  - {cutoff: 1.0, num_equiv_pos: 115, rms: 0.037}
  - {cutoff: 2.0, num_equiv_pos: 115, rms: 0.037}
  - {cutoff: 3.0, num_equiv_pos: 116, rms: 0.229}
  - {cutoff: 4.0, num_equiv_pos: 116, rms: 0.229}
  - {cutoff: 5.0, num_equiv_pos: 116, rms: 0.229}
  mean_cutoff_rms: 0.152
  mean_num_equiv_pos: 115
  cutoff_rms_35: 0.229
  num_equiv_pos_35: 116
  global_rms: 0.229
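The two mean_* values in this sample are consistent with simple averages over the cutoff_rms rows, which the short Python check below reproduces. The function name summarize is hypothetical, and truncating the mean number of equivalent positions to an integer is an assumption inferred from the sample (the exact mean is 115.6, while the file shows 115).

```python
rows = [
    {"cutoff": 1.0, "num_equiv_pos": 115, "rms": 0.037},
    {"cutoff": 2.0, "num_equiv_pos": 115, "rms": 0.037},
    {"cutoff": 3.0, "num_equiv_pos": 116, "rms": 0.229},
    {"cutoff": 4.0, "num_equiv_pos": 116, "rms": 0.229},
    {"cutoff": 5.0, "num_equiv_pos": 116, "rms": 0.229},
]


def summarize(rows):
    """Average the rms and num_equiv_pos columns of a cutoff_rms table."""
    mean_rms = sum(row["rms"] for row in rows) / len(rows)
    mean_pos = sum(row["num_equiv_pos"] for row in rows) / len(rows)
    return {
        "mean_cutoff_rms": round(mean_rms, 3),
        # Assumption: integer truncation, to match the sample output.
        "mean_num_equiv_pos": int(mean_pos),
    }


print(summarize(rows))
```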
The benchmarking data is created using Modeller’s Selection.superpose() method;
a simple 1:1 alignment (no gaps) is created between the model sequence
and the native structure, using C-alpha atoms to define each residue’s spatial
position. The fields reported are:
- code
The PDB code of the native structure.
- chain
The PDB chain ID of the native structure.
- length
The number of residues in the model.
- region
The starting and ending PDB residue numbers in the native structure that correspond to the model. (The model may not cover the entire native chain.)
- cutoff_rms
A list of comparisons between the model and native structure. Each row is a superposition using a different cutoff to Selection.superpose(). For each cutoff, the number of equivalent positions (number of model residues within the cutoff distance from the same residue in the superposed native structure) is reported. The root-mean-square deviation of model residue positions from the native positions is also reported, for all residues that are within the cutoff distance.
- mean_cutoff_rms, mean_num_equiv_pos
The average of the rms and num_equiv_pos values for all rows in cutoff_rms.
- cutoff_rms_35, num_equiv_pos_35
The equivalent of the rms and num_equiv_pos values from the cutoff_rms table, for a 3.5 angstrom cutoff.
- global_rms
The root-mean-square deviation of model residue positions from native positions, for all residues in the structure.
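The definitions of num_equiv_pos, rms and global_rms can be illustrated with a short Python sketch. It operates on C-alpha coordinates that are assumed to be already superposed and paired 1:1; it does not perform the fitting that Selection.superpose() does, and it does not call Modeller at all. Treating a distance exactly equal to the cutoff as "within" is also an assumption.

```python
import math


def squared_distances(model_ca, native_ca):
    """Squared distance between each pair of corresponding C-alpha positions."""
    if len(model_ca) != len(native_ca):
        raise ValueError("a 1:1 residue correspondence is required")
    return [
        sum((a - b) ** 2 for a, b in zip(m, n))
        for m, n in zip(model_ca, native_ca)
    ]


def cutoff_stats(model_ca, native_ca, cutoff):
    """Count residues within `cutoff` angstroms and compute the RMS over them."""
    within = [d for d in squared_distances(model_ca, native_ca) if d <= cutoff ** 2]
    rms = math.sqrt(sum(within) / len(within)) if within else 0.0
    return {"cutoff": cutoff, "num_equiv_pos": len(within), "rms": rms}


def global_rms(model_ca, native_ca):
    """RMS deviation over all residues, regardless of distance."""
    sq = squared_distances(model_ca, native_ca)
    return math.sqrt(sum(sq) / len(sq))


# Three residues whose deviations are 0, 1 and 5 angstroms.
model_ca = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 5.0)]
native_ca = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0), (0.0, 0.0, 0.0)]
print(cutoff_stats(model_ca, native_ca, 3.5))
print(global_rms(model_ca, native_ca))
```

In the toy data, the third residue lies 5 angstroms from its native position, so it is excluded at the 3.5 angstrom cutoff but still contributes to global_rms.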
modpipe convert
This tool converts an input alignment file to a different format. Since ModPipe uses FASTA format exclusively for the input sequence (see modpipe add), this is a useful tool if your sequence is in another format, such as PIR.