Databases used by ModPipe
ModPipe relies on several structure, sequence, and profile databases. The locations of these databases are given in the configuration file.
If you are not in the Sali lab, you will need to download and set up these databases yourself, as detailed below.
Template sequence database (PDB95)
This is a file containing the sequence of each representative structure from
PDB (clustered at 95% sequence identity to remove redundancy). It can be
generated manually from PDB, or you can download the PIR file (pdb_95.pir)
from the Modeller website.
ModPipe usually uses a binary (HDF5) database, which can be generated from
the PIR file using Modeller’s SequenceDB.convert() method.
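The conversion step might look like the following minimal sketch. It assumes Modeller 10 or later is installed (for the Environ and SequenceDB class names), and the keyword names passed to convert() are taken from the Modeller manual as best recalled, so check them against your installed version. The output file name pdb_95.hdf5 is only an illustration, not a name ModPipe requires.

```python
def convert_kwargs(pir_path, hdf5_path):
    """Build the keyword arguments for SequenceDB.convert().

    The keyword names are an assumption based on the Modeller manual;
    verify them against the Modeller version you have installed.
    """
    return {
        "seq_database_file": pir_path,   # input PIR file, e.g. pdb_95.pir
        "seq_database_format": "PIR",    # format of the input file
        "chains_list": "ALL",            # convert every sequence in the file
        "outfile": hdf5_path,            # binary (HDF5) output file
    }


def convert_pir_to_hdf5(pir_path, hdf5_path):
    """Convert a PIR sequence file to a binary (HDF5) database."""
    # Imported here so the helper above can be used without Modeller.
    from modeller import Environ, SequenceDB

    env = Environ()
    sdb = SequenceDB(env)
    sdb.convert(**convert_kwargs(pir_path, hdf5_path))


if __name__ == "__main__":
    # Hypothetical output name; put the result wherever your
    # ModPipe configuration file expects the database.
    convert_pir_to_hdf5("pdb_95.pir", "pdb_95.hdf5")
```

The same function should also work for the UniProt90 FASTA file if seq_database_format is changed to "FASTA".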
PDB repository
This should be a single directory containing uncompressed PDB files for every structure in the PDB95 database. These can be downloaded from the PDB website.
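One way to confirm that the repository covers the whole PDB95 database is to compare the codes in the PIR file against the files on disk. The sketch below is only an illustration, not part of ModPipe: it assumes that each PIR entry starts with a ">P1;code" header, that the first four characters of the code are the PDB ID, and that files are named like pdb1abc.ent.

```python
import os


def pir_codes(pir_path):
    """Yield the code from each '>P1;code' header line of a PIR file."""
    with open(pir_path) as fh:
        for line in fh:
            if line.startswith(">P1;"):
                yield line[4:].strip()


def missing_pdb_files(pir_path, pdb_dir):
    """Return the sorted PDB IDs from the PIR file that have no file in pdb_dir."""
    missing = set()
    for code in pir_codes(pir_path):
        # Assumption: codes look like '1abcA' (4-character PDB ID + chain).
        pdb_id = code[:4].lower()
        # Assumption: uncompressed files are named 'pdb<id>.ent'.
        fname = "pdb%s.ent" % pdb_id
        if not os.path.exists(os.path.join(pdb_dir, fname)):
            missing.add(pdb_id)
    return sorted(missing)


if __name__ == "__main__":
    # Hypothetical paths; adjust to your own setup.
    for pdb_id in missing_pdb_files("pdb_95.pir", "/path/to/pdb"):
        print("missing:", pdb_id)
```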
Note
While newer parts of ModPipe can handle compressed PDB files and the new ‘divided’ directory structure (e.g. pdb1abc.ent.gz is found in an ‘ab’ subdirectory), some parts of ModPipe still expect all the PDB files to be uncompressed and in the same directory.
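If you already have a ‘divided’ mirror of compressed files, a flat uncompressed copy can be produced with a short script. The following is a minimal sketch using only the Python standard library; the function name and the directory paths are hypothetical, and it assumes the compressed files end in .ent.gz.

```python
import gzip
import os
import shutil


def flatten_divided_pdb(divided_dir, flat_dir):
    """Decompress every *.ent.gz under divided_dir into the single directory flat_dir.

    Returns the sorted list of files written.
    """
    os.makedirs(flat_dir, exist_ok=True)
    written = []
    for root, _dirs, files in os.walk(divided_dir):
        for name in files:
            if not name.endswith(".ent.gz"):
                continue
            # pdb1abc.ent.gz -> pdb1abc.ent
            target = os.path.join(flat_dir, name[:-len(".gz")])
            with gzip.open(os.path.join(root, name), "rb") as src, \
                    open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            written.append(target)
    return sorted(written)


if __name__ == "__main__":
    # Hypothetical paths; adjust to your own setup.
    files = flatten_divided_pdb("/path/to/pdb/divided", "/path/to/pdb/flat")
    print("wrote %d files" % len(files))
```

Keeping the original divided mirror and writing to a separate directory means the mirror can still be updated incrementally.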
Sequence database
This is a database of sequences from UniProt, clustered at 90% sequence
identity to remove redundancy. The raw FASTA file containing these sequences
(uniprot90) can be downloaded from the Modeller website.
For ModPipe, you will need this database in binary (HDF5) form; use
Modeller’s SequenceDB.convert() method to produce it. If you want to
use PSI-BLAST searches as well, you will also need to generate the BLAST
databases and indexes from uniprot90 using BLAST’s formatdb tool.
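The formatdb step could be scripted as in the sketch below. It assumes the legacy NCBI BLAST formatdb executable is on your PATH; the flags shown (-i for the input file, -p T for protein sequences, -o T to parse sequence IDs and build indexes) are the usual legacy options, but check them against the formatdb version you have. The helper names are hypothetical.

```python
import subprocess


def formatdb_command(fasta_path):
    """Return the formatdb argument list for a protein FASTA file."""
    return [
        "formatdb",
        "-i", fasta_path,  # input FASTA file
        "-p", "T",         # T = protein sequences
        "-o", "T",         # T = parse sequence IDs and create indexes
    ]


def run_formatdb(fasta_path):
    """Run formatdb and raise CalledProcessError if it fails."""
    subprocess.run(formatdb_command(fasta_path), check=True)


if __name__ == "__main__":
    run_formatdb("uniprot90")
```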
Structure profiles database
This database contains a profile for each PDB95 structure (constructed by
scanning through UniProt90). It consists of a .prf profile file for
each PDB95 structure, together with a file listing all of the profiles
(pdb95_prf.list) and a database of PSSMs for all of the profiles
(pdb95_prf.pssm). It can be downloaded from the Modeller website.
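After downloading, it can be worth checking that every profile named in the list file is actually present. The sketch below assumes that pdb95_prf.list contains one profile file path per line, with relative paths resolved against the directory holding the list; this layout is an assumption, so inspect your copy of the file first.

```python
import os


def missing_profiles(list_path, base_dir=None):
    """Return the entries of a profile list file that do not exist on disk.

    Assumption: the list file holds one profile path per line; relative
    paths are resolved against base_dir (default: the list's directory).
    """
    base = base_dir or os.path.dirname(os.path.abspath(list_path))
    missing = []
    with open(list_path) as fh:
        for line in fh:
            entry = line.strip()
            if not entry:
                continue  # skip blank lines
            path = entry if os.path.isabs(entry) else os.path.join(base, entry)
            if not os.path.isfile(path):
                missing.append(entry)
    return missing


if __name__ == "__main__":
    # Hypothetical location; adjust to your own setup.
    for entry in missing_profiles("pdb95_prf.list"):
        print("missing profile:", entry)
```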