Databases used by ModPipe

ModPipe relies on several structure, sequence, and profile databases. The locations of these databases are given in the configuration file.

If you are not in the Sali lab, you will need to download and set up these databases yourself, as detailed below.

Template sequence database (PDB95)

This is a file containing the sequence of each representative structure in the PDB, clustered at 95% sequence identity to remove redundancy. You can generate it yourself from the PDB, or download the PIR file (pdb_95.pir) from the Modeller website. ModPipe usually uses a binary (HDF5) database, which can be generated from the PIR file using Modeller’s SequenceDB.convert() method.
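As a rough sketch, the conversion step might look like the following. It assumes Modeller 10 or later, where the classes are named `Environ` and `SequenceDB` (older releases use `environ` and `sequence_db`), and it assumes the `convert()` keywords shown here; check them against the Modeller manual for your version. The output filename is hypothetical, and `pir_codes()` is an illustrative helper (not part of ModPipe or Modeller) for listing the entry codes in the PIR file.

```python
"""Sketch: inspect pdb_95.pir and convert it to a binary (HDF5) database."""


def pir_codes(path):
    """Return the entry codes from the '>P1;CODE' header lines of a PIR file."""
    codes = []
    with open(path) as fh:
        for line in fh:
            if line.startswith('>'):
                # A header looks like '>P1;1abcA'; the code follows the ';'.
                codes.append(line.rstrip('\n').split(';', 1)[1].strip())
    return codes


def convert_pir_to_hdf5(pir_file, hdf5_file):
    """Convert a PIR sequence file to Modeller's binary (HDF5) format.

    Assumes Modeller 10+ class names and these convert() keywords.
    Modeller is imported here so that pir_codes() works without it.
    """
    from modeller import Environ, SequenceDB
    sdb = SequenceDB(Environ())
    sdb.convert(seq_database_file=pir_file, seq_database_format='PIR',
                chains_list='ALL', outfile=hdf5_file)
```

For example, `convert_pir_to_hdf5('pdb_95.pir', 'pdb_95.hdf5')` would write the binary database, after which the configuration file should point at the new file.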

PDB repository

This should be a single directory containing uncompressed PDB files for every structure in the PDB95 database. These can be downloaded from the PDB website.

Note

While newer parts of ModPipe can handle compressed PDB files and the newer ‘divided’ directory structure (e.g. pdb1abc.ent.gz is found in an ‘ab’ subdirectory), some parts of ModPipe still expect all of the PDB files to be uncompressed and in a single directory.
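If you have mirrored the PDB in the compressed, divided layout, a short script along these lines can produce a flat directory of uncompressed files. This is only a sketch. It assumes that each entry is named like `pdb1abc.ent.gz` and that ModPipe expects the uncompressed name `pdb1abc.ent`. `flatten_pdb_mirror` is a hypothetical helper, and the file naming your configuration expects may differ.

```python
import gzip
import os
import shutil


def flatten_pdb_mirror(divided_root, flat_dir):
    """Copy every *.ent.gz under divided_root into flat_dir, uncompressed.

    Assumes the divided layout, e.g.
    divided_root/ab/pdb1abc.ent.gz -> flat_dir/pdb1abc.ent.
    flat_dir should not be inside divided_root.
    Returns the number of files written.
    """
    os.makedirs(flat_dir, exist_ok=True)
    count = 0
    for dirpath, _dirnames, filenames in os.walk(divided_root):
        for name in filenames:
            if not name.endswith('.ent.gz'):
                continue
            src = os.path.join(dirpath, name)
            dest = os.path.join(flat_dir, name[:-len('.gz')])
            # Stream-decompress so large entries are never held in memory.
            with gzip.open(src, 'rb') as fin, open(dest, 'wb') as fout:
                shutil.copyfileobj(fin, fout)
            count += 1
    return count
```

Keeping the compressed mirror and generating the flat copy from it lets the newer parts of ModPipe use the divided layout while the older parts read the flat directory.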

Sequence database

This is a database of UniProt sequences, clustered at 90% sequence identity to remove redundancy. The raw FASTA file containing these sequences (uniprot90) can be downloaded from the Modeller website. ModPipe needs this database in binary (HDF5) form; use Modeller’s SequenceDB.convert() method to produce it. If you also want to run PSI-BLAST searches, you will additionally need to generate the BLAST databases and indexes from uniprot90 using BLAST’s formatdb tool.
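Both preparation steps can be sketched in Python as follows. The formatdb flags are those of the legacy NCBI BLAST toolkit: `-i` is the input file, `-p T` marks the sequences as protein, and `-o T` parses sequence identifiers and creates indexes. BLAST+ replaces formatdb with makeblastdb, which takes different options. The Modeller call assumes Modeller 10+ class names and assumes that `convert()` accepts `'FASTA'` input as `SequenceDB.read()` does. All helper names here are hypothetical.

```python
import subprocess


def formatdb_command(fasta_file):
    """Build the legacy BLAST formatdb command line for a protein FASTA file."""
    # -i: input file; -p T: protein; -o T: parse SeqIds and create indexes.
    return ['formatdb', '-i', fasta_file, '-p', 'T', '-o', 'T']


def build_blast_db(fasta_file):
    """Run formatdb (which must be on PATH); raises CalledProcessError on failure."""
    subprocess.run(formatdb_command(fasta_file), check=True)


def convert_fasta_to_hdf5(fasta_file, hdf5_file):
    """Convert the uniprot90 FASTA file to Modeller's binary (HDF5) format.

    Assumes Modeller 10+ names and FASTA support in convert().
    """
    from modeller import Environ, SequenceDB
    sdb = SequenceDB(Environ())
    sdb.convert(seq_database_file=fasta_file, seq_database_format='FASTA',
                chains_list='ALL', outfile=hdf5_file)
```

The formatdb step is only needed if you intend to run PSI-BLAST searches. The HDF5 conversion is required in every case.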

Structure profiles database

This database contains a profile for each PDB95 structure, constructed by searching UniProt90. It consists of a .prf profile file for each PDB95 structure, together with a file listing all of the profiles (pdb95_prf.list) and a database of PSSMs for all of the profiles (pdb95_prf.pssm). It can be downloaded from the Modeller website.
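After downloading, a quick consistency check can help to confirm that every profile named in pdb95_prf.list is actually present. The sketch below assumes, hypothetically, that the list file contains one profile path per line and that relative paths are resolved against the list file's own directory. Adjust it if the real file uses a different layout. `missing_profiles` is a hypothetical helper, not part of ModPipe.

```python
import os


def missing_profiles(list_file, profile_dir=None):
    """Return entries in a profile list file whose .prf file does not exist.

    Assumes one profile path per line; blank lines are skipped. Relative
    paths are resolved against profile_dir (default: the list's directory).
    """
    base = profile_dir or os.path.dirname(os.path.abspath(list_file))
    missing = []
    with open(list_file) as fh:
        for line in fh:
            entry = line.strip()
            if not entry:
                continue
            path = entry if os.path.isabs(entry) else os.path.join(base, entry)
            if not os.path.isfile(path):
                missing.append(entry)
    return missing
```

An empty result means every listed profile file was found.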