A database of multiple protein structure alignments, ALBASE V3

Andrej Sali and John P. Overington


This is a list of multiple alignment files described in 

@ARTICLE{SalOve94,
        author={A. \v{S}ali and J.P. Overington},
        title={Derivation of rules for comparative protein modeling
                from a database of protein structure alignments},
        journal={Protein Sci.},
        volume={3},
        annote={A database of protein structure alignments and tools for its
                use are described. The database is applied to derive spatial
                restraints on disulphide bridges and cis/trans isomerism
                of Pro residues that are used for comparative modelling
                by MODELLER.},
        pages={1582-1596},
        year={1994}
}


Description of the alignment format from the MODELLER manual:

C; family: DNA-binding repressor
>P1;2cro
structureX:2cro:  -1 : :  63 : :cro repressor:phage 434: 2.30:19.50
-----MQTLSERLKKRRIALK----MTQTELATKAGVKQQSIQLIEAGVTKR-PRFLFE
IAMALNCD-----PVWLQYGT------*
>P1;1r69
structureX:1r69:   1 : :  63 : :repressor:phage 434: 2.00:19.30
-------SISSRVKSKRIQLG----LNQAELAQKVGTTQQSIEQLENGKTKR-PRFLPEL
ASALGVS-----VDWLLNGT-------*
>P1;1lrd3
structureX:1lrd:   6 :3:  92 :3:l repressor:bacteriophage l: 2.50:24.20
PLTQEQLEDARRLKAIYEKKKNELGLSQESVADKMGMGQSGVGALFNGINALNAYNAALL
AKILKVSVEEFSPSIAREIYEMYEAVS*
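
As a rough illustration of how entries like the three above are laid out, the Python sketch below splits such a file into (code, header, sequence) triples. It is not MODELLER's own reader. It assumes that every entry begins with a '>P1;' line followed by exactly one header line, that any line outside an entry (such as the 'C;' line) can be ignored, and that the sequence runs up to the terminating '*'. The function name parse_pir is hypothetical.

```python
"""Minimal sketch of a reader for PIR-style alignment entries (not MODELLER's parser)."""

EXAMPLE = """\
C; family: DNA-binding repressor
>P1;2cro
structureX:2cro:  -1 : :  63 : :cro repressor:phage 434: 2.30:19.50
-----MQTLSERLKKRRIALK----MTQTELATKAGVKQQSIQLIEAGVTKR-PRFLFE
IAMALNCD-----PVWLQYGT------*
>P1;1r69
structureX:1r69:   1 : :  63 : :repressor:phage 434: 2.00:19.30
-------SISSRVKSKRIQLG----LNQAELAQKVGTTQQSIEQLENGKTKR-PRFLPEL
ASALGVS-----VDWLLNGT-------*
>P1;1lrd3
structureX:1lrd:   6 :3:  92 :3:l repressor:bacteriophage l: 2.50:24.20
PLTQEQLEDARRLKAIYEKKKNELGLSQESVADKMGMGQSGVGALFNGINALNAYNAALL
AKILKVSVEEFSPSIAREIYEMYEAVS*
"""


def parse_pir(text):
    """Return a list of (code, header, aligned_sequence) tuples.

    Assumptions (for illustration only): each entry starts with '>P1;',
    the next line is the colon-separated header, and the sequence runs
    until the first '*'. Lines outside entries are ignored.
    """
    entries = []
    lines = iter(text.splitlines())
    for line in lines:
        if not line.startswith(">P1;"):
            continue  # e.g. the 'C;' line or blank lines between entries
        code = line[len(">P1;"):].strip()
        header = next(lines, "").strip()
        pieces = []
        for seq_line in lines:  # consumes the same iterator as the outer loop
            body, star, _ = seq_line.strip().partition("*")
            pieces.append(body)
            if star:  # '*' terminates the sequence
                break
        # An empty sequence means the entry began with '*', i.e. the
        # residues are to be read from the PDB coordinate file instead.
        entries.append((code, header, "".join(pieces)))
    return entries


if __name__ == "__main__":
    for code, header, seq in parse_pir(EXAMPLE):
        # Count residues, ignoring '-' gap characters.
        print(code, sum(ch != "-" for ch in seq))
    # 2cro 65
    # 1r69 63
    # 1lrd3 87
```

The residue counts agree with the ranges given in the headers (63 - (-1) + 1 = 65, 63 - 1 + 1 = 63, 92 - 6 + 1 = 87), assuming the residues are numbered consecutively.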

An alignment of three DNA-binding repressors is stored in this
file. The format of each entry is similar to that of the PIR sequence
database.

The second line of each entry contains all the information needed to
extract the atomic coordinates of the segment from the original PDB
coordinate set. The fields in this line are separated by colons and
give, in order: the method used to obtain the structure (structureX,
X-ray; structureN, NMR; structureM, model; sequence, sequence), the
PDB code, the residue number and chain identifier of the first residue
in the segment, the residue number and chain identifier of the last
residue in the segment, the protein name, the source of the protein,
and the resolution and R-factor of the crystallographic analysis.

The first six fields must be present (that is, at least five ':' are
required). When the alignment file is used in conjunction with
structural information, the first six fields must be filled in; the
rest can be empty. If the alignment is not used in conjunction with
structural data, all fields except the first can be empty.

Residue numbers and chain identifiers can be given in free format: the
first five non-blank characters in the field are used as the residue
number, and the first non-blank character as the chain identifier.
'@@@@@' and '@' can be used to specify the first residue in the file.
If the residue specified by the second residue number and chain
identifier is not found in the coordinate file, the last residue is
used. Thus, '@@@@@:@:XXXXX:X' specifies the whole coordinate file
(assuming that residue XXXXX:X does not exist). If a contiguous
segment of residues is needed, there is no need to edit the coordinate
file; simply specify the first and last residues of the required
segment.

Each sequence must be terminated by the terminating character, '*'.
When the first character of the sequence is '*', the sequence is
obtained from a PDB coordinate file whose name is constructed from the
PDB code using the file naming convention described in the MODELLER
manual. Chain breaks are indicated by '/'. A single chain break should
be marked by a single '/' (use gaps instead of additional '/'
characters). All residue types specified in $RESTYP_LIB are allowed
(there are close to 100 types).
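
The header and residue-range rules described above can be sketched in Python as follows. This is an illustration under stated assumptions, not MODELLER code: the field labels in FIELD_NAMES, the helper names parse_header, select_segment and split_chains, and the representation of a coordinate file as an ordered list of (residue number, chain) pairs are all hypothetical. The simple split on ':' would also break if a name or source field itself contained a colon.

```python
"""Sketch of the header-line and residue-range rules (hypothetical helpers, not MODELLER)."""
from typing import Dict, List, Tuple

# Hypothetical labels for the fields, in the order described in the text.
FIELD_NAMES = ["method", "code", "start_res", "start_chain", "end_res",
               "end_chain", "name", "source", "resolution", "r_factor"]


def parse_header(line: str) -> Dict[str, str]:
    """Split a header line on ':' into named fields."""
    fields = [f.strip() for f in line.split(":")]
    if len(fields) < 6:  # fewer than five ':' separators
        raise ValueError("header needs at least six ':'-separated fields")
    fields += [""] * (len(FIELD_NAMES) - len(fields))  # trailing fields may be empty
    hdr = dict(zip(FIELD_NAMES, fields))
    # Free format: the first five non-blank characters give the residue
    # number and the first non-blank character gives the chain identifier.
    for key, width in (("start_res", 5), ("start_chain", 1),
                       ("end_res", 5), ("end_chain", 1)):
        hdr[key] = "".join(hdr[key].split())[:width]
    return hdr


Residue = Tuple[str, str]  # (residue number, chain id) as read from a coordinate file


def select_segment(residues: List[Residue], start: Residue,
                   end: Residue) -> List[Residue]:
    """Return the residues from start to end, inclusive."""
    # Assumption: '@@@@@' together with chain '@' denotes the first residue.
    # Otherwise list.index raises ValueError if the start residue is absent.
    i = 0 if start == ("@@@@@", "@") else residues.index(start)
    try:
        j = residues.index(end, i)
    except ValueError:
        j = len(residues) - 1  # end residue not found: use the last residue
    return residues[i:j + 1]


def split_chains(aligned: str) -> List[str]:
    """Split an aligned sequence at chain breaks ('/')."""
    if "//" in aligned:
        raise ValueError("use one '/' per chain break; pad with gaps instead")
    return aligned.split("/")


if __name__ == "__main__":
    hdr = parse_header("structureX:1lrd:   6 :3:  92 :3:l repressor:"
                       "bacteriophage l: 2.50:24.20")
    print(hdr["code"], hdr["start_res"], hdr["start_chain"], hdr["end_res"])
    # 1lrd 6 3 92
    toy = [(str(n), "A") for n in range(1, 11)]  # hypothetical 10-residue chain A
    print(len(select_segment(toy, ("@@@@@", "@"), ("XXXXX", "X"))))  # 10
    print(select_segment(toy, ("8", "A"), ("99", "A")))
    # [('8', 'A'), ('9', 'A'), ('10', 'A')]
```

With this representation, '@@@@@:@:XXXXX:X' selects every residue in the list, mirroring the whole-file rule, and a missing end residue falls back to the last one.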


-------------------------------------------------------------------------------
Andrej Sali
 Box 270 
 The Rockefeller University 
 1230 York Avenue 
 New York, NY 10021-6399 
 voice (212) 327 7550 
 fax (212) 327 7540 
 e-mail sali@rockvax.rockefeller.edu
-------------------------------------------------------------------------------
