This file contains the modifications made to the alignments database:


August 3, 1993:

   This is the first official release of the alignments database. It was
   published in Sali, Overington, and Karplus, 1993.

   ALBASE 1: 97 alignment files with igvar split into igvar-v and 
             igvar-l in the ALBASE_1.list file.

   ALBASE 1a: the same as ALBASE 1, except that igvar is completely removed.


August 13, 1993 (AS):

   tren, 2hsc, 1pai, 1esp, 1sav, 1ypa PDB codes were changed to start
   with X to indicate that they were not yet deposited to PDB.

   mucor pusilus --> mucor pusillus; spelling only

   fkbp was changed from an NMR structure to X-ray structure


August 19, 1993 (AS):

   Agreed with JPO on C; and R; comments replacing all other comments
   (changes to *.ali, joy, MODELLER).


Jan 7, 1994 (AS):

   Obtained the whole set of alignments from JPO. Created ALBASE_2.
   Major changes from ALBASE_1:

   Added: ace.ali, cyp.ali, cyto.ali, igcell2.ali, porin.ali, rnase2.ali,
          sh3.ali

   Deleted: Hla_ig.ali

   Renamed (modified): igcell.ali --> igcell1.ali
                       igvar.ali  --> igvar-h.ali and igvar-l.ali
                       sod2.ali  --> sodcu.ali

   Modified by JPO: most of the files.

   All alignments in one directory.

   Editing zf-CCHH.ali to make full entry specification records for 
   the Pabo zinc-finger sequences.

   Editing asp, igvar-h, sermam, serpin, az (and maybe one or two others) 
   to correct small chain specification mistakes.

   Editing sh3.ali to make full entry specification records for the 1pnj entry.

   Editing cyp.ali (N271 changed to E271).

   Non-existing 1neaN PDB filename changed to 1nea (I hope JPO did not
   rename/modify the original PDB for a good purpose).
  
   Non-existing 1zaa1, 1zaa2, and 1zaa3 PDB filenames were changed to 1zaa.

   Changing the special (non-PDB) names back to X:
         mv 2hsc.atm Xhsc.atm   (actin.ali)
         mv tren.atm Xren.atm   (asp.ali)
         mv 1ypa.atm Xypa.atm   (asp.ali)
         mv 1pai.atm Xpai.atm   (serpin.ali)
         mv 1esp.atm Xesp.atm   (subt.ali)
         mv 1sav.atm Xsav.atm   (subt.ali)
 
   JPO:  I beg you to accept this convention that non-PDB files
         start with X, not a digit or something else. This allows 
         an easy identification of such structures by grep and in 
         printed Tables where we do have to acknowledge the individual 
         sources (like in our database paper). The 1pai situation
         is another reason: there is a model in PDB file 1pai, which
         is not your 1pai (my Xpai).
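
   A minimal sketch of how the X convention makes such structures easy
   to pick out automatically. The function names are hypothetical and are
   not part of joy, MDT, or MODELLER; the only assumption is the
   convention stated above (non-PDB codes start with an uppercase X).

```python
def is_non_pdb_code(code):
    """True if the structure code follows the X convention for non-PDB files."""
    return code.startswith("X")


def split_by_source(codes):
    """Split structure codes into (PDB codes, non-PDB codes), keeping their order."""
    pdb, private = [], []
    for code in codes:
        (private if is_non_pdb_code(code) else pdb).append(code)
    return pdb, private


# Codes taken from the renaming list above, plus two ordinary PDB codes.
codes = ["Xhsc", "Xren", "Xypa", "Xpai", "Xesp", "Xsav", "1zaa", "1nea"]
pdb_codes, private_codes = split_by_source(codes)
```

   Here private_codes collects the six X structures that need an
   individual acknowledgement, and pdb_codes holds the rest.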

   The database was checked by producing TeX files for all alignments
   with joy and by running MDT on alignments and PDB producing the 
   minimal PDB subset and some other statistics.


January 20, 1994: ALBASE_2a.
  
   JPO sends back an upgrade of AS ALBASE_2. It is called ALBASE_2a.
   This is what he said about his changes to ALBASE_2:

Two new families (or ones you did not email back)

p450 - cytochrome p450s
hpr  - histidine carrier proteins

A small number of changes (arrows are the wrong way around for diffs, sorry)

1) fkbp
C; class: alpha plus beta > C; class: akoha plus beta

2) icd - line 15 (D - E) mutation
< HPELTDMVIFRENSEDIYAGIEWKADSADAEKVIKFLREEMGVKKIRFPDHCGIGIKPCSEEGTKRLVRAAIEYA
> HPELTDMVIFRENSEDIYAGIEWKADSADAEKVIKFLREEMGVKKIRFPEHCGIGIKPCSEEGTKRLVRAAIEYA

3) igcell1 
family: immunoglobulin -- cell surface - type 1 > family: immunoglobulin cell surface - type 1

4) igcell2
family: immunoglobulin -- cell surface - type 2 > family: immunoglobulin cell surface - type 2

5) kazal
family: serine proteinase inhibitor -- Kazal-type > family: serine proteinase inhibitor Kazal-type

6) rhv - chain breaks made explicit in alignment (/ character where appropriate)
   also Name change for one of proteins (it was too long for tables). Diffs are
   not shown, because they are nominally substantial. I also made some
   alignment changes along the lines of chain A was never equivalenced with 
   chain B of a differing protein. I think this is a more consistent treatment
   of the data.

7) sermam - new structure (salmon trypsin) (1tbs)
   rat trypsin (N->D), as discussed in previous email, for my own selfish
   evolutionary reasons I like the D.
   chymotrypsin, took out S early on in sequence (only N in file for this)

< --------CGVPAIQPVL///////////////////IVNGEEAVPGSWPWQVSLQDKT---GFHFCGGSLINEN
                    $
> --------CGVPAIQPVLS//////////////////IVNGEEAVPGSWPWQVSLQDKT---GFHFCGGSLINEN

   tonin (I think) deleted I, similar reason to above. + Chain break here

< WVITAAHCY------SN----NYQVLLGRNNLFKDE-PFAQRRLVRQSFRHPDYIPL/PVHDHSNDLMLLHLSEP
                                                           $
> WVITAAHCY------SN----NYQVLLGRNNLFKDE-PFAQRRLVRQSFRHPDYIPLIPVHDHSNDLMLLHLSEP

8) sh3 - name change, too long for table
< structureN:1pnj:   1A: :  84 : :p85-alpha subunit SH3 domain:-1.00:-1.00
---
> structureN:1pnj:   1A: :  84 : :phosphatidylinositol 3-kinase (p85-alpha subun

9) sodcu - new structure (1srd) + consequent changes in alignment to
   accommodate this new sequence

10) subt - change 1sav structure (Unilever proprietary) to public domain 1st3.

11) zf-CCHH - new structure 1ard + couple of name changes


I hope this is enough detail for you

Also I have put in name changes, to the proper XABC code for private coord sets
(affects asp, serpin and subt families)

I then renamed the relevant files, ran the thing through joy again (with no
errors at all), regenerated the tables and enclose at the bottom of the file
the latest version of the table for the database. Now that the thing truly
is automated it is a lot easier to keep the thing up to date. Do you have
groff? Do you have Phylip? (You need groff to print the table from troff
source, and you need phylip to get nice trees from the alignments) I have
partly changed joy so that the phylip tree order is the order found on the
LaTeX output (so that one can stick the two things together and get a nice
figure), but this is buggy at the moment for several reasons, I will email
you joy with all your/mine modifs soon, but I am sure you are in no hurry
for this.

P.S. As you will see there is a bug in the numbering of the families in 
the first table, this is not as easy to fix as it should be, but I will do
it soon. I will also put the whole thing under the control of make as
well, (i.e. from atm files to everything) I imagine that you could then
extend this to take into account all of the modeller db stuff as well.

------------------------------------------------------------------------------
  
AS made the following changes (January 25, 1994; remains ALBASE_2a) to
allow an automated cross-reference with the original PDB:
   
Do a diff on sh3.ali and zf-CCHH.ali; the changes are obvious.

John: There is an old problem with several ali files (e.g. 3icd in
icd.ali, and 1ton and 1trm in sermam.ali, etc.). I think we have to
have a strict rule of what must be a sequence of a protein with a
given name in the .ali file. I think the only reasonable rule is that
the sequence must be exactly the same as the sequence of residues that
have at least one atom in the PDB file with the same name. There is no
logical reason here to distinguish CA atoms from the others and
require that the CA atom must be present.  Also, I do not think that
the SEQRES records should be taken into account even though they may
be in better agreement with the sequence databases. If we agree, a
simple solution to your historical problems is that you slightly
change the root of the original PDB name, store the non-PDB sequence
in that file, and also include it in the alignment, together with the
original PDB sequence. The current .ali files conform to the rule
above. It is difficult to see how the database could be portable to
other machines, which have to rely on the original PDB, if we do not
accept this rule.
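
A minimal sketch of the proposed rule, for illustration only. It is not
the MDT or MODELLER code; the function names are hypothetical. It assumes
the standard fixed-column PDB layout for ATOM records (residue name in
columns 18-20, chain ID in column 22, residue number plus insertion code
in columns 23-27), reads only the first model, and marks non-standard
residue names with '?'.

```python
# One-letter codes for the 20 standard residue types.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def sequence_from_atoms(pdb_lines):
    """Sequence of residues that have at least one ATOM record (any atom, not only CA)."""
    seen = set()
    sequence = []
    for line in pdb_lines:
        if line.startswith("ENDMDL"):
            break  # use the first model only
        if not line.startswith("ATOM"):
            continue  # SEQRES, HETATM, and all other records are ignored
        residue_id = (line[21:22], line[22:27])  # chain ID, number + insertion code
        if residue_id in seen:
            continue
        seen.add(residue_id)
        sequence.append(THREE_TO_ONE.get(line[17:20].strip(), "?"))
    return "".join(sequence)


def ali_residues(ali_row):
    """Drop gap ('-'), chain-break ('/'), and terminator ('*') characters."""
    return "".join(ch for ch in ali_row if ch not in "-/*")


def atom_line(serial, atom, res, chain, resseq):
    """Build a made-up ATOM record up to the residue number (coordinates omitted)."""
    return f"ATOM  {serial:>5} {atom:<4} {res:>3} {chain}{resseq:>4}"


# Residue 2 has no CA atom but still counts, as the rule requires.
records = [
    atom_line(1, "N", "ALA", "A", 1),
    atom_line(2, "CA", "ALA", "A", 1),
    atom_line(3, "O", "GLY", "A", 2),
    atom_line(4, "CA", "SER", "A", 3),
]
pdb_sequence = sequence_from_atoms(records)
consistent = pdb_sequence == ali_residues("A-G/S*")
```

An .ali entry would conform to the rule when the comparison in the last
line holds for its sequence and the PDB file named in the entry.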

Also, I had to make a lot of changes for the second time. I could see
why you did not want to accept some of these changes, but not all of
them.  The only important thing for me is that we agree on the rules
and that we follow them. I do not mind if the rules are not entirely
mine.  I apologize for being a pain in the ass here but I hate
repetitive, dull, and unnecessary work. The thing is that you are
probably using manually edited PDB files in which you have manually
taken care of the problems that are encountered by the automated PDB
access because the rules are not strictly observed.  Also, I do not
know how joy treats partially present residues. Given all that, I
would still like to convince you that the future is in complete
automation. For example, with the new alignment commands in MODELLER,
the superposed sets of all PDB structures in the alignments can easily
be generated automatically (given your alignments), so there is no
need at all for any manually edited files -- we only need the original
PDB distribution, which is just one ftp command away, not days of
error-prone editing away. I'd better stop my propaganda now.

Non-existing 1zaa1, 1zaa2, and 1zaa3 PDB filenames were again changed
to 1zaa: The code in >P1;code can be 1zaa1, but the second line should
contain the root for the PDB filename (1zaa if the original PDB file
is to be used; otherwise, Xzaa1, or something like that should be
specified and a new PDB file created).
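
A minimal sketch of reading the two header lines, for illustration only
(hypothetical function name; not the joy or MODELLER parser). It assumes
that the second colon-separated field of the line after >P1;code holds
the PDB filename root, as in the 1pnj record quoted earlier in this file.
The 1zaa1 record below is made up, with its remaining fields left blank.

```python
def read_entry_headers(ali_text):
    """Return (code, pdb_root) pairs, one per >P1; entry in the alignment text."""
    lines = ali_text.splitlines()
    pairs = []
    for i, line in enumerate(lines):
        if line.startswith(">P1;") and i + 1 < len(lines):
            code = line[4:].strip()
            fields = lines[i + 1].split(":")
            # Field 0 is the entry type (e.g. structureX); field 1 is the PDB root.
            pdb_root = fields[1].strip() if len(fields) > 1 else ""
            pairs.append((code, pdb_root))
    return pairs


ali_text = "\n".join([
    "C; made-up fragment for illustration",
    ">P1;1pnj",
    "structureN:1pnj:   1A: :  84 : :p85-alpha subunit SH3 domain:-1.00:-1.00",
    ">P1;1zaa1",
    "structureX:1zaa:::::::",
])
headers = read_entry_headers(ali_text)
```

The second pair shows the intended separation: the entry code is 1zaa1,
while the file to open is derived from the root 1zaa.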

A similar change again from 1neaN to 1nea.

I could not find 2hsc and 1esp gift structures in PDB today. I changed
them back to Xhsc and Xesp. What is the situation here? Did you maybe
forget that the second line contains the PDB filename spec, and that
the first line (>P1;Xesp) can be anything, although it is best if it
is the same as the PDB root plus chain ID (this chain ID rule is not
strictly enforced, but that is not essential: it only complicates the
Table, because we cannot say that the chain ID is the last uppercase
letter in the protein code, where applicable).

Chain id's for the first and last residue were added again to 1lya in
asp.ali.

Residue numbers of 1azu and 2plt were corrected again.

Sequence of 2hfl in igvar-h.ali edited again to reflect the PDB sequence.

Sequence of 2cyp in peroxidase.ali edited again to reflect the PDB sequence.

Chain id for 1tbs in sermam.ali corrected.

Sequence of 2gch edited again to reflect the PDB sequence.

Chain ids and residue numbers changed for 1ppb in sermam.ali.

1pai PDB code changed again to Xpai in serpin.ali.

1hle residue numbers in serpin.ali changed again.


February 7, 1994:

fkbp.ali: 1fkb structureN is changed to structureX


February 28:

Changed 2cyp.ali to reflect the changed sequence in PDB (272 goes to Asn).

March 18:

Commented out all sequence entries (igcell1.ali and zf-CCHH.ali).

March 19:

Changed multiple chain break characters ///// to /---- to allow easier
MODELLER code (sermam.ali, rvp.ali, rsp.ali).
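
A minimal sketch of this substitution, for illustration only (the
function name is hypothetical; the actual edit may have been done by
hand or with another tool). Each run of two or more '/' characters
becomes one '/' followed by gap characters, so the row keeps its length
and the alignment columns stay in register.

```python
import re


def pad_chain_breaks(row):
    """Replace each run of '/' by a single '/' plus '-' padding of the same total length."""
    return re.sub(r"/{2,}", lambda m: "/" + "-" * (len(m.group(0)) - 1), row)


before = "CGVPAIQPVL/////IVNGEE"
after = pad_chain_breaks(before)
```

A single '/' is left untouched, and len(after) equals len(before).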


May 10:

Received a set of test files for Release 3 from JPO. He could 
include his changes here.

May 11:

Editing the JPO test set to make it run with the latest PDB release,
mdt and joy:

   1) multiple //// replaced by a single / in all alignments: ace_NEW.ali
      asp_NEW.ali cyto_NEW.ali flavbb_NEW.ali hemocyan_NEW.ali ins_NEW.ali
      ldh_NEW.ali prc_NEW.ali rhv_NEW.ali rnh_NEW.ali sermam_NEW.ali
      serpin_NEW.ali.

   2) PDB entries changed, so the corresponding .ali files had to be updated:
      1f3g (gpr.ali); 1hom, 1lfb (hom.ali); 1ppb, 1bbr (sermam.ali);
      2ins & 4ins (ins.ali); 2ctx, 1fas (toxin.ali); 1bbs (asp.ali);
      2tbv (bv.ali);

      Probably also true for other .ali files which you have not sent me.

   3) 1prc in prc.ali has residue FOR inserted in the .ali file so that
      it now corresponds to the PDB file (JPO: either split the alignment in
      separate subunits so that the first FOR can be omitted; or introduce 
      FOR residue type in joy, otherwise we cannot distribute prc.ali for 
      the PDB files as they are).

   4) ins.ali: the order of some segments is not the same as in the PDB
      file: this cannot be the case; omitted 2gf1 which necessitates this
      (in effect, using the ALBASE2 version of ins.ali).

May 30:

JPO sent me his final complete set, which I edited again to make it
consistent with April 94 PDB release and my programs (ins.ali changed
to ins-jpo.ali and old ins.ali with PDB order of segments used instead). 
There were many sequence and chain ID changes also. This is my release 3.
No HETATMs included in the alignments because SSTRUC does not work with
HETATM; filtered out by mkfile() in MDT. Tested with PSA, DIH, PDB, NGH, 
and SSTRUC.
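
A minimal sketch of the kind of HETATM filtering described above, for
illustration only. It is not the mkfile() routine in MDT, whose code is
not shown in this file; the function name and the three records are
made up.

```python
def drop_hetatm(pdb_lines):
    """Keep every PDB record except HETATM records."""
    return [line for line in pdb_lines if not line.startswith("HETATM")]


pdb_lines = [
    "ATOM      1  N   ALA A   1",
    "HETATM    2  O   HOH A 101",
    "ATOM      3  CA  ALA A   1",
]
kept = drop_hetatm(pdb_lines)
```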
