#################################################################
PLEASE READ THE WHOLE README FILE BEFORE USING THIS SET OF MODELS
#################################################################

NOTE: this model set was generated by Roberto Sánchez at
      The Rockefeller University, New York, USA, in 1997.
      Some parsing, rearrangement, sorting, etc. has been
      carried out by Francisco Melo since then.

GOOD and BAD comparative protein structure models.

In this context, GOOD models are intended to be models based on
correct templates (correct fold) and approximately correct
alignments. BAD models are intended to be models based on incorrect
templates (wrong fold) or very bad alignments.

More quantitatively, GOOD models must have at least 30% of their CA
atoms within 3.5 Angstrom of the corresponding CA atoms in the
experimental structure. BAD models should have no more than 15%
equivalent CA atoms (same 3.5 Angstrom criterion).

--

Three file types can be found:

- The *.list files contain the names of the models.
- The *.dat files contain important model information.
- The *.rms files contain structural information
  (comparison between target and model).

To simplify the calculations and minimize possible errors when
working with these sets, the 'good.list' and 'good.dat' files are
sorted by model name (i.e. each line in one file matches the same
line in the other file). The same holds for the 'bad.list' and
'bad.dat' files.

The .rms files, however, are sorted by model number, so their line
positions do not match those of the other two file types.

--

The model names can be mapped to the data in the *.dat files in the
following way:

1bw3_2.B99990001 is the second model for 1bw3.

In bad.dat, the corresponding entry can be found using columns 3
(target PDB code) and 2 (model number):

  170 2 1bw3 125 41 106 66 2cba 120 180 28 18.9 0.7960940 -0.377119380303828 0.11

The first two columns in bad.dat or good.dat correspond to the first
two columns in the *.rms files.
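The name-to-entry mapping described above can be sketched in Python.
This is only an illustration, not part of the original model set. It
assumes every model name follows the pattern of the single example
given (PDB code, underscore, model number, dot, suffix) and that the
*.dat files are whitespace-separated with no header line. The
function names parse_model_name and find_dat_entry are hypothetical.

```python
def parse_model_name(name):
    """Split a model name such as '1bw3_2.B99990001' into ('1bw3', 2).

    Assumption: the part before the first dot is '<PDB code>_<model number>'.
    """
    stem = name.split(".", 1)[0]        # '1bw3_2'
    code, number = stem.rsplit("_", 1)  # split on the last underscore
    return code, int(number)


def find_dat_entry(dat_lines, name):
    """Return the fields of the .dat line matching a model name, or None.

    Matching uses column 3 (target PDB code) and column 2 (model number),
    as described in the README.
    """
    code, number = parse_model_name(name)
    for line in dat_lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or truncated lines
        if fields[2] == code and fields[1] == str(number):
            return fields
    return None
```

Because a good and a bad model may share the same name, the lookup
should be run against the .dat file of the set the name came from
(good.dat for names in good.list, bad.dat for names in bad.list).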
--

In good.dat and bad.dat the meaning of the columns is the following:

column  1 : target number
column  2 : model number
column  3 : target PDB code (corresponds to one PDB chain)
column  4 : target length
column  5 : starting residue of modeled segment
column  6 : ending residue of modeled segment
column  7 : size of model
column  8 : PDB template
column  9 : starting residue of template region used to build the model
column 10 : ending residue of template region used to build the model
column 11 : target - template sequence identity
column 12 : target - template alignment significance (in nats)
column 13 : ignore this column
column 14 : normalized ProsaII Z-Score
column 15 : pG ( see pG server at http://guitar.rockefeller.edu/pg/ )

--

The *.rms files contain data on the comparison of each model with its
corresponding experimental structure. The meaning of the columns is
the following:

column  1 : target number
column  2 : model number
column  3 : RMSD cutoff (always 3.5 Angstrom)
column  4 : RMSD (CA only)
column  5 : % of equivalent CA atoms
column  6 : RMSD (All heavy atoms)
column  7 : % of equivalent heavy atoms
column  8 : distance RMS (CA only)
column  9 : % of equivalent CA - CA distances
column 10 : distance RMS (All heavy atoms)
column 11 : % of equivalent distances

--

The two *.rms files contain data for more models than the set of good
and bad models (this is because many of the models were eliminated
based on this data).

--

WARNING: It is possible that a good and a bad model share the same
         name. Thus, good and bad models must be written into
         different directories.

--
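As an illustration of how the column layouts above might be used, the
following Python sketch reads *.dat and *.rms lines and joins them on
the first two columns (target number, model number). It is not part
of the original model set. The field names are hypothetical labels
chosen for this example, the files are assumed to be
whitespace-separated with no header line, and column 5 of the *.rms
files is assumed to be a percentage on a 0-100 scale.

```python
# Hypothetical field names for the 15 columns of good.dat / bad.dat.
DAT_COLUMNS = [
    "target_number", "model_number", "target_pdb", "target_length",
    "model_start", "model_end", "model_size", "template_pdb",
    "template_start", "template_end", "seq_identity",
    "align_significance_nats", "ignored", "prosa_z_norm", "pg",
]

# Hypothetical field names for the 11 columns of the *.rms files.
RMS_COLUMNS = [
    "target_number", "model_number", "rmsd_cutoff", "rmsd_ca",
    "pct_equiv_ca", "rmsd_heavy", "pct_equiv_heavy", "drms_ca",
    "pct_equiv_ca_dist", "drms_heavy", "pct_equiv_dist",
]


def parse_rows(lines, columns):
    """Map (target number, model number) to a dict of string fields."""
    rows = {}
    for line in lines:
        fields = line.split()
        if len(fields) != len(columns):
            continue  # skip blank or malformed lines
        row = dict(zip(columns, fields))
        key = (int(row["target_number"]), int(row["model_number"]))
        rows[key] = row
    return rows


def join_dat_with_rms(dat_lines, rms_lines):
    """Attach .rms fields to each .dat entry that has a matching .rms line.

    The .rms files list more models than the .dat files, so the join
    keeps only keys present in the .dat data.
    """
    dat = parse_rows(dat_lines, DAT_COLUMNS)
    rms = parse_rows(rms_lines, RMS_COLUMNS)
    return {key: {**rms[key], **row} for key, row in dat.items() if key in rms}


def label_from_ca_percent(pct):
    """Apply the README thresholds: >= 30% is GOOD, <= 15% is BAD."""
    if pct >= 30.0:
        return "GOOD"
    if pct <= 15.0:
        return "BAD"
    return None  # between the thresholds: in neither set
```

Joining by key rather than by line position matters here because the
.rms files are sorted by model number while the .dat files are sorted
by model name. The good and bad sets should be processed separately,
in line with the warning above about shared model names.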