Joshua A. Speidel wrote:
> I am curious about the origin of the blosum62.sim.mat and as1.sim.mat.
>
> When I look at the blosum62.sim.mat, I can't figure out the relationship
> between the numbers there, and the original blosum62 matrix. Initially,
> I thought that the Henikoff & Henikoff values were just scaled to
> between 0 and 1000, but that didn't seem to work.
Close. The relationship is m = (9 + h) * 50, where h is the original H&H blosum62 value and m is the value used in Modeller. The original blosum62 similarity measure is first shifted from the -4 to 11 range to the 5 to 20 range (so that we can use 0 similarity for comparing a residue with a gap) and then multiplied by 50 to put it on the 0 to 1000 scale, so the residue-residue values run from 250 to 1000.
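As a quick check of that arithmetic, here is a minimal Python sketch of the mapping and its inverse. The function and constant names are made up for illustration and are not part of Modeller.

```python
# Sketch of the shift-and-scale mapping m = (9 + h) * 50.
# Names below are hypothetical; they are not Modeller functions.
SHIFT = 9    # moves the -4..11 blosum62 range to 5..20
SCALE = 50   # stretches 5..20 to 250..1000


def blosum_to_modeller(h):
    """Convert an original blosum62 score h to the Modeller value m."""
    return (h + SHIFT) * SCALE


def modeller_to_blosum(m):
    """Recover the original blosum62 score h from a Modeller value m."""
    return m // SCALE - SHIFT


if __name__ == "__main__":
    # Print the mapping for the two extremes and two values in between.
    for h in (-4, 0, 4, 11):
        print(h, "->", blosum_to_modeller(h))
```

The lowest blosum62 score (-4) lands on 250 and the highest (11) on 1000, which leaves 0 free for residue-gap comparisons.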
> I can't find a reference for ALBASE3 mentioned in as1.sim.mat.
I believe that was one of the databases of protein structures used in the derivation of the original Modeller restraints - see the '93 Modeller paper. As far as I understand it, it was derived in a very similar way to blosum62, just using the Modeller structure set rather than the BLOCKS set used for blosum.
> Finally, if I wanted to use my own matrix, is it necessary for me to
> scale it to between 0 and 1000?
Yes, and the first line of the file should read either #DISTANCE or #SIMILARITY to identify whether it is a distance or similarity matrix.
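For a matrix of your own, something along these lines could do the rescaling. This is only a sketch under assumptions: the helper names and the `floor` parameter are hypothetical, rounding to integers is my choice rather than a confirmed requirement, and the layout of the file below the first line is not covered here.

```python
# Hypothetical helpers for preparing a custom matrix; not Modeller functions.

def rescale_scores(scores, floor=0, ceiling=1000):
    """Linearly map raw pair scores onto floor..ceiling, rounded to integers.

    scores maps residue pairs, e.g. ("A", "R"), to raw values.
    Set floor above 0 to keep 0 free for residue-gap comparisons.
    """
    lo = min(scores.values())
    hi = max(scores.values())
    if hi == lo:
        raise ValueError("all scores are identical; nothing to rescale")
    span = ceiling - floor
    return {
        pair: round(floor + (value - lo) * span / (hi - lo))
        for pair, value in scores.items()
    }


def header_line(is_similarity):
    """Return the first line of the file for the given matrix type."""
    return "#SIMILARITY" if is_similarity else "#DISTANCE"


if __name__ == "__main__":
    raw = {("A", "A"): 4, ("A", "R"): -1, ("R", "R"): 5}
    print(header_line(True))
    print(rescale_scores(raw, floor=250))
```

Passing floor=250 keeps the lowest residue-residue score at 250, the same lower end that (9 + (-4)) * 50 gives for blosum62.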
Ben Webb, Modeller Caretaker