Re: [modeller_usage] origin of blosum62.sim.mat and as1.sim.mat



Joshua A. Speidel wrote:
I am curious about the origin of the blosum62.sim.mat and as1.sim.mat.

When I look at the blosum62.sim.mat, I can't figure out the relationship
between the numbers there, and the original blosum62 matrix. Initially,
I thought that the Henikoff & Henikoff values were just scaled to
between 0 and 1000, but that didn't seem to work.

Close. The relationship is m = (9 + h) * 50, where h is the original H&H blosum62 value and m is the value used in Modeller. The original blosum62 similarity measure is first shifted from the -4 to 11 range to the 5 to 20 range (so that 0 can be used as the similarity when comparing a residue with a gap) and then multiplied by 50. Residue pairs therefore fall between 250 and 1000 on the 0 to 1000 scale.
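As a quick check of that arithmetic, here is a minimal Python sketch of the conversion and its inverse. The function names are made up for illustration and are not part of Modeller.

```python
def blosum_to_modeller(h):
    """Map an original blosum62 score h to the value m in blosum62.sim.mat."""
    # Shift by 9 so the lowest score (-4) becomes 5, then multiply by 50.
    return (9 + h) * 50


def modeller_to_blosum(m):
    """Invert the mapping: recover h from a Modeller value m."""
    return m / 50 - 9


# Print the Modeller value for a few blosum62 scores across the -4..11 range.
for h in (-4, 0, 4, 11):
    print(h, blosum_to_modeller(h))
```

The lowest score (-4) maps to 250 and the highest (11) maps to 1000, which leaves 0 free for residue-versus-gap comparisons.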

I can't find a reference for ALBASE3 mentioned in as1.sim.mat.

I believe ALBASE3 was one of the databases of protein structures used to derive the original Modeller restraints; see the '93 Modeller paper. As far as I understand it, as1.sim.mat was derived in a very similar way to blosum62, just using the Modeller structure set rather than the BLOCKS set used for blosum.

Finally, if I wanted to use my own matrix, is it necessary for me to
scale it to between 0 and 1000?

Yes, and the first line of the file should read either #DISTANCE or #SIMILARITY to identify whether it is a distance or similarity matrix.
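If it helps, a rough Python sketch of one way to do the rescaling follows. Plain linear (min-max) scaling is my assumption here, not something Modeller prescribes, and the helper names are hypothetical. The `floor` argument lets you keep 0 free for gaps, as blosum62.sim.mat does. The sketch only builds the header line and the scaled numbers; it does not write the rest of the file, since the layout is not described here.

```python
def rescale(matrix, floor=0.0, ceiling=1000.0):
    """Linearly map every value in a nested list onto [floor, ceiling]."""
    flat = [v for row in matrix for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        raise ValueError("matrix has no spread to rescale")
    scale = (ceiling - floor) / (hi - lo)
    return [[floor + (v - lo) * scale for v in row] for row in matrix]


def header_line(kind):
    """Return the first line of the matrix file for the given matrix type."""
    if kind not in ("DISTANCE", "SIMILARITY"):
        raise ValueError("kind must be DISTANCE or SIMILARITY")
    return "#" + kind


# Toy 2x2 similarity matrix (made-up values), scaled so the minimum lands at 250.
toy = [[11, -4], [-4, 11]]
print(header_line("SIMILARITY"))
print(rescale(toy, floor=250.0))
```

With `floor=250.0` and a matrix spanning -4 to 11, this reproduces the same factor of 50 as the blosum62 conversion above.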

	Ben Webb, Modeller Caretaker
--
             http://www.salilab.org/modeller/
Modeller mail list: http://salilab.org/mailman/listinfo/modeller_usage