I like the sequence comparison functions in Modeller. I would like to calculate sequence conservation or variabilty as a function of position. One of the proteins I work with contains a conserved domain, and I would like to plot this. Unfortunately, variability as a function of alignment position is not calculated in a meaningful way. It does calculate what it says it calculates, but the result is not helpful at all - it is the RMS deviation of all residue - residue scores. I have plotted this, and it shows little relationship to highly conserved areas. It picks up positions of identity fine, but that is about it.
Could you please implement a calculation of sequence conservation. I suggest the total of the similarity scores. You could divide this by the maximal number of sequence comparisons. I would not divide by the number of comparisons at each position, because this may be small for positions with gaps, and might overestimate conservation in gapped regions.
Also, the documentation says gaps are included in the scoring matrix, but this is only true for the blosum matrix.
Thanks, Tim Springer
>From the manual: SEQUENCE_COMPARISON -- compare sequences in alignment Description: The family variability as a function of alignment position is calculated as the RMS deviation of all residue - residue scores at a given position, but only for those alignment positions that have at most MAX_GAPS_MATCH gaps. The variability is written to file VARIABILITY_FILE , as is the number of pairwise comparisons contributing to each positional variability.
******************************************************************************* * Timothy A. Springer, Ph.D. * * Latham Family Professor of Pathology * * Harvard Medical School e-mail: springer@sprsgi.med.harvard.edu * * Center for Blood Research phone: 617-278-3200 * * 200 Longwood Ave. Room 251 fax: 617-278-3232 * * Boston MA 02115 * *******************************************************************************