A good way to align the sequence of TvLDH with the structure of 4mdhA is the ALIGN2D command in MODELLER. Although ALIGN2D is based on a dynamic programming algorithm [75], it differs from standard sequence-sequence alignment methods because it takes structural information from the template into account when constructing the alignment. It does so through a variable gap penalty function that tends to place gaps in solvent-exposed and curved regions, outside secondary structure segments, and between two positions that are close in space. As a result, alignment errors are reduced by approximately one third relative to those of standard sequence alignment techniques. This improvement becomes more important as the similarity between the sequences decreases and the number of gaps increases. In the current example, the template-target similarity is so high that almost any alignment method with reasonable parameters will produce the same alignment. The following MODELLER script aligns the TvLDH sequence in file `TvLDH.ali' with the 4mdhA structure in the PDB file `4mdh.pdb' (file `align2d.top').

READ_MODEL FILE = '4mdh.pdb'
SEQUENCE_TO_ALI ALIGN_CODES = '4mdhA'
READ_ALIGNMENT FILE = 'TvLDH.ali', ALIGN_CODES = 'TvLDH', ADD_SEQUENCE = ON
ALIGN2D
WRITE_ALIGNMENT FILE = 'TvLDH-4mdh.ali', ALIGNMENT_FORMAT = 'PIR'
WRITE_ALIGNMENT FILE = 'TvLDH-4mdh.pap', ALIGNMENT_FORMAT = 'PAP'

In the first line, MODELLER reads the 4mdhA structure file. The SEQUENCE_TO_ALI command then transfers the sequence to the alignment array and assigns it the name `4mdhA' (ALIGN_CODES). The third line reads the TvLDH sequence from file `TvLDH.ali', assigns it the name `TvLDH' (ALIGN_CODES), and adds it to the alignment array (`ADD_SEQUENCE = ON'). The fourth line executes the ALIGN2D command to perform the alignment. Finally, the alignment is written out in two formats, PIR (`TvLDH-4mdh.ali') and PAP (`TvLDH-4mdh.pap'). The PIR format is used by MODELLER in the subsequent model-building stage, whereas the PAP format is easier to inspect visually. Because of the high target-template similarity, there are only a few gaps in the alignment. In the PAP format, all identical positions are marked with a `*' (file `TvLDH-4mdh.pap').

_aln.pos          10        20        30        40        50        60
4mdhA     GSEPIRVLVTGAAGQIAYSLLYSIGNGSVFGKDQPIILVLLDITPMMGVLDGVLMELQDCALPLLKDV
TvLDH     MSEAAHVLITGAAGQIGYILSHWIASGELYG-DRQVYLHLLDIPPAMNRLTALTMELEDCAFPHLAGF
_consrvd   **   ** ******* * *   *  *   * *    * **** * *  *    *** *** * *

_aln.p    70        80        90       100       110       120       130
4mdhA     IATDKEEIAFKDLDVAILVGSMPRRDGMERKDLLKANVKIFKCQGAALDKYAKKSVKVIVVGNPANTN
TvLDH     VATTDPKAAFKDIDCAFLVASMPLKPGQVRADLISSNSVIFKNTGEYLSKWAKPSVKVLVIGNPDNTN
_consrvd   **     **** * * ** ***   *  * **   *  ***  *  * * ** **** * *** ***

_aln.pos   140       150       160       170       180       190       200
4mdhA     CLTASKSAPSIPKENFSCLTRLDHNRAKAQIALKLGVTSDDVKNVIIWGNHSSTQYPDVNHAKVKLQA
TvLDH     CEIAMLHAKNLKPENFSSLSMLDQNRAYYEVASKLGVDVKDVHDIIVWGNHGESMVADLTQATFTKEG
_consrvd  *  *   *     **** *  ** ***    * ****   **   * ****      *   *

_aln.pos     210       220       230       240       250       260       270
4mdhA     KEVGVYEAVKDDSWLKGEFITTVQQRGAAVIKARKLSSAMSAAKAICDHVRDIWFGTPEGEFVSMGII
TvLDH     KTQKVVDVLDHDYVFDTFFKKIGHRAWDILEHRGFTSAASPTKAAIQHMKAWLFGTAPGEVLSMGIPV
_consrvd  *   *      *      *                 * *     **           *

_aln.pos       280       290       300       310       320       330
4mdhA     SDGNSYGVPDDLLYSFPVTIK-DKTWKIVEGLPINDFSREKMDLTAKELAEEKETAFEFLSSA-
TvLDH     PEGNPYGIKPGVVFSFPCNVDKEGKIHVVEGFKVNDWLREKLDFTEKDLFHEKEIALNHLAQGG
_consrvd    ** **       ***           ***   **  *** * * * *  *** *   *
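
The `*' markers in the PAP listing can be reproduced independently of MODELLER, which is a useful check when inspecting an alignment by hand. The following Python fragment is a minimal sketch, not part of MODELLER: the function names `conservation_line` and `percent_identity` are hypothetical, and the choice to compute identity over columns in which neither sequence has a gap is an assumption (other conventions divide by the full alignment length or by the shorter sequence). The two strings are the first block of the alignment shown above.

```python
def conservation_line(seq_a: str, seq_b: str, gap: str = "-") -> str:
    """Return a string with '*' at every column where both aligned
    sequences carry the same residue, and ' ' elsewhere."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    return "".join(
        "*" if a == b and a != gap else " "
        for a, b in zip(seq_a, seq_b)
    )


def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Identical columns as a percentage of the columns in which
    neither sequence has a gap (an assumed convention)."""
    marks = conservation_line(seq_a, seq_b, gap)
    aligned = sum(1 for a, b in zip(seq_a, seq_b) if a != gap and b != gap)
    return 100.0 * marks.count("*") / aligned if aligned else 0.0


# First block of the TvLDH-4mdhA alignment (template first, target second).
template = "GSEPIRVLVTGAAGQIAYSLLYSIGNGSVFGKDQPIILVLLDITPMMGVLDGVLMELQDCALPLLKDV"
target = "MSEAAHVLITGAAGQIGYILSHWIASGELYG-DRQVYLHLLDIPPAMNRLTALTMELEDCAFPHLAGF"

marks = conservation_line(template, target)
print("4mdhA     " + template)
print("TvLDH     " + target)
print("_consrvd  " + marks)
print(f"identical positions: {marks.count('*')}")
print(f"percent identity:    {percent_identity(template, target):.1f}")
```

For this block the sketch marks 33 identical positions, matching the number of `*' characters in the corresponding `_consrvd' line. Applying it to all five blocks joined together would give the identity over the whole alignment.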