Sequence alignment with missing residues
Dear Modeller experts,
My apologies for the multiple posts. I’m using Modeller to align sequences, find missing residues in a PDB structure, and then perform loop modeling for the missing regions. I have a small issue with the sequence alignment when there are missing residues. Here’s my .ali file:
>P1;6cqx_a_unmod
structureX:6cqx_a_unmod:3:A:+532:A:::-1.00:-1.00
-REDAELLVTVRGGRLRGIRLKTPGGPVSAFLGIPFAEPPMGPRRFLPPEPKQPWSGVVDATTFQSVCYQYVDTL
YPGFEGTEMWNPNRELSEDCLYLNVWTPYPRPTSPTPVLVWIYGGGFYSGASSLDVYDGRFLVQAERTVLVSMNY
RVGAFGFLALPGSREAPGNVGLLDQRLALQWVQENVAAFGGDPTSVTLFGE.AGAASVGMHLLSPPSRGLFHRAV
LQSGAPNGPWATVGMGEARRRATQLAHLVGCPPGG---NDTELVACLRTRPAQVLVNHEWHVLPQESVFRFSFVP
VVDGDFLSDTPEALINAGDFHGLQVLVGVVKDEGSYFLVYGAPGFSKDNESLISRAEFLAGVRVGVPQVSDLAAE
AVVLHYTDWLHPEDPARLREALSDVVGDHNVVCPVAQLAGRLAAQGARVYAYVFEHRASTLSWPLWMGVPHGYEI
EFIFGIPLDPSRNYTAEEKIFAQRLMRYWANFARTGDPNE-----APQWPPYTAGAQQYVSLDLRPLEVRRGLRA
QACAFWNRFLPKLLSA-*

>P1;6cqx_A
sequence::: :: :::-1.00:-1.00
GREDAELLVTVRGGRLRGIRLKTPGGPVSAFLGIPFAEPPMGPRRFLPPEPKQPWSGVVDATTFQSVCYQYVDTL
YPGFEGTEMWNPNRELSEDCLYLNVWTPYPRPTSPTPVLVWIYGGGFYSGASSLDVYDGRFLVQAERTVLVSMNY
RVGAFGFLALPGSREAPGNVGLLDQRLALQWVQENVAAFGGDPTSVTLFGESAGAASVGMHLLSPPSRGLFHRAV
LQSGAPNGPWATVGMGEARRRATQLAHLVGCPPGGTGGNDTELVACLRTRPAQVLVNHEWHVLPQESVFRFSFVP
VVDGDFLSDTPEALINAGDFHGLQVLVGVVKDEGSYFLVYGAPGFSKDNESLISRAEFLAGVRVGVPQVSDLAAE
AVVLHYTDWLHPEDPARLREALSDVVGDHNVVCPVAQLAGRLAAQGARVYAYVFEHRASTLSWPLWMGVPHGYEI
EFIFGIPLDPSRNYTAEEKIFAQRLMRYWANFARTGDPNEPRDPKAPQWPPYTAGAQQYVSLDLRPLEVRRGLRA
QACAFWNRFLPKLLSAT*
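And here are the alignment commands I’m using:

from modeller import *

e = Environ()
aln = Alignment(e)
# Read both entries (structure and full sequence) from the PIR file
aln.append(file='two_seq.seq', align_codes='all')
# Re-align them with 1D gap opening/extension penalties
aln.align(gap_penalties_1d=(-600, -400))
aln.write(file='two_seq.ali')
quit()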
The first sequence is read from a PDB file, with the missing residues shown as gaps (“-”). The second sequence is the full sequence. However, because of the repetitive GG/TGG pattern, the alignment is not correct in this region:

GCPPGG---NDTEL  (from the PDB file)
GCPPGGTGGNDTEL  (the full sequence)
I checked the structure file and found that the alignment should be:

GCPP---GGNDTEL  (from the PDB file)
GCPPGGTGGNDTEL  (the full sequence)
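A minimal Python check (no Modeller needed) of why a sequence-only aligner cannot tell these two placements apart: both gapped versions keep the same observed residues and pair them with the full sequence in exactly the same way (the two G:G pairs simply move), so, assuming a position-independent substitution matrix and affine gap penalties like gap_penalties_1d, both score identically. The strings are the fragment from this post; the helper names are illustrative only.

```python
from collections import Counter

FULL = "GCPPGGTGGNDTEL"      # full sequence around the gap
BY_ALIGN = "GCPPGG---NDTEL"  # gap placement produced by aln.align()
BY_NUMS = "GCPP---GGNDTEL"   # gap placement implied by the PDB residue numbers


def aligned_pairs(gapped: str, full: str) -> Counter:
    """Multiset of (template, target) residue pairs in non-gap columns."""
    return Counter((a, b) for a, b in zip(gapped, full) if a != "-" and b != "-")


def identities(gapped: str, full: str) -> int:
    """Number of non-gap columns where the two residues are identical."""
    return sum(n for (a, b), n in aligned_pairs(gapped, full).items() if a == b)


# Both placements keep the same observed residues in the same order...
assert BY_ALIGN.replace("-", "") == BY_NUMS.replace("-", "")

# ...and produce the same aligned pairs, each with one 3-residue gap,
# so the alignment score cannot prefer one over the other.
print(identities(BY_ALIGN, FULL), identities(BY_NUMS, FULL))  # 11 11
print(aligned_pairs(BY_ALIGN, FULL) == aligned_pairs(BY_NUMS, FULL))  # True
```

Picking the right placement therefore needs information outside the sequences themselves, such as the residue numbering in the PDB file.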
Is there any way to avoid this type of “mismatch” in the sequence alignment? Thank you very much in advance for your advice.
Massive Thanks, Amy
--
Amy He
Chemistry Graduate Teaching Assistant
Hadad Research Group
Ohio State University
he.1768@osu.edu
On 7/21/21 2:53 PM, He, Amy wrote:
> ...
> Is there any way to avoid this type of “mismatch” in the sequence
> alignment?
If you know the alignment you want, you can just create it by hand (alignment files are just text files, so you can use any text editor). You could even script it if you trust the residue numbering or SEQRES records in the PDB file (this would be an exercise for the reader, but it would be easy enough to compare the ATOM/HETATM records with the SEQRES records to get the two sequences). You don't have to use Modeller's alignment methods to generate an alignment file.
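Along the lines of the scripting suggestion above, here is a minimal Python sketch (not part of Modeller) that places each observed residue at its residue number and formats both sequences as PIR (.ali) entries. It assumes residue number n maps to position n - first_resnum of the full sequence with no insertion codes, and it takes (residue number, one-letter code) pairs as input rather than parsing ATOM/HETATM records itself. The function names, the numbering 100-113, the align codes and the header values are all hypothetical.

```python
def gapped_from_numbering(full_seq, observed, first_resnum):
    """Place observed residues at their numbered positions; '-' marks missing ones.

    observed: iterable of (residue_number, one_letter_code) pairs, e.g. taken
    from the PDB ATOM/HETATM records (parsing them is not shown here).
    Assumes residue number n corresponds to full_seq[n - first_resnum] and
    that there are no insertion codes.
    """
    gapped = ["-"] * len(full_seq)
    for resnum, code in observed:
        i = resnum - first_resnum
        if not 0 <= i < len(full_seq):
            raise ValueError(f"residue {resnum} falls outside the full sequence")
        # Accept '.' unchanged, as in the question's file where one residue is '.'
        if code not in (full_seq[i], "."):
            raise ValueError(
                f"residue {resnum} is {code} but the full sequence has {full_seq[i]}; "
                "check first_resnum"
            )
        gapped[i] = code
    return "".join(gapped)


def pir_entry(align_code, header, seq, width=75):
    """Format one PIR entry: '>P1;' line, header line, wrapped sequence ending in '*'."""
    body = seq + "*"
    lines = [f">P1;{align_code}", header]
    lines += [body[i:i + width] for i in range(0, len(body), width)]
    return "\n".join(lines)


# Toy example: the fragment from the question with hypothetical numbering 100-113,
# where residues 104-106 have no coordinates.
full = "GCPPGGTGGNDTEL"
observed = [(100 + i, aa) for i, aa in enumerate("GCPP")]
observed += [(107 + i, aa) for i, aa in enumerate("GGNDTEL")]
template = gapped_from_numbering(full, observed, first_resnum=100)
print(template)  # GCPP---GGNDTEL

# Hypothetical align codes and header fields; adapt them to the real PDB file.
ali_text = (
    pir_entry("tmpl", "structureX:tmpl:100:A:113:A:::-1.00:-1.00", template)
    + "\n"
    + pir_entry("full", "sequence:full:::::::0.00: 0.00", full)
    + "\n"
)
print(ali_text)
```

Writing ali_text to a file gives an alignment that can be read with aln.append() as before; the point of building it this way is to keep the hand-placed gaps, so aln.align() would not be run on it.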
Ben Webb, Modeller Caretaker