> Hi,
>
> I performed a multiple sequence alignment using SALIGN. When I analysed the resulting .pap output, there was a misalignment around position 400, and because of it the entire alignment after that point was also wrong. How can I improve the alignment, and what is the reason for such mistakes?
>
> I have attached a file containing the protein sequences, the SALIGN code, and the .pap output file.
>sp|3EI4| MSYNYVVTAQKPTAVNGCVTGHFTSAEDLNLLIAKNTRLEIYVVTAEGLRPVKEVGMYGKIAVMELFRPKGESKD LLFILTAKYNACILEYKQSGESIDIITRAHGNVQDRIGRPSETGIIGIIDPECRMIGLRLYDGLFKVIPLDRDNK ELKAFNIRLEELHVIDVKFLYGCQAPTICFVYQDPQGRHVKTYEVSLREKEFNKGPWKQENVEAEASMVIAVPEP FGGAIIIGQESITYHNGDKYLAIAPPIIKQSTIVCHNRVDPNGSRYLLGDMEGRLFMLLLEKEEQMDGTVTLKDL RVELLGETSIAECLTYLDNGVVFVGSRLGDSQLVKLNVDSNEQGSYVVAMETFTNLGPIVDMCVVDLERQGQGQL VTCSGAFKEGSLRIIRNGIGIHEHASIDLPGIKGLWPLRSDPNRETYDTLVLSFVGQTRVLMLNGEEVEETELMG FVDDQQTFFCGNVAHQQLIQITSASVRLVSQEPKALVSEWKEPQAKNISVASCNSSQVVVAVGRALYYLQIHPQE LRQISHTEMEHEVACLDITPLGDSNGLSPLCAIGLWTDISARILKLPSFELLHKEMLGGEIIPRSILMTTFESSH YLLCALGDGALFYFGLNIETGLLSDRKKVTLGTQPTVLRTFRSLSTTNVFACSDRPTVIYSSNHKLVFSNVNLKE VNYMCPLNSDGYPDSLALANNSTLTIGTIDEIQKLHIRTVPLYESPRKICYQEVSQCFGVLSSRIEVQDTSGGTT ALRPSASTQALSSSVSSSKLFSSSTAPHETSFGEEVEVHNLLIIDQHTFEVLHAHQFLQNEYALSLVSCKLGKDP NTYFIVGTAMVYPEEAEPKQGRIVVFQYSDGKLQTVAEKEVKGAVYSMVEFNGKLLASINSTVRLYEWTTEKDVR TECNHYNNIMALYLKTKGDFILVGDLMRSVLLLAYKPMEGNFEEIARDFNPNWMSAVEILDDDNFLGAENAFNLF VCQKDSAATTDEERQHLQEVGLFHLGEFVNVFCHGSLVMQNLGETSTPTQGSVLFGTVNGMIGLVTSLSESWYNL LLDMQNRLNKVIKSVGKIEHSFWRSFHTERKTEPATGFIDGDLIESFLDISRPKMQEVVANLQYDDGSGMKREAT ADDLIKVVEELTRIH >sp|2B5L| MSYNYVVTAQKPTAVNGCVTGHFTSAEDLNLLIAKNTRLEIYVVTAEGLRPVKEVGMYGKIAVMELFRPKGESKD LLFILTAKYNACILEYKQSGESIDIITRAHGNVQDRIGRPSETGIIGIIDPECRMIGLRLYDGLFKVIPLDRDNK ELKAFNIRLEELHVIDVKFLYGCQAPTICFVYQDPQGRHVKTYEVSLREKEFNKGPWKQENVEAEASMVIAVPEP FGGAIIIGQESITYHNGDKYLAIAPPIIKQSTIVCHNRVDPNGSRYLLGDMEGRLFMLLLEKEEQMDGTVTLKDL RVELLGETSIAECLTYLDNGVVFVGSRLGDSQLVKLNVDSNEQGSYVVAMETFTNLGPIVDMCVVDLERQGQGQL VTCSGAFKEGSLRIIRNGIGIHEHASIDLPGIKGLWPLRSDPNRETYDTLVLSFVGQTRVLMLNGEEVEETELMG FVDDQQTFFCGNVAHQQLIQITSASVRLVSQEPKALVSEWKEPQAKNISVASCNSSQVVVAVGRALYYLQIHPQE LRQISHTEMEHEVACLDITPLGDSNGLSPLCAIGLWTDISARILKLPSFELLHKEMLGGEIIPRSILMTTFESSH YLLCALGDGALFYFGLNIETGLLSDRKKVTLGTQPTVLRTFRSLSTTNVFACSDRPTVIYSSNHKLVFSNVNLKE VNYMCPLNSDGYPDSLALANNSTLTIGTIDEIQKLHIRTVPLYESPRKICYQEVSQCFGVLSSRIEVQDTSGGTT 
ALRPSASTQALSSSVSSSKLFSSTSFGEEVEVHNLLIIDQHTFEVLHAHQFLQNEYALSLVSCKLGKDPNTYFIV GTAMVYPEEAEPKQGRIVVFQYSDGKLQTVAEKEVKGAVYSMVEFNGKLLASINSTVRLYEWTTEKDVRTECNHY NNIMALYLKTKGDFILVGDLMRSVLLLAYKPMEGNFEEIARDFNPNWMSAVEILDDDNFLGAENAFNLFVCQKDS AATTDEERQHLQEVGLFHLGEFVNVFCHGSLVMQNLGSTPTQGSVLFGTVNGMIGLVTSLSESWYNLLLDMQNRL NKVIKSVGKIEHSFWRSFHTERKTEPATGFIDGDLIESFLDISRPKMQEVVANLQYDDGSGMKREATADDLIKVV EELTRIH >sp|3E0C| SYNYVVTAQKPTAVNGCVTGHFTEDLNLLIAKNTRLEIYVVTLRPVKEVGMYGKIAVMELFRPKGKDLLFILTAK YNACILEYKSIDIITRAHGNVQDRGIIGIIDPECRMIGLRLYDGLFKVIPLDNKELKAFNIRLEELHVIDVKFLY GCQAPTICFVYQDPRHVKTYEVSLREKEFNKGWKQNVEAEASMVIAVPEPFGGAIIIGQESITYHNGYLAIAPPI IKQSTIVCHNRVDPNGSRYLLGDMEGRLFMLLLEKDGTVTLKDLRVELLGETSIAECLTYLDGVVFVGSRLGDSQ LVKLNVYVVAMETFTNLGPIVDMCVVDQGQLVTCSGAFKEGSLRIIRNGIGIHEHASIDLPGIKGLWPLRSDPET DDTLVLSFVGQTRVLMLETELMGFVDDQQTFFCGNVAHQQLIQITSASVRLVSQEPKALVSEWKEPQAKNISVAS CNSSQVVVAVGRALYYLQIHPQELRQISHTEMEHEVACLDITPLGLSPLCAIGLWTDISARILKLPSFELLHKEM LGGEIIPRSILMTTFESSHYLLCALGDGALFYFGLNIETGLLSDKKVTLGTQPTVLRTFRSSTTNVFACSDRPTV IYSNHKLVFSNVNLKEVNYMCPLNSDGYPSLALANNSTLTIGTIDEIQKLHIRTVPLYESPRKICYQEVSQCFGV LSSRIEVALRPSASTQALSSSVSVEVHNLLIIDQHTFEVLHAHQFLQNEYALSLVSCKLGKDPNTYFIVGTAMVY PEPKQGRIVVFQYGKLQTVAEKEVKGAVYSMVEFNGKLLASINSTVRLYEWTTEKELRTECNHYNNIMALYLKTK GDFILVGDLMRSVLLLAYKPMEGNFEEIARDFNPNWMSAVEILDDDNFLGAENAFNLFVCQQHLQEVGLFHLGEF VNVFCHGSLVTQGSVLFGTVNGMIGLVTSLSESWYNLLLDMQNRLNKVIKSVGKIEHSFWRSFHTETEPATGFID GDLIESFLDISRPKMQEVVATADDLIKVVEELTRIH >sp|3I7N| MSYNYVVTAQKPTAVNGCVTGHFTSAEDLNLLIAKNTRLEIYVVTAEGLRPVKEVGMYGKIAVMELFRPKGESKD LLFILTAKYNACILEYKQSGESIDIITRAHGNVQDRIGRPSETGIIGIIDPECRMIGLRLYDGLFKVIPLDRDNK ELKAFNIRLEELHVIDVKFLYGCQAPTICFVYQDPQGRHVKTYEVSLREKEFNKGPWKQENVEAEASMVIAVPEP FGGAIIIGQESITYHNGDKYLAIAPPIIKQSTIVCHNRVDPNGSRYLLGDMEGRLFMLLLEKEEQMDGTVTLKDL RVELLGETSIAECLTYLDNGVVFVGSRLGDSQLVKLNVDSNEQGSYVVAMETFTNLGPIVDMCVVDLERQGQGQL VTCSGAFKEGSLRIIRNGIGIHEHASIDLPGIKGLWPLRSDPNRETYDTLVLSFVGQTRVLMLNGEEVEETELMG FVDDQQTFFCGNVAHQQLIQITSASVRLVSQEPKALVSEWKEPQAKNISVASCNSSQVVVAVGRALYYLQIHPQE 
LRQISHTEMEHEVACLDITPLGDSNGLSPLCAIGLWTDISARILKLPSFELLHKEMLGGEIIPRSILMTTFESSH YLLCALGDGALFYFGLNIETGLLSDRKKVTLGTQPTVLRTFRSLSTTNVFACSDRPTVIYSSNHKLVFSNVNLKE VNYMCPLNSDGYPDSLALANNSTLTIGTIDEIQKLHIRTVPLYESPRKICYQEVSQCFGVLSSRIEVQDTSGGTT ALRPSASTQALSSSVSSSKLFSSGEEVEVHNLLIIDQHTFEVLHAHQFLQNEYALSLVSCKLGKDPNTYFIVGTA MVYPEEAEPKQGRIVVFQYSDGKLQTVAEKEVKGAVYSMVEFNGKLLASINSTVRLYEWTTEKDVRTECNHYNNI MALYLKTKGDFILVGDLMRSVLLLAYKPMEGNFEEIARDFNPNWMSAVEILDDDNFLGAENAFNLFVCQKDSAAT TDEERQHLQEVGLFHLGEFVNVFCHGSLVMQPTQGSVLFGTVNGMIGLVTSLSESWYNLLLDMQNRLNKVIKSVG KIEHSFWRSFHTERKTEPATGFIDGDLIESFLDISRPKMQEVVANREATADDLIKVVEELTRIH >sp|3I89| MSYNYVVTAQKPTAVNGCVTGHFTSAEDLNLLIAKNTRLEIYVVTAEGLRPVKEVGMYGKIAVMELFRPKGESKD LLFILTAKYNACILEYKQSGESIDIITRAHGNVQDRIGRPSETGIIGIIDPECRMIGLRLYDGLFKVIPLDRDNK ELKAFNIRLEELHVIDVKFLYGCQAPTICFVYQDPQGRHVKTYEVSLREKEFNKGPWKQENVEAEASMVIAVPEP FGGAIIIGQESITYHNGDKYLAIAPPIIKQSTIVCHNRVDPNGSRYLLGDMEGRLFMLLLEKEEQMDGTVTLKDL RVELLGETSIAECLTYLDNGVVFVGSRLGDSQLVKLNVDSNEQGSYVVAMETFTNLGPIVDMCVVDLERQGQGQL VTCSGAFKEGSLRIIRNGIGIHEHASIDLPGIKGLWPLRSDPNRETYDTLVLSFVGQTRVLMLNGEEVEETELMG FVDDQQTFFCGNVAHQQLIQITSASVRLVSQEPKALVSEWKEPQAKNISVASCNSSQVVVAVGRALYYLQIHPQE LRQISHTEMEHEVACLDITPLGDSNGLSPLCAIGLWTDISARILKLPSFELLHKEMLGGEIIPRSILMTTFESSH YLLCALGDGALFYFGLNIETGLLSDRKKVTLGTQPTVLRTFRSLSTTNVFACSDRPTVIYSSNHKLVFSNVNLKE VNYMCPLNSDGYPDSLALANNSTLTIGTIDEIQKLHIRTVPLYESPRKICYQEVSQCFGVLSSRIEVQDTSGGTT ALRPSASTQALSSSVSSSKLFSSGEEVEVHNLLIIDQHTFEVLHAHQFLQNEYALSLVSCKLGKDPNTYFIVGTA MVYPEEAEPKQGRIVVFQYSDGKLQTVAEKEVKGAVYSMVEFNGKLLASINSTVRLYEWTTEKDVRTECNHYNNI MALYLKTKGDFILVGDLMRSVLLLAYKPMEGNFEEIARDFNPNWMSAVEILDDDNFLGAENAFNLFVCQKDSAAT TDEERQHLQEVGLFHLGEFVNVFCHGSLVMQPTQGSVLFGTVNGMIGLVTSLSESWYNLLLDMQNRLNKVIKSVG KIEHSFWRSFHTERKTEPATGFIDGDLIESFLDISRPKMQEVVANREATADDLIKVVEELTRIH >sp|3I8C| MSYNYVVTAQKPTAVNGCVTGHFTSAEDLNLLIAKNTRLEIYVVTAEGLRPVKEVGMYGKIAVMELFRPKGESKD LLFILTAKYNACILEYKQSGESIDIITRAHGNVQDRIGRPSETGIIGIIDPECRMIGLRLYDGLFKVIPLDRDNK ELKAFNIRLEELHVIDVKFLYGCQAPTICFVYQDPQGRHVKTYEVSLREKEFNKGPWKQENVEAEASMVIAVPEP 
FGGAIIIGQESITYHNGDKYLAIAPPIIKQSTIVCHNRVDPNGSRYLLGDMEGRLFMLLLEKEEQMDGTVTLKDL RVELLGETSIAECLTYLDNGVVFVGSRLGDSQLVKLNVDSNEQGSYVVAMETFTNLGPIVDMCVVDLERQGQGQL VTCSGAFKEGSLRIIRNGIGIHEHASIDLPGIKGLWPLRSDPNRETYDTLVLSFVGQTRVLMLNGEEVEETELMG FVDDQQTFFCGNVAHQQLIQITSASVRLVSQEPKALVSEWKEPQAKNISVASCNSSQVVVAVGRALYYLQIHPQE LRQISHTEMEHEVACLDITPLGDSNGLSPLCAIGLWTDISARILKLPSFELLHKEMLGGEIIPRSILMTTFESSH YLLCALGDGALFYFGLNIETGLLSDRKKVTLGTQPTVLRTFRSLSTTNVFACSDRPTVIYSSNHKLVFSNVNLKE VNYMCPLNSDGYPDSLALANNSTLTIGTIDEIQKLHIRTVPLYESPRKICYQEVSQCFGVLSSRIEVQDTSGGTT ALRPSASTQALSSSVSSSKLFSSGEEVEVHNLLIIDQHTFEVLHAHQFLQNEYALSLVSCKLGKDPNTYFIVGTA MVYPEEAEPKQGRIVVFQYSDGKLQTVAEKEVKGAVYSMVEFNGKLLASINSTVRLYEWTTEKDVRTECNHYNNI MALYLKTKGDFILVGDLMRSVLLLAYKPMEGNFEEIARDFNPNWMSAVEILDDDNFLGAENAFNLFVCQKDSAAT TDEERQHLQEVGLFHLGEFVNVFCHGSLVMQPTQGSVLFGTVNGMIGLVTSLSESWYNLLLDMQNRLNKVIKSVG KIEHSFWRSFHTERKTEPATGFIDGDLIESFLDISRPKMQEVVANREATADDLIKVVEELTRIH >sp|Q10426|RIK1_SCHPO Chromatin modification-related protein rik1 OS=Schizosaccharomyces pombe GN=rik1 PE=1 SV=2 MALCVHSFWATAVDTATSCHFISSENCLVLLQALKINIYLCSEVHGLQFFTSIPLFSTVK HIRPYRPPGLDRDYLFVVLNDDTYFSIYWDEDYQKVIVDHPPVRYRVTFPWNRNAKSYCL VDLRMRAIFLSIDEISMICIRILSAEERLKTGRSIDSGFPFSFPVHLIYDMCILNDSSTP TLVVLHSDGLDCYVTAFLLDLSSKSLGKGIRLFERVKPSMIMPFGKRGLLVFESLFIHCM YRGNFVTINGPCTTYMHWTPLKGQKMHYIVCDTNGYLFGVYSSILGKNKWSLVMERLPIP PFDFITSLNSIHEGLLFIGSKNSESKLINLSTLKDVDSIPNLGPIHDLLVLKNDIEKSFL VCAGTPRNASLIYFQHALKLDILGQTKISGILRAMVLPSYPEHKLFLGFPSETVAFNIKE DFQLELDPSLSTKERTIALSGTNGEFVQVTSTFLCIYDSAKRSRLVYIEKITNAACYQEY SAIVINGTALAIFKKDTEVARKVFESEISCLDFSAQFQIGVGFWSKQVMILTFSDNSSIS CAFQTNVPSLPRNIILEGVGVDRNLLLVSSGSGEFKSYVLFKNNLVFSETKHFGTTPVSF RRFTMNIGTYIICNNDCPHMVYGFNGALCYMPLSMPQSYDVCQFRDNSGKDFLISVSLGG LKFLQLNPLPELTPRKVLLEHVPLQAIIFQNKLLLRTLENRYEDYESYKENYHLELVDSY DDNSFRVFSFTENERCEKVLKINESSLLVGTSIIEQDKLVPVNGRLILLEFEKELQSLKV VSSMVLSAAVIDLGVYNDRYIVAFGQQVAIVKLTEERLMIDSRISLGSIVLQLIVEGNEI AIADSIGRFTIMYFDGQKFIVVARYLFGENIVKAALYEGTVYIIATNSGLLKLLRYNKDA KNFNDRFICESVYHLHDKVSKFQNFPITNTNSFLEPKMLFATEIGAIGSIVSLKDKELEL 
EELTRKIRKLKFSYLSSMDYESIEADLISPVPFIDGDLVIDVKRWASSELFRLCRSVEHR ESLNSYQKVQALLEEIQSLC
# Illustrates the SALIGN multiple structure/sequence alignment

from modeller import *

log.verbose()
env = environ()
env.io.atom_files_directory = './:../atom_files/'

aln = alignment(env)
for (code, chain) in (('3EI4', 'A'), ('2B5L', 'A'), ('3E0C', 'A'),
                      ('3I7N', 'A'), ('3I89', 'A'), ('3I8C', 'A')):
    mdl = model(env, file=code, model_segment=('FIRST:'+chain, 'LAST:'+chain))
    aln.append_model(mdl, atom_files=code, align_codes=code+chain)

for (weights, write_fit, whole) in (((1., 0., 0., 0., 1., 0.), False, True),
                                    ((1., 0.5, 1., 1., 1., 0.), False, True),
                                    ((1., 1., 1., 1., 1., 0.), True, False)):
    aln.salign(rms_cutoff=3.5, normalize_pp_scores=False,
               rr_file='$(LIB)/as1.sim.mat', overhang=30,
               gap_penalties_1d=(-450, -50),
               gap_penalties_3d=(0, 3), gap_gap_score=0, gap_residue_score=0,
               dendrogram_file='SALIGN1.tree',
               alignment_type='tree', # If 'progressive', the tree is not
                                      # computed and all structures will be
                                      # aligned sequentially to the first
               feature_weights=weights, # For a multiple sequence alignment only
                                        # the first feature needs to be non-zero
               improve_alignment=True, fit=True, write_fit=write_fit,
               write_whole_pdb=whole, output='ALIGNMENT QUALITY')

aln.write(file='SALIGN1.pap', alignment_format='PAP')
aln.write(file='SALIGN1.ali', alignment_format='PIR')

aln.salign(rms_cutoff=1.0, normalize_pp_scores=False,
           rr_file='$(LIB)/as1.sim.mat', overhang=30,
           gap_penalties_1d=(-450, -50), gap_penalties_3d=(0, 3),
           gap_gap_score=0, gap_residue_score=0,
           dendrogram_file='dendogram.tree',
           alignment_type='progressive', feature_weights=[0]*6,
           improve_alignment=False, fit=False, write_fit=True,
           write_whole_pdb=False, output='QUALITY')
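To pin down where the alignment starts to go wrong, it may help to scan the PIR file that the script writes (SALIGN1.ali) column by column. The following is only a rough sketch in plain Python. The helpers read_pir, column_identity and first_poor_run are hypothetical names and not part of Modeller, and the threshold and run length are arbitrary assumptions. It reports the first stretch of columns in which fewer than half of the sequences share the same residue.

```python
def read_pir(text):
    """Parse PIR-format alignment text into {align_code: aligned_sequence}."""
    seqs = {}
    code = None
    skip_description = False
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("C;"):  # blank or comment line
            continue
        if line.startswith(">"):
            # Header looks like ">P1;3EI4A"; keep the part after the semicolon.
            code = line.split(";", 1)[1] if ";" in line else line[1:]
            seqs[code] = ""
            skip_description = True            # next line is the description
        elif skip_description:
            skip_description = False
        elif code is not None:
            seqs[code] += line.rstrip("*")     # '*' terminates each sequence
    return seqs


def column_identity(seqs):
    """For each column, the fraction of sequences sharing the most common residue."""
    rows = list(seqs.values())
    length = min(len(row) for row in rows)
    scores = []
    for i in range(length):
        residues = [row[i] for row in rows if row[i] != "-"]
        if not residues:
            scores.append(0.0)
            continue
        top = max(residues.count(r) for r in set(residues))
        scores.append(top / len(rows))
    return scores


def first_poor_run(scores, threshold=0.5, run=3):
    """1-based start of the first `run` consecutive columns below `threshold`."""
    count = 0
    for i, score in enumerate(scores):
        count = count + 1 if score < threshold else 0
        if count == run:
            return i - run + 2
    return None


# Tiny made-up alignment, only to show the input format.
EXAMPLE_PIR = """>P1;a
sequence:a::::::::
ACDEFGHIK*
>P1;b
sequence:b::::::::
ACDEF-WYV*
>P1;c
sequence:c::::::::
ACDEFQ-LM*
"""

if __name__ == "__main__":
    aligned = read_pir(EXAMPLE_PIR)
    print(first_poor_run(column_identity(aligned)))
```

For the real data, the text would be read from the file instead, for example open('SALIGN1.ali').read(), and the reported column can then be compared against the region around position 400 in the .pap output.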
Attachment:
SALIGN1.pap
Description: Binary data