
[modeller_usage] Fwd: SALIGN



> Hi,
> I performed a multiple sequence alignment using SALIGN. When I analysed the output (the .pap file), the alignment was incorrect at around the 400th position, and as a result everything after that point was misaligned as well. How can I improve the alignment, and what is the reason for such mistakes?
> 
> I have attached the file containing the protein sequences, the SALIGN script, and the .pap output file.
>sp|3EI4|
MSYNYVVTAQKPTAVNGCVTGHFTSAEDLNLLIAKNTRLEIYVVTAEGLRPVKEVGMYGKIAVMELFRPKGESKD
LLFILTAKYNACILEYKQSGESIDIITRAHGNVQDRIGRPSETGIIGIIDPECRMIGLRLYDGLFKVIPLDRDNK
ELKAFNIRLEELHVIDVKFLYGCQAPTICFVYQDPQGRHVKTYEVSLREKEFNKGPWKQENVEAEASMVIAVPEP
FGGAIIIGQESITYHNGDKYLAIAPPIIKQSTIVCHNRVDPNGSRYLLGDMEGRLFMLLLEKEEQMDGTVTLKDL
RVELLGETSIAECLTYLDNGVVFVGSRLGDSQLVKLNVDSNEQGSYVVAMETFTNLGPIVDMCVVDLERQGQGQL
VTCSGAFKEGSLRIIRNGIGIHEHASIDLPGIKGLWPLRSDPNRETYDTLVLSFVGQTRVLMLNGEEVEETELMG
FVDDQQTFFCGNVAHQQLIQITSASVRLVSQEPKALVSEWKEPQAKNISVASCNSSQVVVAVGRALYYLQIHPQE
LRQISHTEMEHEVACLDITPLGDSNGLSPLCAIGLWTDISARILKLPSFELLHKEMLGGEIIPRSILMTTFESSH
YLLCALGDGALFYFGLNIETGLLSDRKKVTLGTQPTVLRTFRSLSTTNVFACSDRPTVIYSSNHKLVFSNVNLKE
VNYMCPLNSDGYPDSLALANNSTLTIGTIDEIQKLHIRTVPLYESPRKICYQEVSQCFGVLSSRIEVQDTSGGTT
ALRPSASTQALSSSVSSSKLFSSSTAPHETSFGEEVEVHNLLIIDQHTFEVLHAHQFLQNEYALSLVSCKLGKDP
NTYFIVGTAMVYPEEAEPKQGRIVVFQYSDGKLQTVAEKEVKGAVYSMVEFNGKLLASINSTVRLYEWTTEKDVR
TECNHYNNIMALYLKTKGDFILVGDLMRSVLLLAYKPMEGNFEEIARDFNPNWMSAVEILDDDNFLGAENAFNLF
VCQKDSAATTDEERQHLQEVGLFHLGEFVNVFCHGSLVMQNLGETSTPTQGSVLFGTVNGMIGLVTSLSESWYNL
LLDMQNRLNKVIKSVGKIEHSFWRSFHTERKTEPATGFIDGDLIESFLDISRPKMQEVVANLQYDDGSGMKREAT
ADDLIKVVEELTRIH

>sp|2B5L|
MSYNYVVTAQKPTAVNGCVTGHFTSAEDLNLLIAKNTRLEIYVVTAEGLRPVKEVGMYGKIAVMELFRPKGESKD
LLFILTAKYNACILEYKQSGESIDIITRAHGNVQDRIGRPSETGIIGIIDPECRMIGLRLYDGLFKVIPLDRDNK
ELKAFNIRLEELHVIDVKFLYGCQAPTICFVYQDPQGRHVKTYEVSLREKEFNKGPWKQENVEAEASMVIAVPEP
FGGAIIIGQESITYHNGDKYLAIAPPIIKQSTIVCHNRVDPNGSRYLLGDMEGRLFMLLLEKEEQMDGTVTLKDL
RVELLGETSIAECLTYLDNGVVFVGSRLGDSQLVKLNVDSNEQGSYVVAMETFTNLGPIVDMCVVDLERQGQGQL
VTCSGAFKEGSLRIIRNGIGIHEHASIDLPGIKGLWPLRSDPNRETYDTLVLSFVGQTRVLMLNGEEVEETELMG
FVDDQQTFFCGNVAHQQLIQITSASVRLVSQEPKALVSEWKEPQAKNISVASCNSSQVVVAVGRALYYLQIHPQE
LRQISHTEMEHEVACLDITPLGDSNGLSPLCAIGLWTDISARILKLPSFELLHKEMLGGEIIPRSILMTTFESSH
YLLCALGDGALFYFGLNIETGLLSDRKKVTLGTQPTVLRTFRSLSTTNVFACSDRPTVIYSSNHKLVFSNVNLKE
VNYMCPLNSDGYPDSLALANNSTLTIGTIDEIQKLHIRTVPLYESPRKICYQEVSQCFGVLSSRIEVQDTSGGTT
ALRPSASTQALSSSVSSSKLFSSTSFGEEVEVHNLLIIDQHTFEVLHAHQFLQNEYALSLVSCKLGKDPNTYFIV
GTAMVYPEEAEPKQGRIVVFQYSDGKLQTVAEKEVKGAVYSMVEFNGKLLASINSTVRLYEWTTEKDVRTECNHY
NNIMALYLKTKGDFILVGDLMRSVLLLAYKPMEGNFEEIARDFNPNWMSAVEILDDDNFLGAENAFNLFVCQKDS
AATTDEERQHLQEVGLFHLGEFVNVFCHGSLVMQNLGSTPTQGSVLFGTVNGMIGLVTSLSESWYNLLLDMQNRL
NKVIKSVGKIEHSFWRSFHTERKTEPATGFIDGDLIESFLDISRPKMQEVVANLQYDDGSGMKREATADDLIKVV
EELTRIH

>sp|3E0C|
SYNYVVTAQKPTAVNGCVTGHFTEDLNLLIAKNTRLEIYVVTLRPVKEVGMYGKIAVMELFRPKGKDLLFILTAK
YNACILEYKSIDIITRAHGNVQDRGIIGIIDPECRMIGLRLYDGLFKVIPLDNKELKAFNIRLEELHVIDVKFLY
GCQAPTICFVYQDPRHVKTYEVSLREKEFNKGWKQNVEAEASMVIAVPEPFGGAIIIGQESITYHNGYLAIAPPI
IKQSTIVCHNRVDPNGSRYLLGDMEGRLFMLLLEKDGTVTLKDLRVELLGETSIAECLTYLDGVVFVGSRLGDSQ
LVKLNVYVVAMETFTNLGPIVDMCVVDQGQLVTCSGAFKEGSLRIIRNGIGIHEHASIDLPGIKGLWPLRSDPET
DDTLVLSFVGQTRVLMLETELMGFVDDQQTFFCGNVAHQQLIQITSASVRLVSQEPKALVSEWKEPQAKNISVAS
CNSSQVVVAVGRALYYLQIHPQELRQISHTEMEHEVACLDITPLGLSPLCAIGLWTDISARILKLPSFELLHKEM
LGGEIIPRSILMTTFESSHYLLCALGDGALFYFGLNIETGLLSDKKVTLGTQPTVLRTFRSSTTNVFACSDRPTV
IYSNHKLVFSNVNLKEVNYMCPLNSDGYPSLALANNSTLTIGTIDEIQKLHIRTVPLYESPRKICYQEVSQCFGV
LSSRIEVALRPSASTQALSSSVSVEVHNLLIIDQHTFEVLHAHQFLQNEYALSLVSCKLGKDPNTYFIVGTAMVY
PEPKQGRIVVFQYGKLQTVAEKEVKGAVYSMVEFNGKLLASINSTVRLYEWTTEKELRTECNHYNNIMALYLKTK
GDFILVGDLMRSVLLLAYKPMEGNFEEIARDFNPNWMSAVEILDDDNFLGAENAFNLFVCQQHLQEVGLFHLGEF
VNVFCHGSLVTQGSVLFGTVNGMIGLVTSLSESWYNLLLDMQNRLNKVIKSVGKIEHSFWRSFHTETEPATGFID
GDLIESFLDISRPKMQEVVATADDLIKVVEELTRIH


>sp|3I7N|
MSYNYVVTAQKPTAVNGCVTGHFTSAEDLNLLIAKNTRLEIYVVTAEGLRPVKEVGMYGKIAVMELFRPKGESKD
LLFILTAKYNACILEYKQSGESIDIITRAHGNVQDRIGRPSETGIIGIIDPECRMIGLRLYDGLFKVIPLDRDNK
ELKAFNIRLEELHVIDVKFLYGCQAPTICFVYQDPQGRHVKTYEVSLREKEFNKGPWKQENVEAEASMVIAVPEP
FGGAIIIGQESITYHNGDKYLAIAPPIIKQSTIVCHNRVDPNGSRYLLGDMEGRLFMLLLEKEEQMDGTVTLKDL
RVELLGETSIAECLTYLDNGVVFVGSRLGDSQLVKLNVDSNEQGSYVVAMETFTNLGPIVDMCVVDLERQGQGQL
VTCSGAFKEGSLRIIRNGIGIHEHASIDLPGIKGLWPLRSDPNRETYDTLVLSFVGQTRVLMLNGEEVEETELMG
FVDDQQTFFCGNVAHQQLIQITSASVRLVSQEPKALVSEWKEPQAKNISVASCNSSQVVVAVGRALYYLQIHPQE
LRQISHTEMEHEVACLDITPLGDSNGLSPLCAIGLWTDISARILKLPSFELLHKEMLGGEIIPRSILMTTFESSH
YLLCALGDGALFYFGLNIETGLLSDRKKVTLGTQPTVLRTFRSLSTTNVFACSDRPTVIYSSNHKLVFSNVNLKE
VNYMCPLNSDGYPDSLALANNSTLTIGTIDEIQKLHIRTVPLYESPRKICYQEVSQCFGVLSSRIEVQDTSGGTT
ALRPSASTQALSSSVSSSKLFSSGEEVEVHNLLIIDQHTFEVLHAHQFLQNEYALSLVSCKLGKDPNTYFIVGTA
MVYPEEAEPKQGRIVVFQYSDGKLQTVAEKEVKGAVYSMVEFNGKLLASINSTVRLYEWTTEKDVRTECNHYNNI
MALYLKTKGDFILVGDLMRSVLLLAYKPMEGNFEEIARDFNPNWMSAVEILDDDNFLGAENAFNLFVCQKDSAAT
TDEERQHLQEVGLFHLGEFVNVFCHGSLVMQPTQGSVLFGTVNGMIGLVTSLSESWYNLLLDMQNRLNKVIKSVG
KIEHSFWRSFHTERKTEPATGFIDGDLIESFLDISRPKMQEVVANREATADDLIKVVEELTRIH


>sp|3I89|
MSYNYVVTAQKPTAVNGCVTGHFTSAEDLNLLIAKNTRLEIYVVTAEGLRPVKEVGMYGKIAVMELFRPKGESKD
LLFILTAKYNACILEYKQSGESIDIITRAHGNVQDRIGRPSETGIIGIIDPECRMIGLRLYDGLFKVIPLDRDNK
ELKAFNIRLEELHVIDVKFLYGCQAPTICFVYQDPQGRHVKTYEVSLREKEFNKGPWKQENVEAEASMVIAVPEP
FGGAIIIGQESITYHNGDKYLAIAPPIIKQSTIVCHNRVDPNGSRYLLGDMEGRLFMLLLEKEEQMDGTVTLKDL
RVELLGETSIAECLTYLDNGVVFVGSRLGDSQLVKLNVDSNEQGSYVVAMETFTNLGPIVDMCVVDLERQGQGQL
VTCSGAFKEGSLRIIRNGIGIHEHASIDLPGIKGLWPLRSDPNRETYDTLVLSFVGQTRVLMLNGEEVEETELMG
FVDDQQTFFCGNVAHQQLIQITSASVRLVSQEPKALVSEWKEPQAKNISVASCNSSQVVVAVGRALYYLQIHPQE
LRQISHTEMEHEVACLDITPLGDSNGLSPLCAIGLWTDISARILKLPSFELLHKEMLGGEIIPRSILMTTFESSH
YLLCALGDGALFYFGLNIETGLLSDRKKVTLGTQPTVLRTFRSLSTTNVFACSDRPTVIYSSNHKLVFSNVNLKE
VNYMCPLNSDGYPDSLALANNSTLTIGTIDEIQKLHIRTVPLYESPRKICYQEVSQCFGVLSSRIEVQDTSGGTT
ALRPSASTQALSSSVSSSKLFSSGEEVEVHNLLIIDQHTFEVLHAHQFLQNEYALSLVSCKLGKDPNTYFIVGTA
MVYPEEAEPKQGRIVVFQYSDGKLQTVAEKEVKGAVYSMVEFNGKLLASINSTVRLYEWTTEKDVRTECNHYNNI
MALYLKTKGDFILVGDLMRSVLLLAYKPMEGNFEEIARDFNPNWMSAVEILDDDNFLGAENAFNLFVCQKDSAAT
TDEERQHLQEVGLFHLGEFVNVFCHGSLVMQPTQGSVLFGTVNGMIGLVTSLSESWYNLLLDMQNRLNKVIKSVG
KIEHSFWRSFHTERKTEPATGFIDGDLIESFLDISRPKMQEVVANREATADDLIKVVEELTRIH

>sp|3I8C|
MSYNYVVTAQKPTAVNGCVTGHFTSAEDLNLLIAKNTRLEIYVVTAEGLRPVKEVGMYGKIAVMELFRPKGESKD
LLFILTAKYNACILEYKQSGESIDIITRAHGNVQDRIGRPSETGIIGIIDPECRMIGLRLYDGLFKVIPLDRDNK
ELKAFNIRLEELHVIDVKFLYGCQAPTICFVYQDPQGRHVKTYEVSLREKEFNKGPWKQENVEAEASMVIAVPEP
FGGAIIIGQESITYHNGDKYLAIAPPIIKQSTIVCHNRVDPNGSRYLLGDMEGRLFMLLLEKEEQMDGTVTLKDL
RVELLGETSIAECLTYLDNGVVFVGSRLGDSQLVKLNVDSNEQGSYVVAMETFTNLGPIVDMCVVDLERQGQGQL
VTCSGAFKEGSLRIIRNGIGIHEHASIDLPGIKGLWPLRSDPNRETYDTLVLSFVGQTRVLMLNGEEVEETELMG
FVDDQQTFFCGNVAHQQLIQITSASVRLVSQEPKALVSEWKEPQAKNISVASCNSSQVVVAVGRALYYLQIHPQE
LRQISHTEMEHEVACLDITPLGDSNGLSPLCAIGLWTDISARILKLPSFELLHKEMLGGEIIPRSILMTTFESSH
YLLCALGDGALFYFGLNIETGLLSDRKKVTLGTQPTVLRTFRSLSTTNVFACSDRPTVIYSSNHKLVFSNVNLKE
VNYMCPLNSDGYPDSLALANNSTLTIGTIDEIQKLHIRTVPLYESPRKICYQEVSQCFGVLSSRIEVQDTSGGTT
ALRPSASTQALSSSVSSSKLFSSGEEVEVHNLLIIDQHTFEVLHAHQFLQNEYALSLVSCKLGKDPNTYFIVGTA
MVYPEEAEPKQGRIVVFQYSDGKLQTVAEKEVKGAVYSMVEFNGKLLASINSTVRLYEWTTEKDVRTECNHYNNI
MALYLKTKGDFILVGDLMRSVLLLAYKPMEGNFEEIARDFNPNWMSAVEILDDDNFLGAENAFNLFVCQKDSAAT
TDEERQHLQEVGLFHLGEFVNVFCHGSLVMQPTQGSVLFGTVNGMIGLVTSLSESWYNLLLDMQNRLNKVIKSVG
KIEHSFWRSFHTERKTEPATGFIDGDLIESFLDISRPKMQEVVANREATADDLIKVVEELTRIH

>sp|Q10426|RIK1_SCHPO Chromatin modification-related protein rik1 OS=Schizosaccharomyces pombe GN=rik1 PE=1 SV=2
MALCVHSFWATAVDTATSCHFISSENCLVLLQALKINIYLCSEVHGLQFFTSIPLFSTVK
HIRPYRPPGLDRDYLFVVLNDDTYFSIYWDEDYQKVIVDHPPVRYRVTFPWNRNAKSYCL
VDLRMRAIFLSIDEISMICIRILSAEERLKTGRSIDSGFPFSFPVHLIYDMCILNDSSTP
TLVVLHSDGLDCYVTAFLLDLSSKSLGKGIRLFERVKPSMIMPFGKRGLLVFESLFIHCM
YRGNFVTINGPCTTYMHWTPLKGQKMHYIVCDTNGYLFGVYSSILGKNKWSLVMERLPIP
PFDFITSLNSIHEGLLFIGSKNSESKLINLSTLKDVDSIPNLGPIHDLLVLKNDIEKSFL
VCAGTPRNASLIYFQHALKLDILGQTKISGILRAMVLPSYPEHKLFLGFPSETVAFNIKE
DFQLELDPSLSTKERTIALSGTNGEFVQVTSTFLCIYDSAKRSRLVYIEKITNAACYQEY
SAIVINGTALAIFKKDTEVARKVFESEISCLDFSAQFQIGVGFWSKQVMILTFSDNSSIS
CAFQTNVPSLPRNIILEGVGVDRNLLLVSSGSGEFKSYVLFKNNLVFSETKHFGTTPVSF
RRFTMNIGTYIICNNDCPHMVYGFNGALCYMPLSMPQSYDVCQFRDNSGKDFLISVSLGG
LKFLQLNPLPELTPRKVLLEHVPLQAIIFQNKLLLRTLENRYEDYESYKENYHLELVDSY
DDNSFRVFSFTENERCEKVLKINESSLLVGTSIIEQDKLVPVNGRLILLEFEKELQSLKV
VSSMVLSAAVIDLGVYNDRYIVAFGQQVAIVKLTEERLMIDSRISLGSIVLQLIVEGNEI
AIADSIGRFTIMYFDGQKFIVVARYLFGENIVKAALYEGTVYIIATNSGLLKLLRYNKDA
KNFNDRFICESVYHLHDKVSKFQNFPITNTNSFLEPKMLFATEIGAIGSIVSLKDKELEL
EELTRKIRKLKFSYLSSMDYESIEADLISPVPFIDGDLVIDVKRWASSELFRLCRSVEHR
ESLNSYQKVQALLEEIQSLC
# Illustrates the SALIGN multiple structure/sequence alignment

from modeller import *

log.verbose()
env = environ()
env.io.atom_files_directory = './:../atom_files/'

aln = alignment(env)
for (code, chain) in (('3EI4', 'A'), ('2B5L', 'A'), ('3E0C', 'A'), ('3I7N', 'A'), ('3I89', 'A'), ('3I8C', 'A')):
    mdl = model(env, file=code, model_segment=('FIRST:'+chain, 'LAST:'+chain))
    aln.append_model(mdl, atom_files=code, align_codes=code+chain)

for (weights, write_fit, whole) in (((1., 0., 0., 0., 1., 0.), False, True),
                                    ((1., 0.5, 1., 1., 1., 0.), False, True),
                                    ((1., 1., 1., 1., 1., 0.), True, False)):
    aln.salign(rms_cutoff=3.5, normalize_pp_scores=False,
               rr_file='$(LIB)/as1.sim.mat', overhang=30,
               gap_penalties_1d=(-450, -50),
               gap_penalties_3d=(0, 3), gap_gap_score=0, gap_residue_score=0,
               dendrogram_file='SALIGN1.tree',
               alignment_type='tree', # If 'progressive', the tree is not
                                      # computed and all structures will be
                                      # aligned sequentially to the first
               feature_weights=weights, # For a multiple sequence alignment only
                                        # the first feature needs to be non-zero
               improve_alignment=True, fit=True, write_fit=write_fit,
               write_whole_pdb=whole, output='ALIGNMENT QUALITY')

aln.write(file='SALIGN1.pap', alignment_format='PAP')
aln.write(file='SALIGN1.ali', alignment_format='PIR')

aln.salign(rms_cutoff=1.0, normalize_pp_scores=False,
           rr_file='$(LIB)/as1.sim.mat', overhang=30,
           gap_penalties_1d=(-450, -50), gap_penalties_3d=(0, 3),
           gap_gap_score=0, gap_residue_score=0, dendrogram_file='dendogram.tree',
           alignment_type='progressive', feature_weights=[0]*6,
           improve_alignment=False, fit=False, write_fit=True,
           write_whole_pdb=False, output='QUALITY')
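
Before changing any SALIGN parameters, it may help to see exactly where the attached sequences differ from one another, since residues missing from some entries could be one reason an alignment drifts after a given position. The sketch below is only a rough aid and does not use Modeller: it relies on Python's standard difflib module, and the helper names parse_fasta and list_differences are made up for this illustration. It reports the insertions, deletions and substitutions between a reference sequence and another sequence; it does not by itself explain the behaviour of SALIGN.

```python
from difflib import SequenceMatcher


def parse_fasta(text):
    """Return {header: sequence} from FASTA-formatted text (hypothetical helper)."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def list_differences(ref, other):
    """List the non-matching blocks between two sequences (hypothetical helper).

    Each item is (tag, ref_start, ref_end, other_start, other_end), using
    0-based, end-exclusive indices; tag is 'delete', 'insert' or 'replace'.
    """
    matcher = SequenceMatcher(None, ref, other, autojunk=False)
    return [op for op in matcher.get_opcodes() if op[0] != "equal"]


if __name__ == "__main__":
    # Toy input; in practice, read the attached sequence file instead.
    demo = ">a\nMSYNYVVTAQ\nKPTAV\n>b\nSYNYVVTKPTAV\n"
    seqs = parse_fasta(demo)
    for tag, i1, i2, j1, j2 in list_differences(seqs["a"], seqs["b"]):
        print(tag, "ref[%d:%d]" % (i1, i2), "other[%d:%d]" % (j1, j2))
```

Running this on each pair of entries from the sequence file, with the longest entry as the reference, gives a list of segments that are present in one entry but absent from another, which can then be compared against the region of the .pap output that looks wrong.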

Attachment: SALIGN1.pap
Description: Binary data