Hello,
I have used modeller to attempt to model a loop region in a tetrameric structure.
To speed up computational time I've started with just the dimer ( which I think also makes sense biologically ) and ran the following script with an .ali file I created. The region of interest is a disordered loop AA~399-417. Following this I used the evaluate model script to pick the best model, which I then used as the template input for the loop.py script. I run the loop.py script at many iterations and then when it finishes I take the top 20 models (out of 1000) by DOPE score and attempt to analyze them. Any advice on how I could improve my method would be great.. I really am at the edge of my understanding here.
I've edited the name of the protein. I can include it as soon as I can verify that this can stay anonymous. Thanks!!!
Scripts +files :
*First )using a template pdb and .ali file I made myself of the Dimer (BIO_headers in chimera and deletion of one dimer of the tetramer) * *Input : *
>P1;ABCD structureX:ABCD.pdb: 3 :A:+990:B:::-1.00:-1.00 --TSWSDRLQNAADMPANMDKHALKKYRREAYHRVFVNRSLAMEKIKCFGFNMDYTLAVYKSPEYESLGFELTVE RLVSIGYPQELLSFAYDSTFPTRGLVFDTLYGNLLKVDAYGNLLVCAHGFNFIRGPETREQYPNKFIQRDDTERF YILNTLFNLPETYLLACLVDFFTNCPRYTSCETGFKDGDLFMSYRSMFQDVRDAVDWVHYKGSLKEKTVENLEKY VVKDGKLPLLLSRMKEVGKVFLATNSDYKYTDKIMTYLFDFPHGPKPGSSHRPWQSYFDLILVDARKPLFFGEGT VLRQVDTKTGKLKIGTYTGPLQHGIVYSGGSSDTICDLLGAKGKDILYIGDHIFGDILKSKKRQGWRTFLVIPEL AQELHVWTDKSSLFEELQSLDIFLAS----------------SIQRRIKKVTHDMDMCYGMMGSLFRSGSRQTLF ASQVMRYADLYAASFINLLYYPFSYLFRAAHVLMPHES/--TSWSDRLQNAADMPANMDKHALKKYRREAYHRVFVNRSLAMEKIKCFGFNMDYTLAVYKSPEYESLGFELTVE RLVSIGYPQELLSFAYDSTFPTRGLVFDTLYGNLLKVDAYGNLLVCAHGFNFIRGPETREQYPNKFIQRDDTERF YILNTLFNLPETYLLACLVDFFTNCPRYTSCETGFKDGDLFMSYRSMFQDVRDAVDWVHYKGSLKEKTVENLEKY VVKDGKLPLLLSRMKEVGKVFLATNSDYKYTDKIMTYLFDFPHGPKPGSSHRPWQSYFDLILVDARKPLFFGEGT VLRQVDTKTGKLKIGTYTGPLQHGIVYSGGSSDTICDLLGAKGKDILYIGDHIFGDILKSKKRQGWRTFLVIPEL AQELHVWTDKSSLFEELQSLDIFLAS----------------SIQRRIKKVTHDMDMCYGMMGSLFRSGSRQTLF ASQVMRYADLYAASFINLLYYPFSYLFRAAHVLMPHES*
>P1;X sequence:ABCD: : : : ::: 0.00: 0.00 MSTSWSDRLQNAADMPANMDKHALKKYRREAYHRVFVNRSLAMEKIKCFGFDMDYTLAVYKSPEYESLGFELTVE RLVSIGYPQELLSFAYDSTFPTRGLVFDTLYGNLLKVDAYGNLLVCAHGFNFIRGPETREQYPNKFIQRDDTERF YILNTLFNLPETYLLACLVDFFTNCPRYTSCETGFKDGDLFMSYRSMFQDVRDAVDWVHYKGSLKEKTVENLEKY VVKDGKLPLLLSRMKEVGKVFLATNSDYKYTDKIMTYLFDFPHGPKPGSSHRPWQSYFDLILVDARKPLFFGEGT VLRQVDTKTGKLKIGTYTGPLQHGIVYSGGSSDTICDLLGAKGKDILYIGDHIFGDILKSKKRQGWRTFLVIPEF AQELHVWTDKSSLFEELQSLDIFLAELYKHLDSSSNERPDISSIQRRIKKVTHDMDMCYGMMGSLFRSGSRQTLF ASQVMRYADLYAASFINLLYYPFSYLFRAAHVLMPHES/MSTSWSDRLQNAADMPANMDKHALKKYRREAYHRVFVNRSLAMEKIKCFGFDMDYTLAVYKSPEYESLGFELTVE RLVSIGYPQELLSFAYDSTFPTRGLVFDTLYGNLLKVDAYGNLLVCAHGFNFIRGPETREQYPNKFIQRDDTERF YILNTLFNLPETYLLACLVDFFTNCPRYTSCETGFKDGDLFMSYRSMFQDVRDAVDWVHYKGSLKEKTVENLEKY VVKDGKLPLLLSRMKEVGKVFLATNSDYKYTDKIMTYLFDFPHGPKPGSSHRPWQSYFDLILVDARKPLFFGEGT VLRQVDTKTGKLKIGTYTGPLQHGIVYSGGSSDTICDLLGAKGKDILYIGDHIFGDILKSKKRQGWRTFLVIPEF AQELHVWTDKSSLFEELQSLDIFLAELYKHLDSSSNERPDISSIQRRIKKVTHDMDMCYGMMGSLFRSGSRQTLF ASQVMRYADLYAASFINLLYYPFSYLFRAAHVLMPHES*
# Homology modeling by the automodel class # # Demonstrates how to build multi-chain models, and symmetry restraints # from modeller import * from modeller.automodel import * # Load the automodel class
log.verbose()
# Override the 'special_restraints' and 'user_after_single_model' methods: class MyModel(automodel): def special_restraints(self, aln): # Constrain the A, B, C and D chains to be identical s1 = selection(self.chains['A']).only_atom_types('CA') s2 = selection(self.chains['B']).only_atom_types('CA') self.restraints.symmetry.append(symmetry(s1, s2, 1))
def user_after_single_model(self): # Report on symmetry violations greater than 1A after building # each model: self.restraints.symmetry.report(1)
env = environ() # directories for input atom files env.io.atom_files_directory = ['.', '../atom_files']
# Be sure to use 'MyModel' rather than 'automodel' here! a = MyModel(env, alnfile = 'WT_dimer_NoCterm.ali' , # alignment filename knowns = 'ABCD', # codes of the templates sequence = 'ABCD') # code of the target
a.starting_model= 1 # index of the first model a.ending_model = 20 # index of the last model # (determines how many models to calculate) a.make() # do homology modeling # # class MyModel(automodel):
*2nd) Run eval model and pick top model to use as input for the loop.py script below --->*
# Loop refinement of an existing model from modeller import * from modeller.automodel import *
log.verbose() env = environ()
# directories for input atom files env.io.atom_files_directory = './:../atom_files'
# Create a new class based on 'loopmodel' so that we can redefine # select_loop_atoms (necessary) class MyLoop(loopmodel): # This routine picks the residues to be refined by loop modeling def select_loop_atoms(self): # 10 residue insertion return selection(self.residue_range('399:A', '417:A'))
m = MyLoop(env, inimodel='ABCD_Top_Model.pdb', # initial model of the target sequence='ABCD') # code of the target
m.loop.starting_model= 1 # index of the first loop model m.loop.ending_model = 1000 # index of the last loop model m.loop.md_level = refine.slow # loop refinement method; this yields # models quickly but of low quality; # use refine.slow for better models
m.make()
*3rd) Run model_energies script*
Then I rank them using this perl one liner ( just cats and displays the dope and the file ) can run in any unix terminal.
cat model*.log |perl -lane' print if /DOPE\ score/ || /ABCD.BL/'| perl -pe 's/ +/\t/g'|perl -pe 's/:\t//g' |cut -f3,4|perl -pe 's/\n/\t/'|perl -pe 's/(\d)\t(ABCD)/$1\n$2/g'|sort -k2,2n|head -20
*4) Take top 20 models and analyze *
Any help with where I'm doing things wrong or how I can improve would be awesome. How many models in loop.py would you recommend to get a good answer? Is there away to interpret the models created as a whole ....for example say something like "60% of the time residue A is next to residue B and 20% of the time its over in this pocket , therefor".... etc
My major aim is to compare different ligand bound crystal states, or mutation states, and show that the models created are different between them. Then use this to design wet lab experiments to test.
Thanks again