How does Modeller handle multiple structures?
Hello everyone,
I am very new to modelling and am working on a target sequence that has just under 27% sequence identity with 10 other proteins that have crystal structures. My target and templates are all myosin proteins, but from 2 different classes, since no crystal structures are available in my target's own class.
I have 2 concerns/queries:
1. How will Modeller handle the 10 template sequences (which themselves show a significant amount of variation at certain key positions)? More specifically, a particular area of my sequence alignment may read like this:

template 1  xxxxxGxxxxxxxx
template 2  xxxxxTxxxxxxxx
template 3  xxxxxGxxxxxxxx
template 4  xxxxxGxxxxxxxx
template 5  xxxxxGxxxxxxxx
template 6  xxxxxGxxxxxxxx
template 7  xxxxxGxxxxxxxx
target      xxxxxTxxxxxxxx

In simple terms, would Modeller in this case use the coordinates of T in the template 2 sequence because it matches the target at that location, or would it use G because it is predominant in the majority of the structures?

2. The 10 template structures I am using are themselves in different states, for example partially closed, open, transition, inhibitor bound, etc. Is this beneficial to building my model, or detrimental? I am assuming the former, on the logic that the more information I give Modeller, the better my model should be. However, perhaps there is something about the way Modeller works which I have not grasped, and adding structures in different states is actually doing more harm than good?
Thanks very much in advance,
-Zoe
Zoe Katsimitsoulia wrote:
> 1. How will modeller handle the 10 template sequences (which themselves
> show a significant amount of variation at certain key positions). More
> specifically, a particular area of my seq alignment may read like this:
>
> template 1 xxxxxGxxxxxxxx
> template 2 xxxxxTxxxxxxxx
> template 3 xxxxxGxxxxxxxx
> template 4 xxxxxGxxxxxxxx
> template 5 xxxxxGxxxxxxxx
> template 6 xxxxxGxxxxxxxx
> template 7 xxxxxGxxxxxxxx
> target     xxxxxTxxxxxxxx
>
> In simple terms, would modeller in this case use the coord of T in the
> template 2 sequence because it matches the target at that location, or
> will it use G because it is predominant in the majority of the
> structures?
Check the Modeller papers. Modeller doesn't use the coordinates directly, but other properties of the templates, and it uses a weighted sum over all templates. For instance, the Cα-Cα distances of the target would in this case be modeled by a sum of Gaussians, where the peak positions correspond to the observed distances in the templates and the weights to the template weights. The templates are weighted by local sequence similarity, which would probably favor the 'T' sequences in this case (but I can't tell for sure because the neighboring residues are considered too, which you haven't shown). You should look at the .rsr file that Modeller produces to see which restraints it's using (although it's not that easy to read).
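To make the "weighted sum of Gaussians" idea concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not Modeller's actual code: the function names, the distances, the template weights, and the width `sigma` are all hypothetical, and the weighting scheme described in the Modeller papers is more elaborate than a single number per template.

```python
import math

def mixture_density(d, template_distances, template_weights, sigma=0.5):
    """Weighted sum of Gaussians centred on the distances seen in the templates.

    d and template_distances are in angstroms; weights are normalised to sum to 1.
    """
    total_w = sum(template_weights)
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    density = 0.0
    for mu, w in zip(template_distances, template_weights):
        density += (w / total_w) * norm * math.exp(-0.5 * ((d - mu) / sigma) ** 2)
    return density

def restraint_energy(d, template_distances, template_weights, sigma=0.5):
    """Negative log of the density: lower means the distance is better supported."""
    return -math.log(mixture_density(d, template_distances, template_weights, sigma))

# Hypothetical Calpha-Calpha distances, one per template (template 2 is the 'T' one).
distances = [5.0, 6.2, 5.1, 4.9, 5.0, 5.1, 4.9]
# Hypothetical weights: template 2 is up-weighted for higher local similarity.
weights = [1.0, 4.0, 1.0, 1.0, 1.0, 1.0, 1.0]

# Compare how well the 'T'-like distance is supported with and without up-weighting.
equal_weights = [1.0] * len(distances)
energy_weighted = restraint_energy(6.2, distances, weights)
energy_equal = restraint_energy(6.2, distances, equal_weights)
```

In this toy setting, raising the weight of template 2 lowers the energy at its observed distance, but the other templates still contribute peaks of their own, which is why no single template's coordinates are simply copied.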
> 2. The 10 template structures I am using are themselves in different
> states, for example, partially closed conformation, open, transition,
> inhibitor bound, etc. Is this beneficial to building my model or more
> detrimental? I am assuming the former based on the logic that the more
> info I am giving to modeller, the better my model should be.
> However, perhaps there is something about the way modeller works which I
> have not grasped that means its actually doing more harm adding the
> structures in different states?
The target will be constrained to look as much like a weighted sum of the templates as possible, so you probably want to have the templates in the same state as your desired target state.
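One way to act on this is to pass only the templates in the desired state as `knowns`. The sketch below assumes the classic Modeller Python interface (`environ`, `automodel`; newer releases also offer CamelCase names) and uses hypothetical template codes, state labels, and file names that you would replace with your own.

```python
# Hypothetical template codes and state labels; replace with your own annotations.
TEMPLATE_STATES = {
    "tmplA": "open",
    "tmplB": "closed",
    "tmplC": "open",
    "tmplD": "inhibitor-bound",
}

def templates_in_state(states, wanted):
    """Return the template codes annotated with the wanted state, sorted by code."""
    return tuple(sorted(code for code, state in states.items() if state == wanted))

def build_models(alnfile, knowns, sequence, n_models=5):
    """Run comparative modelling on the chosen templates (requires Modeller)."""
    # Imported here so the selection helper above works without Modeller installed.
    from modeller import environ
    from modeller.automodel import automodel

    env = environ()
    a = automodel(env, alnfile=alnfile, knowns=knowns, sequence=sequence)
    a.starting_model = 1
    a.ending_model = n_models
    a.make()
    return a.outputs

# Select only the templates annotated as 'open' (hypothetical target state).
open_templates = templates_in_state(TEMPLATE_STATES, "open")
# Example call, with hypothetical file and sequence names:
# outputs = build_models("alignment.ali", open_templates, "target")
```

The codes in `knowns` have to match entries in the alignment file, so the state labels are best kept alongside the alignment.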
Ben Webb, Modeller Caretaker
On 2/10/06, Modeller Caretaker modeller-care@salilab.org wrote:
> Check the Modeller papers. Modeller doesn't use the coordinates
> directly, but other properties of the templates, and it uses a weighted
> sum over all templates. For instance, the Ca-Ca distances of the target
> would in this case be modeled by a sum of gaussians, where the peak
> positions correspond to the observed distances in the templates and the
> weights to the template weights. The templates are weighted by local
> sequence similarity, which would probably favor the 'T' sequences in
> this case (but I can't tell for sure because the neighboring residues
> are considered too, which you haven't shown). You should look at the
> .rsr file that Modeller produces to see which restraints it's using
> (although it's not that easy to read).
>
> The target will be constrained to look as much like a weighted sum of
> the templates as possible, so you probably want to have the templates in
> the same state as your desired target state.
I would like to add my limited experience here (please correct me if I am wrong): with a large number of templates there is a large number of restraints, which may be difficult for the optimizer to satisfy simultaneously, so the objective function value may end up high, which is not desirable. I have also observed that results are better when the templates are structurally similar to one another and close to identical in sequence to the target; probably in that case the better of the templates is used for each part of the sequence being modelled.
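For what it is worth, here is a small sketch of how one might pick the model with the lowest objective function from a single run. It assumes the list of per-model dictionaries that `automodel` exposes as `a.outputs`, using its 'name', 'failure' and 'molpdf' keys; the sample values below are made up for illustration. Note that a run with more templates has more restraint terms, so raw values are probably only comparable between models built from the same set of restraints.

```python
def best_model(outputs):
    """Return the successfully built model with the lowest objective function."""
    ok = [m for m in outputs if m["failure"] is None]
    if not ok:
        raise ValueError("no model was built successfully")
    return min(ok, key=lambda m: m["molpdf"])

# Made-up example records in the same shape as the entries of a.outputs.
sample_outputs = [
    {"name": "target.B99990001.pdb", "failure": None, "molpdf": 5231.4},
    {"name": "target.B99990002.pdb", "failure": None, "molpdf": 4987.9},
    {"name": "target.B99990003.pdb", "failure": "optimizer failed", "molpdf": None},
]

chosen = best_model(sample_outputs)
```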
br,
Vivek Sharma.