Dear Modellers,
I have been working through the examples in the Fiser and Sali paper concerning the modeling of the loop in cpCN-V (circularly permuted cyanovirin). I used the given scripts, modifying them to allow the modeling of different linker lengths, from 6 to 12 residues, as suggested in the paper. For each linker I generated 200 final models and selected the one with the lowest energy score (objective function). This gave the following table of results:
Length (aa)  Residues  Objective Function  Model #  File Name
6            49-54     21.6408             119      cpCN-V.BL06
7            48-54     12.1955              49      cpCN-V.BL07
7            49-55     33.0729             118      cpCN-V.BL07a
8            47-54      7.5401             153      cpCN-V.BL08
8            49-56     35.8842             152      cpCN-V.BL08a
8            48-55     31.0163             117      cpCN-V.BL08b
9            46-54     12.1449              95      cpCN-V.BL09
9            49-57     24.4132              39      cpCN-V.BL09a
9            47-55     29.1883             145      cpCN-V.BL09b
9            48-56     27.4328             115      cpCN-V.BL09c
10           46-55     28.5179             184      cpCN-V.BL10
10           47-56     29.004              199      cpCN-V.BL10a
10           48-57     31.8589              25      cpCN-V.BL10b
11           46-56     18.1581             109      cpCN-V.BL11
11           47-57     22.4939             189      cpCN-V.BL11a
12           46-57     27.0164             198      cpCN-V.BL12
My question is: can I take the 'best' loop model to be the one with the lowest objective function (i.e. model cpCN-V.BL08, objective function 7.5401)?
I did wonder how my results compare with those obtained by Fiser and Sali, but I have been unable to find theirs.
Cheers,
Alex Brown
PS. I'm running Modeller v6.2 on Mac OS X (Darwin). Each modeling run (200 loop models) took 6-10 hours !!!!
hi Alex,
a few points:
We used this approach to reinforce our confidence in the prediction of a 6-residue segment and to explore the possible uncertainty in its environment. That is, the analysis asked: how far can one extend the loop selection into its environment and still predict a similar conformation? The conclusion was that the shorter loops (6, 7N, 7C, 8N, 8C ...) return the same conformation, and that beyond some loop length the predictions get drastically worse (random), so any of those shorter, consistent loops will be suitable for a final model. Unfortunately the details of this extensive study are not published, just a short report about reinforcing our prediction with NMR data, and vice versa, for this special case.
More importantly: Modeller scores are essentially a cumulative summary of the restraint deviations in a probabilistic form. Consequently, the scores depend on the set of restraints selected, which is directly correlated with the number of atoms involved. This means that scores can be compared if you select the same loop to optimize, but not among different loops. I.e. the scores from separate modeling runs of 6-, 7-, 8-residue (and longer) loops are not comparable.
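A rough sketch of what this means for picking models, in Python rather than a Modeller TOP script: the lowest score is taken separately within each loop definition and is never ranked against the winner of another definition. The function name and the score values are invented for illustration only; they are not taken from any real run.

```python
def best_per_loop(records):
    """Return the lowest-scoring model for each loop definition.

    records: iterable of (loop_residues, model_number, score) tuples.
    A score is only ever compared with scores for the same loop_residues.
    """
    best = {}
    for loop, model, score in records:
        if loop not in best or score < best[loop][1]:
            best[loop] = (model, score)
    return best


# Hypothetical scores, made up for this example.
records = [
    ("49-54", 1, 30.2),
    ("49-54", 2, 21.6),
    ("47-54", 1, 9.8),
    ("47-54", 2, 7.5),
]

winners = best_per_loop(records)
for loop, (model, score) in sorted(winners.items()):
    print(loop, model, score)
```

The two winners come from different restraint sets, so the fact that one number is smaller than the other says nothing about which loop is better.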
Once the best loop, or a couple of best loops, had been selected energetically (i.e. by Modeller score) within each loop prediction (same loop selected for optimization), these "best" loops from the different sets of loop predictions (involving different loop lengths) were compared geometrically (by RMS deviation) for consistency.
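The geometric comparison could be sketched as below. This is an illustration in plain Python, not the procedure actually used in the study. It assumes the coordinates of the residues common to all loop definitions have already been extracted, and that every model was built from the same starting structure with the non-loop atoms left in place, so no superposition is done; if that assumption does not hold, the structures would have to be fitted first. The model names and coordinates are placeholders.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) tuples, without fitting."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


def pairwise_rmsd(models):
    """models: dict mapping a model name to the coordinates of the shared residues."""
    names = sorted(models)
    return {
        (a, b): rmsd(models[a], models[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Placeholder coordinates for two atoms in three hypothetical "best" models.
models = {
    "loop6": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    "loop7": [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)],
    "loop8": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
}

for pair, value in sorted(pairwise_rmsd(models).items()):
    print(pair, round(value, 3))
```

Loops whose pairwise values stay small are the "consistent" ones in the sense described above.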
Another note: 6-10 hours sounds slow even if you use only one computer. We used SGI Unix and Red Hat Linux on PCs; maybe Mac OS is not optimal for this. Normally one loop optimization takes around 1 minute or so, i.e. 200 runs take ~2+ hours. But it may be normal given the compiler and code optimization differences among various platforms. However, I would expect a computational lab to have more than one processor: if you submit the runs in parallel to several machines, you can drastically reduce the time required for these calculations. E.g. in our case we can conduct a calculation like this in 1 minute. The only parameter you need to set differently in each loop TOP file is the random number seed, RAND_SEED.
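The arithmetic behind the parallel suggestion can be sketched as follows. This is only a planning helper in Python; it does not write or run any Modeller input. The function names, the base seed and the seed spacing are assumptions made for the example, so check the allowed range of RAND_SEED in the Modeller manual before using such values.

```python
import math


def plan_jobs(n_models, n_machines, base_seed=-1000):
    """Split n_models across up to n_machines jobs, each with its own seed.

    base_seed and the step of -1 per job are arbitrary placeholder choices.
    """
    per_job = math.ceil(n_models / n_machines)
    jobs = []
    start = 0
    while start < n_models:
        count = min(per_job, n_models - start)
        index = len(jobs)
        jobs.append({"job": index, "rand_seed": base_seed - index, "n_models": count})
        start += count
    return jobs


def wall_clock_minutes(n_models, n_machines, minutes_per_model=1.0):
    """Elapsed time if every machine works through its share one model at a time."""
    return math.ceil(n_models / n_machines) * minutes_per_model


for job in plan_jobs(200, 8):
    print(job)
print(wall_clock_minutes(200, 1), wall_clock_minutes(200, 8), wall_clock_minutes(200, 200))
```

With exactly one minute per model, one machine needs 200 minutes for 200 models, while 200 machines need one minute, which matches the figure quoted above for a large cluster.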
Andras
Hi, Andras
Thanks for the information.
I anticipated your answer, and did a cluster analysis on the loop RMSDs. One of the loop structures (the 12-residue one) was way out, but all the others seemed to fall into one of two possible clusters of structures. However, it may be that I was asking a different question.
One thing. I did the aligning and RMSD measurements using Swiss-PdbViewer (DeepView), as that allowed me to align the structures on selected residues and measure the RMSD of other selected residues. Can this be done using Modeller (DeepView has some limitations)? MALIGN3D and COMPARE appear to act only on complete structures (but I may be wrong).
About the 6-10 hours running time on Mac OS X / Darwin. This may be due to the computational overhead of running OS X - perhaps if it were run in the console mode, things may be faster. However, an overnight run is not a problem. Luckily, the time factor is not a problem (at the moment) as I am doing all this in my spare time, trying to learn some protein modelling. I'll have to ease off a bit as I'm about to start a second year of a part-time MSc (Structural Molecular Biology) at Birkbeck. No more spare time.
Cheers,
Alex Brown
On Wednesday, November 6, 2002, at 05:45 pm, Andras Fiser wrote:
> --
> Andras Fiser, PhD            # phone: (212) 327 7216
> The Rockefeller University   # fax: (212) 327 7540
> Box 270, 1230 York Avenue    # e-mail: fisera@rockefeller.edu
> New York, NY 10021-6399, USA # http://salilab.org/~andras
Hi Alex
You can always select a subset of residues with SELECTION_SEGMENT (see PICK_ATOMS), and the action (MALIGN3D, SUPERPOSE, etc.) will then involve only the selected residues. For an example, you can check the __loop.top routine itself that you have used.
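Outside Modeller, the same idea of restricting an operation to a residue range can be illustrated with a few lines of Python that pull the C-alpha coordinates of a segment out of a PDB file. This is not Modeller's PICK_ATOMS, only a sketch that relies on the fixed column layout of PDB ATOM records; the function name and the file name in the comment are hypothetical.

```python
def select_ca(pdb_lines, first_res, last_res, chain=None):
    """Return (residue_number, x, y, z) for CA atoms with first_res <= number <= last_res.

    Uses the fixed PDB columns: atom name 13-16, chain 22, residue number 23-26,
    x 31-38, y 39-46, z 47-54 (1-based), i.e. the slices below (0-based).
    """
    selected = []
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        if line[12:16].strip() != "CA":
            continue
        if chain is not None and line[21] != chain:
            continue
        resnum = int(line[22:26])
        if first_res <= resnum <= last_res:
            x = float(line[30:38])
            y = float(line[38:46])
            z = float(line[46:54])
            selected.append((resnum, x, y, z))
    return selected


# Typical use (the file name is a placeholder):
# with open("model.pdb") as handle:
#     loop_ca = select_ca(handle, 49, 54)
```

Coordinates selected this way from two models can then be passed to whatever fitting or RMSD routine is being used, with one residue range for the fit and another for the measurement.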
best wishes,
Andras