Cyanovirin-N (CV-N) was originally isolated from Nostoc ellipsosporum. It was identified in a screening effort as a highly potent inhibitor of diverse laboratory-adapted strains and clinical isolates of HIV-1, HIV-2 and SIV. Subsequently, the structure of CV-N was solved, first by NMR spectroscopy and later by X-ray crystallography at a resolution of 1.5 Å. The two structures are very similar. The CV-N monomer consists of two similar domains with 32% sequence identity to each other. In the crystal structure, the domains are connected by a flexible linker region, and the protein forms a dimer by inter-molecular domain swapping.
Recently, work was initiated to solve the monomer structure of a CV-N variant with circularly permuted domains (cpCV-N; written `cpCN-V' in the files below) [76]. Assuming that the overall structure does not change significantly, the new protein can be modeled by comparative modeling. An initial coarse model is built by using the following alignment file in the PAP format (file `circ.pap').
 _aln.pos         10        20        30        40        50        60
2ezm      LGKFSQTCYNSAIQGSVL-TSTCERTNGGYNTSSIDLNSVIENVDGSLKWQPSNFIETCR
cpCN-V    LGKFIETCRNTQLAGSSELAAECKTRAQQFVSTKINLDDHIANIDGTLKWQPSNFSQTCY
          **                                               ******

 _aln.pos         70        80        90       100
2ezm      NTQLAGSSELAAECKTRAQQFVSTKINLDDHIANIDGTLKYE
cpCN-V    NSAIQGSVL-TSTCERTNGGYNTSSIDLNSVIENVDGSLKYE
                                                  **
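As a sanity check on the alignment, the following Python sketch tests how the two rows are related. From the rows alone, the cpCN-V sequence appears to be the 2ezm sequence with the two segments on either side of the WQPSNF linker exchanged, while the LGKF and YE ends stay in place. These segment boundaries are inferred from the alignment and are not stated in the text; the function names are illustrative only.

```python
# Aligned rows copied from the alignment above (both blocks concatenated).
ALN_2EZM = (
    "LGKFSQTCYNSAIQGSVL-TSTCERTNGGYNTSSIDLNSVIENVDGSLKWQPSNFIETCR"
    "NTQLAGSSELAAECKTRAQQFVSTKINLDDHIANIDGTLKYE"
)
ALN_CP = (
    "LGKFIETCRNTQLAGSSELAAECKTRAQQFVSTKINLDDHIANIDGTLKWQPSNFSQTCY"
    "NSAIQGSVL-TSTCERTNGGYNTSSIDLNSVIENVDGSLKYE"
)


def ungap(row):
    """Remove alignment gap characters from one aligned row."""
    return row.replace("-", "")


def swap_domains(seq, n_term, linker, c_term):
    """Exchange the two segments flanking `linker`, keeping both termini.

    The termini and linker strings are assumptions read off the alignment.
    """
    if not (seq.startswith(n_term) and seq.endswith(c_term)):
        raise ValueError("sequence does not carry the expected termini")
    core = seq[len(n_term):len(seq) - len(c_term)]
    first, sep, second = core.partition(linker)
    if not sep:
        raise ValueError("linker not found in sequence")
    return n_term + second + linker + first + c_term


if __name__ == "__main__":
    # Both aligned rows must span the same number of columns.
    print(len(ALN_2EZM), len(ALN_CP))
    permuted = swap_domains(ungap(ALN_2EZM), "LGKF", "WQPSNF", "YE")
    print(permuted == ungap(ALN_CP))
```

A check of this kind only confirms that the alignment file is internally consistent; it says nothing about whether the permuted protein keeps the template fold.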
Next, the new linker loop and the short N- and C-termini are refined by ab initio loop modeling. The selected segments that are subjected to loop modeling are indicated by stars in the alignment above. The loop modeling script is as follows (file `loop.top').
INCLUDE
SET SEQUENCE = 'cpCN-V'
SET LOOP_MODEL = 'cpCN-V.pdb'
SET LOOP_STARTING_MODEL = 1
SET LOOP_ENDING_MODEL = 200
CALL ROUTINE = 'loop'

SUBROUTINE ROUTINE = 'select_loop_atoms'
  PICK_ATOMS SELECTION_SEGMENT = '0:' '3:', SELECTION_STATUS = 'initialize'
  PICK_ATOMS SELECTION_SEGMENT = '99:' '100:', SELECTION_STATUS = 'add'
  PICK_ATOMS SELECTION_SEGMENT = '49:' '54:', SELECTION_STATUS = 'add'
  RETURN
END_SUBROUTINE
SEQUENCE defines the name of the model. LOOP_MODEL defines the name of the input coordinate file containing the cpCV-N model whose loops need to be refined. LOOP_STARTING_MODEL and LOOP_ENDING_MODEL define how many final loop models are calculated (in this case, 200). The subroutine `select_loop_atoms' selects the regions of the model for loop modeling. Two arguments are passed to the PICK_ATOMS command: SELECTION_SEGMENT defines the starting and ending residues of the loop, and SELECTION_STATUS defines whether the program initializes the selection or adds the current loop to the previously defined set of loops. In this case, three loops are selected and optimized simultaneously. The filenames of output models with refined loops have the `.BL' extension to distinguish them from the default file naming convention of the regular models (`.B'). For instance, the first generated loop model file is `cpCN-V.BL00010001'.
Although the linker segment is only six residues long, it is not known whether or not some of the preceding and subsequent residues undergo conformational changes in the new construct. To investigate this question, we gradually extended the length of the modeled linker region from 6 to 12 residues. For this purpose, one needs to modify only the selection routine in the script above.
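The change to the selection routine can be sketched in Python as a small generator of PICK_ATOMS lines, one per linker length. The text does not say on which side the linker was extended, so growing the segment alternately at its C- and N-terminal ends is an assumption made here for illustration. The function names are hypothetical, and the starting segment 49 to 54 is taken from the script above.

```python
def extend_segment(start, end, length):
    """Grow the residue range [start, end] to `length` residues.

    Assumption: residues are added alternately at the C-terminal and
    N-terminal side; the source does not specify the extension scheme.
    """
    if length < end - start + 1:
        raise ValueError("target length is shorter than the segment")
    grow_c_terminal = True
    while end - start + 1 < length:
        if grow_c_terminal:
            end += 1
        else:
            start -= 1
        grow_c_terminal = not grow_c_terminal
    return start, end


def pick_atoms_line(start, end, status):
    """Format one PICK_ATOMS command in the style of the script above."""
    return (
        f"PICK_ATOMS SELECTION_SEGMENT = '{start}:' '{end}:', "
        f"SELECTION_STATUS = '{status}'"
    )


def linker_selection_lines(start=49, end=54, min_len=6, max_len=12):
    """Map each linker length to its PICK_ATOMS line."""
    lines = {}
    for length in range(min_len, max_len + 1):
        new_start, new_end = extend_segment(start, end, length)
        lines[length] = pick_atoms_line(new_start, new_end, "add")
    return lines


if __name__ == "__main__":
    for length, line in linker_selection_lines().items():
        print(length, line)
```

Each generated line would replace the `'49:' '54:'` line of `select_loop_atoms' for one run; the rest of the script stays unchanged.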
For each linker length from 6 to 12 residues, the model with the lowest energy score among the 200 generated models was selected. The superposition of these best models showed a dominant cluster of conformations, indicating that the modeling of the linker region is not limited by conformational changes in the immediately preceding or subsequent parts of the sequence (Figure 5). The final comparative model with the optimized linker and terminal segments was used to refine the structure of cpCV-N against NMR dipolar coupling data. Good agreement between the experimental values and those calculated from the model confirmed that the fold of cpCV-N is similar to that of the wild type and that the model may facilitate characterization of the structure and dynamics of cpCV-N [76].
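The selection step amounts to taking the minimum over the scores of the generated models. A minimal Python sketch follows, assuming the energy scores have already been collected into a mapping from model name to score. How the scores are extracted from the program output is not shown in the text, and the names and values in the usage example are hypothetical placeholders, not results.

```python
def best_model(scores):
    """Return the (name, score) pair with the lowest energy score.

    `scores` maps a model name to its energy score; collecting these
    values from the modeling run is outside the scope of this sketch.
    """
    if not scores:
        raise ValueError("no models to choose from")
    return min(scores.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Placeholder names and values, for illustration only.
    example_scores = {"model_a": 3.0, "model_b": -1.5, "model_c": 2.0}
    name, score = best_model(example_scores)
    print(name, score)
```

Repeating this selection once per linker length gives the set of best models that are then superposed.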
Acknowledgments
We are grateful to all the members of our research group for many discussions about comparative protein structure modeling. AF was a Burroughs Wellcome Fund Postdoctoral Fellow and is a Charles Revson Foundation Postdoctoral Fellow. AS is an Irma T. Hirschl Trust Career Scientist. Research was supported by NIH/GM 54762, Merck Genome Research Award (AS), and Mathers Foundation. This review is based on [77,7,6].
MODELLER is available freely to academic users at http://guitar.rockefeller.edu/modeller/modeller.html. It runs on many UNIX systems, including PCs running LINUX. All the sample files shown in this review are available at http://guitar.rockefeller.edu/modeller/methenz/. MODELLER, with a graphical interface, is also available as part of QUANTA, INSIGHTII and GENEEXPLORER (Accelrys Inc., San Diego, e-mail: dje@accelrys.com).