Reconstruction of CA position only files with all-atom template
Hi all,
Here is my task: I have some CA-only coordinate files for a protein of interest that is involved in some important biological pathways. We also have the full cryo-EM structure of the protein (PDB: 8io4; exactly the same protein and sequence as the CA-only files). I would like to keep the CA positions from the CA-only files fixed (not moved at all) and reconstruct the remaining detail, such as side chains and secondary structure, from the template. So this task is not homology modeling but reconstruction. I would then like to use the reconstructed files in MD simulations.
I searched the mailing list archives and found a post that does this task, but not based on a template: https://www.salilab.org/archives/modeller_usage/2008/msg00285.html I tried with
"
env = Environ()

# Load the CA-only seed model
seed_model = Model(env, file=seed_file)

# Load the template structure (with full atomistic detail)
template_model = Model(env, file=template_file)

# Create an alignment between the seed and itself (1:1 alignment)
aln = Alignment(env)
aln.append_model(seed_model, align_codes='seed')
aln.append_model(seed_model, align_codes='seed')

# Write the alignment for debugging purposes
aln.write(file='alignment.ali', alignment_format='PIR')

# Define a custom model class to fix the Cα positions
class MyModel(automodel):
    def select_atoms(self):
        # Select all atoms except the Cα atoms to allow refinement of
        # everything but Cα
        s = selection(self)
        return s - s.only_atom_types('CA')

# Set up environment for refinement
env.edat.nonbonded_sel_atoms = 2  # Disable interactions between selected and unselected atoms

a = MyModel(env,
            alnfile = aln,      # Alignment file created earlier
            knowns = 'seed',    # Seed model (CA-only)
            sequence = 'seed')  # Target is also the seed (1:1 alignment)

a.starting_model = 1
a.ending_model = 1

a.make()
"
But this code doesn't use the side-chain atoms from the cryo-EM structure; it uses only the CA positions to reconstruct the detail. The reconstruction from this code does not seem reliable enough: it causes an infinite-energy error in MD simulation, which may indicate atom overlaps, incorrect side-chain orientations, or improper hydrogen placement. Therefore, I would like to reconstruct using the side-chain information from the template cryo-EM structure's PDB file.
I tried some methods that use the template cryo-EM structure, but they seem to just do homology modeling, NOT reconstruction. The output files are almost the same as the template and make no use of the CA positions from the CA-only files. The code I used looks like this:
"
# Load the CA-only seed model
seed_model = Model(env, file=seed_file)

# Load the template structure (with full atomistic detail)
template_model = Model(env, file=template_file)

# Create an alignment between the seed and the template
aln = Alignment(env)
aln.append_model(template_model, align_codes='template')
aln.append_model(seed_model, align_codes='seed')

# Perform the alignment
aln.align()

# Now defining a custom AutoModel class to apply Cα restraints
class MyModel(AutoModel):
    def special_restraints(self, aln):
        # Restrain all Cα atoms based on the seed model
        rsr = self.restraints
        atmsel = selection(self).only_atom_types('CA')

        # Apply restraints to the selected Cα atoms
        for atom in atmsel:
            rsr.add(forms.Gaussian(group=physical.xy_distance,
                                   feature=features.Distance(atom, atom),
                                   mean=0.0,  # No change in CA positions
                                   stdev=0.01))

# Instantiate the custom model class
a = MyModel(env, alnfile=aln, knowns='template', sequence='seed')
a.starting_model = 1
a.ending_model = 1
a.make()
"
I think this task can be done, since I found a paper (https://doi.org/10.7554/eLife.68369) that does a similar task and reports using Modeller for it: "First, side chain atoms from the template X-ray structure (PDB ID 4HFI) were added to each model, followed by a cycle of refinement with all Ca atoms restrained. Restraints on Ca atoms were then substituted with restraints on backbone hydrogen bonds, taken from helix and sheet annotations in the template PDB file, for another cycle of refinement to ensure proper secondary structure."
Can anyone give some suggestions about this reconstruction task? Thanks.
Best,
Jingkai
On 11/12/24 1:55 PM, jingkaizeng via modeller_usage wrote:
> Here is my task: I have some CA-only coordinate files for a protein of
> interest that is involved in some important biological pathways. We also
> have the full cryo-EM structure of the protein (PDB: 8io4; exactly the
> same protein and sequence as the CA-only files). I would like to keep
> the CA positions from the CA-only files fixed (not moved at all) and
> reconstruct the remaining detail, such as side chains and secondary
> structure, from the template.
I'm not aware of anybody using Modeller to do this, and it might not make sense - for example if the two structures are very different - but it should be possible.
> I searched from previous mail list, and I found one did this task but
> not based on template.
> https://www.salilab.org/archives/modeller_usage/2008/msg00285.html
That solution uses a CA-only structure as a 100% identical template to build a comparative model. Since there are no sidechains in the template, Modeller will construct those from internal coordinates and use the CHARMM forcefield to optimize them. So that's not what you want because it's not going to use your Cryo-EM structure.
> I tried some methods that use the template Cryo-EM structure, but it
> seems we just do the homology modeling but NOT reconstruction. The
> production files we get is almost same as the template without make use
> of the CA position from CA only files. The code I used like :
From reading your script it looks like you are trying to build a model using the Cryo-EM structure as a template, restraining the CA atoms to their "seed" values. That might work but you have a bug in your script:
>         # Apply restraints to the selected Cα atoms
>         for atom in atmsel:
>             rsr.add(forms.Gaussian(group=physical.xy_distance,
>                                    feature=features.Distance(atom, atom),
>                                    mean=0.0,  # No change in CA positions
>                                    stdev=0.01))
features.Distance restrains the distance between two atoms. The distance between an atom and itself is always going to be zero, so this restraint will have no effect. That's why your output models don't look any different. If you do want to use this approach you can do it in Modeller using absolute position restraints, see https://salilab.org/modeller/10.6/manual/node109.html
You would want something like
    for atom in atmsel:
        rsr.add(forms.Gaussian(
                    group=physical.xy_distance,
                    feature=features.XCoordinate(atom),
                    mean=<x coordinate of atom in seed>,
                    stdev=0.01))
Of course you would need to get the desired coordinates of each atom from the seed structure, and repeat this for the y and z coordinates.
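The per-coordinate restraints described above could be assembled along the following lines. This is only a sketch under assumptions not stated in the thread: the helper names read_ca_coordinates and make_ca_restrained_class are made up for illustration, the seed file is assumed to be in standard fixed-column PDB format, and the model's residues are assumed to come in the same order as the seed's CA atoms (plausible here because the sequences are identical). The Modeller part further assumes that res.atoms['CA'] looks up the CA atom of a residue and that features.YCoordinate and features.ZCoordinate behave like the XCoordinate feature shown above. Modeller is imported lazily so the parsing helper can be tried without it.

```python
def read_ca_coordinates(pdb_lines):
    """Return [(x, y, z), ...] for every CA atom in ATOM records, in file order.

    Assumes standard fixed-column PDB format. HETATM records are skipped
    so that calcium ions (also named CA) are not picked up.
    """
    coords = []
    for line in pdb_lines:
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            coords.append((float(line[30:38]),
                           float(line[38:46]),
                           float(line[46:54])))
    return coords


def make_ca_restrained_class(seed_ca, stdev=0.01):
    """Build an AutoModel subclass that restrains each CA to its seed position.

    Hypothetical helper; seed_ca is the list from read_ca_coordinates().
    """
    # Lazy import: only needed when a model is actually built.
    from modeller import forms, features, physical
    from modeller.automodel import AutoModel

    class CARestrainedModel(AutoModel):
        def special_restraints(self, aln):
            rsr = self.restraints
            residues = list(self.residues)
            if len(residues) != len(seed_ca):
                raise ValueError("seed and model differ in residue count")
            # Assumption: i-th model residue corresponds to i-th seed CA.
            for res, (x, y, z) in zip(residues, seed_ca):
                ca = res.atoms['CA']
                for feature, mean in ((features.XCoordinate, x),
                                      (features.YCoordinate, y),
                                      (features.ZCoordinate, z)):
                    rsr.add(forms.Gaussian(group=physical.xy_distance,
                                           feature=feature(ca),
                                           mean=mean, stdev=stdev))

    return CARestrainedModel
```

The returned class would be used in place of MyModel in the script above, for example MyModel = make_ca_restrained_class(read_ca_coordinates(open(seed_file))). Because these are absolute positions, the restraints only make sense in the seed structure's coordinate frame.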
Note though that like any restraint, your "CA position restraints" can be violated, so the CA atoms won't be exactly where they are in the seed structure. If you don't want them to move at all, you can achieve that by building a model using the Cryo-EM structure as a template, using your seed structure plus reconstructed side chains (from your first attempt) as an initial model (as per https://salilab.org/modeller/10.6/manual/node27.html) and then overriding select_atoms to only move non-CA atoms as per https://salilab.org/modeller/10.6/manual/node23.html. The resulting model will probably contain a lot of violations though (since the distance restraints will be based on the Cryo-EM structure but you are forcing it to use the seed structure CA coordinates). These could probably be lessened by removing some or all of the mainchain atoms from your Cryo-EM template (so that Modeller does not generate restraints on them). But likely the resulting structures will need further refinement.
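Removing mainchain atoms from the template, as suggested above, can be done with a small script before handing the file to Modeller. The following is a minimal sketch, not something taken from the thread: the function name strip_mainchain is hypothetical, the default choice of which atoms to drop (N, C, O and OXT, keeping CA and all side-chain atoms) is just one reading of "some or all of the mainchain atoms", and the input is assumed to be a standard fixed-column PDB file without hydrogens.

```python
# Atom names treated as mainchain and removed by default (an assumption;
# CA is deliberately kept so the template still defines residue positions).
MAINCHAIN_TO_DROP = frozenset({"N", "C", "O", "OXT"})


def strip_mainchain(pdb_lines, drop=MAINCHAIN_TO_DROP):
    """Return the PDB lines with the chosen mainchain ATOM records removed.

    Non-ATOM lines (HETATM, TER, END, headers) are passed through unchanged.
    Atom serial numbers are not renumbered.
    """
    kept = []
    for line in pdb_lines:
        if line.startswith("ATOM") and line[12:16].strip() in drop:
            continue
        kept.append(line)
    return kept


def write_stripped_template(in_path, out_path, drop=MAINCHAIN_TO_DROP):
    """Read in_path, strip mainchain atoms, and write the result to out_path."""
    with open(in_path) as fh:
        lines = fh.readlines()
    with open(out_path, "w") as fh:
        fh.writelines(strip_mainchain(lines, drop))
```

The stripped file would then be used as the template in place of the original cryo-EM PDB file. Passing a smaller drop set, such as {"O"}, removes fewer atoms if dropping the whole mainchain turns out to lose too much information.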
Ben Webb, Modeller Caretaker