Hey Modeller users,
I'm trying to model 400+ proteins based on ~100 templates. I have an alignment file of 1500+ sequences comprising the templates, the targets and others. ClustalW was used to align the sequences.
I have a few problems.
- The sequences in the alignment file do not match the amino acids present in the PDB files. _Generally_ the PDB files contain more residues than specified in the aligned sequence. Therefore I have to either concatenate the PDB files or specify the appropriate residues in the alignment file.
- The ID codes in the alignment file do not match the atom file names.
- There is no "second" line in each entry in the alignment file.
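For reference, the missing "second" line can in principle be generated from the atom file itself. The sketch below is only an illustration, not part of Modeller: it reads the ATOM records of one chain and writes a PIR entry whose second line carries the ten colon-separated fields Modeller expects (type, code, first residue, chain, last residue, chain, then four optional fields left empty). The function names `chain_sequence` and `pir_entry` are made up for this example, and mapping unknown residue names to "X" is an assumption that would need adjusting for real data.

```python
# Minimal sketch: derive a PIR entry (header, "second" line, sequence) from a PDB chain.
# Function names are hypothetical; only standard fixed-column PDB parsing is used.

THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def chain_sequence(pdb_text, chain_id):
    """Return (sequence, first_residue, last_residue) for one chain.

    Only ATOM records of the first model are read. Residue identifiers
    include the insertion code, if any.
    """
    residues = []   # (residue identifier, one-letter code), in file order
    seen = set()
    for line in pdb_text.splitlines():
        if line.startswith("ENDMDL"):
            break                      # stop after the first model
        if not line.startswith("ATOM") or len(line) < 27:
            continue
        if line[21] != chain_id:       # column 22: chain identifier
            continue
        key = line[22:27]              # columns 23-27: residue number + insertion code
        if key in seen:
            continue
        seen.add(key)
        # Assumption: unknown residue names become "X"; adjust as needed.
        residues.append((key.strip(), THREE_TO_ONE.get(line[17:20], "X")))
    if not residues:
        raise ValueError("no ATOM records found for chain %r" % chain_id)
    sequence = "".join(code for _, code in residues)
    return sequence, residues[0][0], residues[-1][0]


def pir_entry(align_code, pdb_code, chain_id, sequence, first, last):
    """Format one template entry in PIR format, including the second line."""
    fields = ["structureX", pdb_code, first, chain_id, last, chain_id,
              "", "", "", ""]          # name, source, resolution, R-factor left empty
    return "\n".join([">P1;" + align_code, ":".join(fields), sequence + "*"]) + "\n"
```

Called on chain A of a file whose residues run from 5 to 7, `pir_entry` would produce a second line of the form `structureX:1abc:5:A:7:A::::`. Because the sequence is read from the atom file, it matches the residues actually present there.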
Although all this can be done manually, I can't help but wonder if there is a way to automate/expedite the process. A paper published by Sanchez and Sali (1998) mentioned a Perl script that allowed for rapid progress through the various steps involved in modelling. Suggestions would be most appreciated.
Some of the PDB files are complexes. If it can be avoided, I'd prefer not to use these structures. However, if I do decide to use some of them, I plan to minimise the energy of the protein (minus the ligand) via MD (CNS) before using it as a template. What are people's thoughts about this?
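As an illustration of the "minus the ligand" step only (the minimisation itself is not shown), a ligand can be removed by dropping HETATM records before the file goes anywhere else. This is a rough sketch under simple assumptions: the helper name `strip_hetero` is invented here, waters and ions are removed along with the ligand, and modified residues stored as HETATM (for example MSE) are also dropped unless listed in `keep_resnames`.

```python
# Minimal sketch: remove hetero atoms (ligands, waters, ions) from PDB text.
# The helper name and its keep_resnames parameter are hypothetical.

def strip_hetero(pdb_text, keep_resnames=()):
    """Return the PDB text without HETATM records.

    Residue names listed in keep_resnames (e.g. "MSE") are kept, since some
    modified residues that belong to the chain are stored as HETATM.
    """
    kept = []
    for line in pdb_text.splitlines():
        # Columns 18-20 hold the residue name.
        if line.startswith("HETATM") and line[17:20] not in keep_resnames:
            continue
        kept.append(line)
    return "\n".join(kept) + "\n"
```

Any CONECT records that point at removed atoms are left in place by this sketch and may need cleaning separately.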
Many thanks
Hello Cvetan,
From my own experience I know that Modeller is quite picky about the sequence being correct (as it should be).
Anyhow, I would not use ClustalW alignments in the first place. When you model a particular protein using several templates, you are *much* better off using structural alignments instead of sequence-based alignments alone. You could use, for example, T-COFFEE together with SAP (or maybe Fugue, but I have no experience with it). You would then use your PDB files from the start in the alignment process, not unrelated GenBank sequences, and Modeller would thus find all the residues it needs in the alignment. Alternatively, there are structural alignments readily available at HOMSTRAD for a large number of proteins.
Note also that Modeller comes with a file modlib/CHAINS_all.seq. If you take the sequences from this file for your ClustalW alignments, they should work with Modeller.
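Whether a given aligned sequence will satisfy Modeller can be checked before running it. The sketch below is an illustration rather than anything Modeller provides: it strips the gaps from an aligned template sequence, looks for it inside the sequence read from the atom file, and reports the first and last residue identifiers that the alignment entry would have to declare. The name `locate_in_structure` and its arguments are assumptions made for this example, and only an exact, contiguous match is handled.

```python
# Minimal sketch: find which residue range of a PDB chain an aligned sequence covers.
# Hypothetical helper; handles only an exact contiguous match.

def locate_in_structure(aligned_seq, pdb_seq, pdb_resnums):
    """Return (first_residue, last_residue) covered by aligned_seq, or None.

    aligned_seq : template row from the alignment, gaps written as "-"
    pdb_seq     : one-letter sequence read from the atom file
    pdb_resnums : residue identifiers, one per letter of pdb_seq
    """
    ungapped = aligned_seq.replace("-", "").upper()
    if not ungapped:
        return None
    start = pdb_seq.find(ungapped)
    if start < 0:
        return None                    # sequences disagree; Modeller would reject this
    end = start + len(ungapped) - 1
    return pdb_resnums[start], pdb_resnums[end]
```

A result of `None` flags an entry whose alignment sequence cannot be found in the structure and therefore needs attention before modelling.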
Hope this helps,
Kind regards,
Karsten.
Thanks Karsten,
Your message was helpful. I think I'll try T-COFFEE/SAP along with the ALIGN2D routine. I've developed a few programs/scripts that helped me correct the sequence ID codes to match the atom files. The TOP scripts also came in very handy in preparing these files.
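The scripts mentioned here are not shown in the thread. Purely as an illustration of the idea, a ClustalW alignment could be read and its IDs rewritten to match the atom file names with something like the sketch below. The function names and the ID mapping are hypothetical, and the parser assumes the usual ClustalW layout (a "CLUSTAL" header, name and sequence chunk on each row, conservation rows starting with whitespace).

```python
# Minimal sketch: read a ClustalW alignment and rename IDs to match atom file names.
# Not the scripts referred to in the message; names and mapping are hypothetical.

def parse_clustal(text):
    """Return {name: aligned sequence} from ClustalW-formatted text."""
    seqs = {}
    for line in text.splitlines():
        # Skip blank lines, the header, and conservation rows (leading whitespace).
        if not line.strip() or line.startswith("CLUSTAL") or line[0].isspace():
            continue
        parts = line.split()
        if len(parts) < 2:
            continue
        name, chunk = parts[0], parts[1]
        seqs[name] = seqs.get(name, "") + chunk   # blocks are concatenated per name
    return seqs


def rename_ids(seqs, id_map):
    """Return a copy of seqs with IDs replaced where id_map has an entry."""
    return {id_map.get(name, name): seq for name, seq in seqs.items()}
```

With a mapping such as `{"seqA": "1abcA"}`, the entry originally called `seqA` is returned under the atom file code `1abcA`, while unmapped IDs are left unchanged.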
Cvetan