Charlie Allerston wrote:
> This is what I am not getting. Where to put the ligand in the
> alignment, regardless of what character to use.
> Take this for instance. trying to model something from the template 1VDC
> http://www.rcsb.org/pdb/explore.do?structureId=1VDC
> Looking at the PDB file it has 316 residues then it had a molecule of
> FAD tacked on at a position designated 400.
...followed by a bunch of waters from residue 500 to 945, followed by an SO4 at 440. (There are actually more than 316 residues, too.)
Modeller always reads PDB residues in order, so if the ligand is at the end of the PDB file, you put the blk residue at the end of the sequence in the alignment too. If you set env.io.hetatm to True, all of the HETATM residues in the PDB file (at least those in the range specified by model_segment) will be read, so they must all be listed in your alignment.
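To make the ordering rule concrete, here is a minimal sketch in Python (the language Modeller scripts are written in) that walks the ATOM/HETATM records of a PDB file in file order and emits one alignment character per residue. It is not Modeller's own reader: the helper name residue_codes, the simplified amino-acid table, and the choice to map every HETATM residue to blk (.) are assumptions for illustration, and waters are skipped unless you ask for them.

```python
# Hypothetical helper, NOT part of Modeller: list the residues of a PDB file
# in the order they appear, one alignment character per residue.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def residue_codes(pdb_lines, include_water=False):
    """Return one character per residue, in file order.

    Standard amino acids become one-letter codes; every HETATM residue
    (and any unrecognised ATOM residue) becomes the blk character '.'.
    Waters (HOH) are skipped unless include_water is True.
    """
    codes = []
    seen = set()
    for line in pdb_lines:
        record = line[:6].strip()
        if record not in ("ATOM", "HETATM"):
            continue  # skip TER, CONECT, headers, etc.
        resname = line[17:20].strip()
        # Chain ID (col 22), residue number (cols 23-26), insertion code (col 27)
        key = (line[21], line[22:26], line[26])
        if key in seen:
            continue  # another atom of a residue we already counted
        seen.add(key)
        if resname == "HOH" and not include_water:
            continue
        if record == "ATOM":
            codes.append(THREE_TO_ONE.get(resname, "."))
        else:
            codes.append(".")  # blk: matches any HETATM residue
    return "".join(codes)
```

For example, `with open("1vdc.pdb") as fh: print(residue_codes(fh))` would print the amino-acid codes followed by one '.' per non-water HETATM residue, in exactly the order those residues occur in the file.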
> So when I align this to some target like below (cropped).
>
> --------------------------MNGLETHNTRLCIVGSGPAAHTAAIYAARAELKP
> LLFEGWMANDIAPGGQLTTTTDVENFPGFPEGILGVELTDKFRKQSERFGTTIFTETVTK
> VDFSSKPFKLFTDS---KAILADAVILAIGAVAKRLSFVGSGEVLGGFWNRGISACAVCD
> GAAPIFRNKPLAVIGGGDSAMEEANFLTKYGSKVYIIHRRDAFRASKIMQQRALSNPKID
> VIWNSSVVEAYGDGERDVLGGLKVKNVVTGDVSDLKVSGLFFAIGHEPATKFLDGGVELD
> SDGYVVTKPGTTQTSVPGVFAAGDVQDKKYRQAITAAGTGCMAALDAEHYLQEIGSQEGK
> SD-
> *
> >P1;fake1
>
> DASGLSVAAAATLSQKSTPYYQSEIHTIGKRRMHSKVVIIGSGPAAHTAAIYLARAELKP
> VLYEGFMANGVAAGGQLTTTTEVENFPGFPEAVTGQELMDKMRAQSERFGTVIVSETVGK
> LDLSKRPFEYSTEWSPDTVMTADAVILATGASARRLGLPGED----KYWQNGISACAVCD
> GAVPIFRNKPLVVIGGGDSAAEEAIFLTKYGSHVTVLVRRDKLRASSIMARRLLAN----
> ------------------------------------------------------------
> -------------KKVTGLFAAGDVQDKRYRQAITSAGTGCMAALDAEKYLEELEDEQAD
> GKL
> *
>
> Where should I stick the fad? At the end?
That's not a complete alignment file, so I can't tell which is supposed to be the 1vdc sequence. But the true 1vdc sequence can be obtained from Modeller using the script at http://salilab.org/modeller/FAQ.html#18 (modified only by setting env.io.hetatm = True):
>P1;1vdc
structureX:1vdc: 1 : :+324 : :undefined:undefined:-1.00:-1.00
LETHNTRLCIVGSGPAAHTAAIYAARAELKPLLFEGWMANDIAPGGQLTTTTDVENFPGFPEGILGVELTDKFRK
QSERFGTTIFTETVTKVDFSSKPFKLFTDSKAILADAVILAIGAVAKRLSFVGSGEVLGGFWNRGISACAVCDGA
APIFRNKPLAVIGGGDSAMEEANFLTKYGSKVYIIHRRDAFRASKIMQQRALSNPKIDVIWNSSVVEAYGDGERD
VLGGLKVKNVVTGDVSDLKVSGLFFAIGHEPATKFLDGGVELDSDGYVVTKPGTTQTSVPGVFAAGDVQDKKYRQ
AITAAGTGCMAALDAEHYLQEI.#*
Since the FAD comes after the regular amino acids in the PDB file, the blk residue (.) comes immediately after the regular sequence. You can also see a # residue there: that's the SO4. You can see which HETATM residues Modeller has codes for by looking at modlib/restyp.lib, but if in doubt use blk (.), since that will match everything.
You will often see a chain break (/) immediately preceding blk residues in alignments. That's only necessary if you want to force the ligands to have a different chain ID to the amino acids. (If you want them in the same chain, leave out the chain break.)
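As a sketch of where the optional chain break goes, the hypothetical helper below appends one column per ligand to an already-aligned template/target pair, inserting '/' first only when the ligands should get their own chain ID. The function name is invented for illustration, and placing the break in the same column of both sequences is an assumption of this sketch, not something stated above.

```python
def append_ligand_columns(template, target, template_ligs, target_ligs,
                          own_chain=False):
    """Append ligand (HETATM) columns to an aligned template/target pair.

    Hypothetical helper, not a Modeller API. Each ligand gets one column.
    If own_chain is True, a chain break '/' is placed in both sequences
    first, so the ligands get a different chain ID from the amino acids;
    otherwise they stay in the same chain as the protein.
    """
    if len(template) != len(target):
        raise ValueError("aligned sequences must have the same length")
    if len(template_ligs) != len(target_ligs):
        raise ValueError("need one target column per template ligand")
    brk = "/" if own_chain else ""
    return template + brk + template_ligs, target + brk + target_ligs
```

With own_chain=False the result is simply the protein followed by the ligand characters, which is all you need when the ligands may share the protein's chain.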
> How many blk characters
> should I tack on? 1 because there is only one molecule?
HETATM residues are treated in exactly the same way as ATOM residues, so you need one alignment character per PDB residue.
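Because every HETATM residue takes exactly one character, a quick sanity check is to count the residue characters in the template entry and compare that with the number of residues Modeller will read from the PDB file. A minimal sketch (the function name is hypothetical, not a Modeller API):

```python
def count_residues(pir_sequence):
    """Count the residues a PIR sequence claims to contain.

    Every character is one residue (amino acid, or HETATM such as '.' or
    '#'), except gaps '-', chain breaks '/', the terminating '*', and
    whitespace/line breaks.
    """
    return sum(1 for c in pir_sequence if c not in "-/*" and not c.isspace())
```

Applied to the last line of the 1vdc entry, count_residues("AITAAGTGCMAALDAEHYLQEI.#*") gives 24: 22 amino acids plus the FAD and the SO4.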
What you get in your model, of course, depends on what is in your target sequence. For example, you may want to build a model containing FAD but not the SO4. In this case, you would align a blk residue in the model to the corresponding FAD blk residue in the template, but align a gap to the SO4.
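The keep-or-gap choice can be sketched as a small function that turns the template's ligand characters into the matching target columns. The name and the list-of-booleans interface are hypothetical, chosen only to illustrate the rule described above.

```python
def target_ligand_columns(template_ligs, keep):
    """Build the target's ligand columns to align under the template's.

    Hypothetical helper, not a Modeller API.
    template_ligs: the template's HETATM characters, e.g. ".#" for the
        FAD (blk) followed by the SO4.
    keep: one boolean per template HETATM residue.
    Ligands to keep get a blk residue '.'; ligands to drop get a gap '-'.
    """
    if len(keep) != len(template_ligs):
        raise ValueError("need one keep flag per template HETATM residue")
    return "".join("." if k else "-" for k in keep)
```

For the 1vdc case, keeping the FAD but not the SO4 gives ".-", so the template entry ends in ".#*" and the target entry ends in ".-*".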
Ben Webb, Modeller Caretaker