Re: [modeller_usage] PDB updates for Modeller

4 Oct 2004


On Oct 3, 2004, at 7:47 AM, Bruno Afonso wrote:
> Eswar Narayanan wrote:
>>>
>>> Since I know this PDB is probably a good model and it won't come up 
>>> in seq_search I was wondering how I could manually update 
>>> CHAINS_all.seq or create my own sequence database.
>> The latest release of MODELLER (version 7v7, released last month) has 
>> a new command called SEQFILTER that can be used to cluster PDB 
>> sequences. You can use MAKE_CHAINS (also in the latest release) to 
>> collect the PDB chains prior to running SEQFILTER.
>
> There was a mistake in my previous e-mail. :) The PDB sequence is 
> missing, which is *bad*, not good. I'm sorry to ask these questions, 
> but I'm still puzzled as to how to deal with this:
If you know exactly which template(s) you are going to use, you do 
not have to use SEQUENCE_SEARCH to "identify" them. You can 
use any of the alignment commands (ALIGN, ALIGN2D, etc.) to create your 
alignment and model your sequence based on that alignment.
>
> 1) What are the criteria for making chains_all.seq? I ask this because 
> clearly not all of PDB is there :) and there are sequences there with 
> resolutions as high as 5.0 angstroms...
One usually wants to use a non-redundant version of PDB to search for 
templates. One way is to first select the sequences of all X-ray 
structures that are solved at a resolution better than 3.5 angstroms, 
are longer than 30 aa, have no more than 10 non-standard residues, and 
have at least 30 standard residues. These can all be specified as 
options to MAKE_CHAINS. You can then cluster these sequences using 
SEQFILTER to remove redundancies at a chosen sequence identity 
threshold (usually set at 30% or 95%).
Ben has put these files on the web at 
http://salilab.org/modeller/supplemental.html. These are the 
representative sequences derived from PDB files at 30% and 95% sequence 
identity. All X-ray and NMR PDB chains, with no limits on resolution, 
that are at least 30 aa long, have more than 30 standard residues and 
not more than 10 non-standard residues were used to get these files. 
This is just the output of SEQFILTER on last week's release (09-28-04) 
of PDB.
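
To make the selection and clustering logic above concrete, here is a 
minimal sketch in plain Python. It is not MODELLER code and does not 
reproduce what MAKE_CHAINS or SEQFILTER do internally: the Chain 
record, the function names and the crude_identity measure (a difflib 
based stand-in for a real alignment-based sequence identity) are all 
assumptions made for illustration only.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional

STANDARD = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard residues


@dataclass
class Chain:                      # hypothetical record, not a MODELLER type
    code: str                     # e.g. PDB code plus chain id
    method: str                   # "xray" or "nmr"
    resolution: Optional[float]   # angstroms; None if not applicable
    sequence: str                 # one-letter codes


def passes_filter(chain, allowed_methods=("xray",), max_resolution=3.5,
                  min_length=30, min_standard=30, max_nonstandard=10):
    """Apply the selection criteria described in the text."""
    if chain.method not in allowed_methods:
        return False
    if max_resolution is not None:
        # "better than" means a strictly smaller resolution value
        if chain.resolution is None or chain.resolution >= max_resolution:
            return False
    n_std = sum(1 for r in chain.sequence if r in STANDARD)
    n_nonstd = len(chain.sequence) - n_std
    return (len(chain.sequence) > min_length
            and n_std >= min_standard
            and n_nonstd <= max_nonstandard)


def crude_identity(a, b):
    """Rough similarity in [0, 1]; an assumption, NOT a true
    alignment-based sequence identity."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / min(len(a), len(b))


def greedy_representatives(chains, threshold=0.95):
    """Keep a chain only if it is below the identity threshold
    to every representative kept so far (input order decides ties)."""
    reps = []
    for chain in chains:
        if all(crude_identity(chain.sequence, r.sequence) < threshold
               for r in reps):
            reps.append(chain)
    return reps
```

Calling passes_filter with allowed_methods=("xray", "nmr") and 
max_resolution=None roughly mimics the looser criteria used for the 
files on the web; the defaults mimic the stricter X-ray selection.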
>
> 2) Can't I make a chains_all.seq-like file with MY criteria without 
> writing my own script? i.e., is there a "right way"(TM) to do it?
See the comments above.
>
> 3) I can use MAKE_CHAINS and then load the .chn as a database, but 
> that involves me first finding the good PDBs that aren't in 
> modeller's DB, which kind of misses the whole point. I was using 
> modeller to try to find the good ones in the first place.
>
> Thanks for the tip on seqfilter, but my problem was the sequence 
> missing in the modeller's default database in the first place ;-)
The reviews listed on the modeller web-site 
(http://salilab.org/modeller/documentation.html) will help you 
understand the process of identifying a useful template for modelling.
---
Eswar Narayanan, Ph.D
Mission Bay Genentech Hall
600 16th Street, Suite N474Q
University of California - San Francisco
San Francisco, CA 94143-2240 (CA 94158 for courier)
Tel +1 (415) 514-4233; Fax +1 (415) 514-4231
http://www.salilab.org/~eashwar