
Re: [modeller_usage] PDB updates for Modeller



On Oct 3, 2004, at 7:47 AM, Bruno Afonso wrote:

Eswar Narayanan wrote:
Since I know this PDB is probably a good model and it won't come up 
in seq_search I was wondering how I could manually update 
CHAINS_all.seq or create my own sequence database.
The latest release of MODELLER (version 7v7, released last month) has a new command called SEQFILTER that can be used to cluster PDB sequences. You can use MAKE_CHAINS (also in the latest release) to collect the PDB chains prior to running SEQFILTER.
There was a mistake in my previous e-mail. :) The PDB sequence is 
missing, which is *bad*, not good. I'm sorry to ask these questions, 
but I'm still puzzled as to how to deal with this:
If you know exactly what your template(s) is(are) going to be, you do 
not have to use SEQUENCE_SEARCH to "identify" your template. You can 
use any of the alignment commands (ALIGN, ALIGN2D etc) to create your 
alignment and model your sequence based on that alignment.
1) What are the criteria for making chains_all.seq? I ask this because 
clearly not all of PDB is there :) and there are sequences there with 
resolutions as high as 5.0 angstroms...
One usually wants to use a non-redundant version of PDB to search for 
templates. One way is to first select the sequences of all X-ray 
structures that are solved at a resolution better than 3.5A, are longer 
than 30aa, have no more than 10 non-standard residues, and have at 
least 30 standard residues. These can all be specified as options to 
MAKE_CHAINS. You can then cluster these sequences using SEQFILTER to 
remove redundancies with a sequence identity threshold (usually set at 
30% or 95%).
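
As a rough illustration of the selection criteria above, here is a minimal sketch in plain Python. It is not MODELLER's scripting language and does not use MAKE_CHAINS; the Chain record, its fields, and the passes_criteria function are hypothetical names invented for this example, and the thresholds are simply the ones quoted above.

```python
from dataclasses import dataclass
from typing import Optional

# The 20 standard amino acids in one-letter code; anything else is
# counted as a non-standard residue in this sketch.
STANDARD = set("ACDEFGHIKLMNPQRSTVWY")


@dataclass
class Chain:
    """Hypothetical record for one PDB chain (not a MODELLER type)."""
    code: str                    # e.g. PDB code plus chain ID
    method: str                  # "X-ray" or "NMR"
    resolution: Optional[float]  # in angstroms; None if not defined
    sequence: str                # one-letter residue codes


def passes_criteria(chain, max_resolution=3.5, min_length=30,
                    max_nonstandard=10, min_standard=30):
    """Return True if the chain meets the selection criteria."""
    n_std = sum(1 for res in chain.sequence if res in STANDARD)
    n_nonstd = len(chain.sequence) - n_std
    if chain.method != "X-ray":
        return False
    # "Better than 3.5A" means a numerically smaller resolution value.
    if chain.resolution is None or chain.resolution >= max_resolution:
        return False
    if len(chain.sequence) <= min_length:   # must be longer than 30 aa
        return False
    if n_nonstd > max_nonstandard:          # no more than 10 non-standard
        return False
    return n_std >= min_standard            # at least 30 standard


chains = [
    Chain("goodA", "X-ray", 2.0, "ACDEFGHIKL" * 4),
    Chain("lowresA", "X-ray", 5.0, "ACDEFGHIKL" * 4),
    Chain("shortA", "X-ray", 1.8, "ACDEFGHIKL"),
    Chain("nmrA", "NMR", None, "ACDEFGHIKL" * 4),
]
selected = [c.code for c in chains if passes_criteria(c)]
print(selected)
```

Only the first of the four made-up chains survives: the others fail on resolution, length, and experimental method respectively.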
Ben has put these files on the web at 
http://salilab.org/modeller/supplemental.html. These are the 
representative sequences derived from PDB files at 30% and 95% sequence 
identity. All X-ray and NMR PDB chains, with no limits on resolution, 
that are at least 30aa long, have more than 30 standard residues and 
not more than 10 non-standard residues were used to get these files. 
This is just the output of SEQFILTER on last week's release (09-28-04) 
of PDB.
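
To give a feel for what clustering at a sequence identity threshold means, here is a small Python sketch of a greedy representative selection. This is an assumption-laden illustration, not SEQFILTER's actual algorithm: the crude_identity and pick_representatives functions are hypothetical, and identity is approximated with difflib from the Python standard library rather than with a proper sequence alignment.

```python
from difflib import SequenceMatcher


def crude_identity(a, b):
    """Approximate fractional identity between two sequences.

    Counts residues in matching blocks found by difflib and divides by
    the shorter length. A real tool would use a proper alignment.
    """
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / min(len(a), len(b))


def pick_representatives(sequences, threshold):
    """Greedily keep sequences that are not redundant with any kept one.

    sequences: list of (code, sequence) pairs.
    threshold: fractional identity (e.g. 0.30 or 0.95) at or above which
               a sequence is treated as redundant with a representative.
    """
    kept = []
    # Consider longer sequences first so they become the representatives.
    for code, seq in sorted(sequences, key=lambda cs: -len(cs[1])):
        if all(crude_identity(seq, rep) < threshold for _, rep in kept):
            kept.append((code, seq))
    return [code for code, _ in kept]


seq_a = "ACDEFGHIKL" * 4
seq_b = "ACDEFGHIKL" * 3 + "ACDEFGHIKM"   # differs from seq_a at one position
seq_c = "NPQRSTVWYN" * 4                  # shares no residue types with seq_a
data = [("a", seq_a), ("b", seq_b), ("c", seq_c)]

reps_95 = pick_representatives(data, 0.95)
reps_98 = pick_representatives(data, 0.98)
print(reps_95, reps_98)
```

In this toy input, "b" is 39/40 identical to "a", so it is dropped at the 95% threshold but kept when the threshold is raised to 98%.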
2) Can't I make something like chains_all.seq with MY criteria without 
writing my own script? ie, is there a "right way"(TM) to do it?
See the comments above.

3) I can use MAKE_CHAINS and then load the .chn as a database, but 
that requires me to first find the good PDBs that aren't in 
modeller's DB, which kind of misses the whole point. I was using 
modeller to try to find the good ones in the first place.
Thanks for the tip on seqfilter, but my problem was that the sequence 
was missing from modeller's default database in the first place ;-)
The reviews listed on the modeller web-site 
(http://salilab.org/modeller/documentation.html) will help you 
understand the process of identifying a useful template for modelling.
---
Eswar Narayanan, Ph.D
Mission Bay Genentech Hall
600 16th Street, Suite N474Q
University of California - San Francisco
San Francisco, CA 94143-2240 (CA 94158 for courier)
Tel +1 (415) 514-4233; Fax +1 (415) 514-4231
http://www.salilab.org/~eashwar