
Re: [modeller_usage] PDB updates for Modeller




On Oct 3, 2004, at 7:47 AM, Bruno Afonso wrote:

Eswar Narayanan wrote:

Since I know this PDB entry is probably a good model, and it won't come up in seq_search, I was wondering how I could manually update CHAINS_all.seq or create my own sequence database.
The latest release of MODELLER (version 7v7, released last month) has a new command called SEQFILTER that can be used to cluster PDB sequences. You can use MAKE_CHAINS (also in the latest release) to collect the PDB chains prior to running SEQFILTER.

There was a mistake in my previous e-mail. :) The PDB sequence is missing, which is *bad*, not good. I'm sorry to ask these questions, but I'm still puzzled as to how to deal with this:

If you already know which template(s) you are going to use, you do not need SEQUENCE_SEARCH to "identify" them. You can use any of the alignment commands (ALIGN, ALIGN2D, etc.) to create your alignment and then model your sequence based on that alignment.
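The alignment commands typically read the target sequence from an alignment file in PIR format. A minimal Python sketch that writes such an entry is shown here. The header layout (`sequence:<code>:::::::0.00: 0.00`) is assumed from the target-sequence input files in the MODELLER tutorial, and `to_pir` and the target code are hypothetical names; the template itself would still come from its PDB file, not from this function.

```python
def to_pir(code: str, sequence: str, width: int = 75) -> str:
    """Return a PIR-format entry for a target sequence with no structure attached.

    Assumption: header layout follows the MODELLER tutorial's target files.
    """
    header = f"sequence:{code}:::::::0.00: 0.00"  # 10 colon-separated fields
    body = sequence + "*"                         # '*' terminates the sequence
    lines = [f">P1;{code}", header]
    # Wrap the sequence so no line is longer than `width` characters.
    lines += [body[i:i + width] for i in range(0, len(body), width)]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical target code and a short made-up sequence.
    print(to_pir("target", "MSEAAHVLITGAAGQIGYILSHWIASGELYG"), end="")
```

The resulting file, together with the known template's PDB file, is what an alignment step such as ALIGN2D would work from.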


1) What are the criteria used to make chains_all.seq? I ask because clearly not all of the PDB is there :) and some of the sequences there come from structures with resolutions as poor as 5.0 angstroms...

One usually wants to use a non-redundant version of the PDB to search for templates. One way is to first select the sequences of all X-ray structures that were solved at a resolution better than 3.5 Å, are longer than 30 aa, have no more than 10 non-standard residues, and have at least 30 standard residues. These can all be specified as options to MAKE_CHAINS. You can then cluster these sequences with SEQFILTER to remove redundancy at a given sequence identity threshold (usually 30% or 95%).

Ben has put these files on the web at http://salilab.org/modeller/supplemental.html. They contain the representative sequences derived from PDB files at 30% and 95% sequence identity. All X-ray and NMR PDB chains, with no limit on resolution, that are at least 30 aa long, have at least 30 standard residues and no more than 10 non-standard residues were used to generate these files. This is simply the output of SEQFILTER on last week's release (09-28-04) of the PDB.
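The select-then-cluster procedure described in the two paragraphs above can be sketched in Python to show the logic. This is not MODELLER's MAKE_CHAINS or SEQFILTER: the `Chain` record, the function names, writing non-standard residues as 'X', the identity measure (identical positions in a Needleman-Wunsch global alignment divided by the length of the shorter chain) and the greedy longest-first choice of representatives are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# The 20 standard amino acids; anything else (assumed written as 'X') is non-standard.
STANDARD = set("ACDEFGHIKLMNPQRSTVWY")


@dataclass
class Chain:
    """Hypothetical record for one PDB chain."""
    code: str                    # e.g. "1abcA"
    method: str                  # "X-RAY" or "NMR"
    resolution: Optional[float]  # in angstroms; None when not applicable
    sequence: str                # one-letter codes


def passes(chain: Chain, max_resolution: Optional[float] = 3.5,
           xray_only: bool = True, min_length: int = 30,
           min_standard: int = 30, max_nonstandard: int = 10) -> bool:
    """Selection step; max_resolution=None disables the resolution cut."""
    if xray_only and chain.method != "X-RAY":
        return False
    if max_resolution is not None:
        # "Better than" is taken as strictly below the cutoff.
        if chain.resolution is None or chain.resolution >= max_resolution:
            return False
    n_std = sum(1 for r in chain.sequence if r in STANDARD)
    n_non = len(chain.sequence) - n_std
    return (len(chain.sequence) >= min_length  # inclusive lower bound
            and n_std >= min_standard and n_non <= max_nonstandard)


def identity(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> float:
    """Fraction of identical aligned positions in a simple global alignment."""
    if not a or not b:
        return 0.0
    n, m = len(a), len(b)
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = i * gap
    for j in range(1, m + 1):
        S[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = S[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            S[i][j] = max(diag, S[i - 1][j] + gap, S[i][j - 1] + gap)
    # Trace back one optimal path, counting identical residue pairs.
    i, j, same = n, m, 0
    while i > 0 and j > 0:
        is_same = a[i - 1] == b[j - 1]
        if S[i][j] == S[i - 1][j - 1] + (match if is_same else mismatch):
            same += is_same
            i, j = i - 1, j - 1
        elif S[i][j] == S[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return same / min(n, m)


def cluster(chains: List[Chain], threshold: float) -> Tuple[List[Chain], Dict[str, List[str]]]:
    """Greedy clustering: each chain joins the first representative it matches."""
    reps: List[Chain] = []
    members: Dict[str, List[str]] = {}
    for c in sorted(chains, key=lambda c: (-len(c.sequence), c.code)):
        for r in reps:
            if identity(c.sequence, r.sequence) >= threshold:
                members[r.code].append(c.code)
                break
        else:
            reps.append(c)
            members[c.code] = [c.code]
    return reps, members
```

With its defaults, `passes` mirrors the X-ray criteria in the first paragraph; `passes(c, max_resolution=None, xray_only=False)` roughly matches the settings used for the supplemental files, and `cluster(kept, 0.30)` or `cluster(kept, 0.95)` gives the two redundancy levels. The quadratic alignment is only meant to make the logic clear; a PDB-wide run would need a much faster identity estimate.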


2) Can't I make something like chains_all.seq with MY criteria without writing my own script? I.e., is there a "right way"(TM) to do it?

See the comments above.


3) I can use MAKE_CHAINS and then load the .chn as a database, but that requires me to first find the good PDBs that aren't in Modeller's DB, which kind of misses the whole point. I was using Modeller to find the good ones in the first place.

Thanks for the tip on SEQFILTER, but my problem was that the sequence was missing from Modeller's default database in the first place ;-)

The reviews listed on the modeller web-site (http://salilab.org/modeller/documentation.html) will help you understand the process of identifying a useful template for modelling.

---
Eswar Narayanan, Ph.D
Mission Bay Genentech Hall
600 16th Street, Suite N474Q
University of California - San Francisco
San Francisco, CA 94143-2240 (CA 94158 for courier)
Tel +1 (415) 514-4233; Fax +1 (415) 514-4231
http://www.salilab.org/~eashwar