On Oct 3, 2004, at 7:47 AM, Bruno Afonso wrote:
> Eswar Narayanan wrote:
>>>
>>> Since I know this PDB is probably a good model and it won't come up
>>> in seq_search I was wondering how I could manually update
>>> CHAINS_all.seq or create my own sequence database.
>> The latest release of MODELLER (version 7v7, released last month) has
>> a new command called SEQFILTER that can be used to cluster PDB
>> sequences. You can use MAKE_CHAINS (also in the latest release) to
>> collect the PDB chains prior to running SEQFILTER.
>
> There was a mistake in my previous e-mail. :) The PDB sequence is
> missing, which is *bad*, not good. I'm sorry to ask this questions,
> but I'm still puzzled as to how to deal with this:
If you already know exactly which template(s) you are going to use, you do not have to run SEQUENCE_SEARCH to "identify" them. You can use any of the alignment commands (ALIGN, ALIGN2D, etc.) to create your alignment and then model your sequence based on that alignment.
>
> 1) What's the criteria for make chains_all.seq? I ask this because
> clearly not all of PDB is there :) and there are sequences there with
> resolutions as high as 5.0 angstroms...
One usually wants to use a non-redundant version of PDB to search for templates. One way is to first select the sequences of all X-ray structures that are solved at a resolution better than 3.5 angstroms, are longer than 30 aa, have no more than 10 non-standard residues, and have at least 30 standard residues. These criteria can all be specified as options to MAKE_CHAINS. You can then cluster the selected sequences using SEQFILTER to remove redundancy at a chosen sequence identity threshold (usually 30% or 95%).
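To make the selection criteria concrete, here is a rough, stand-alone Python sketch of the two steps (filter, then remove redundancy). It does not call MODELLER and is not how MAKE_CHAINS or SEQFILTER are implemented. The `Chain` record, the function names, and especially `crude_identity` (an ungapped, position-by-position comparison standing in for a real alignment-based sequence identity) are all assumptions made for illustration only.

```python
from dataclasses import dataclass

# The 20 standard amino acids, one-letter codes.
STANDARD = set("ACDEFGHIKLMNPQRSTVWY")


@dataclass
class Chain:
    """Hypothetical record for one PDB chain (not a MODELLER type)."""
    code: str          # illustrative identifier
    method: str        # "X-RAY" or "NMR"
    resolution: float  # in angstroms; lower is better
    sequence: str      # one-letter codes; anything outside STANDARD is non-standard


def passes_filter(chain, max_resolution=3.5, min_length=30,
                  max_nonstd=10, min_std=30):
    """Apply the selection criteria described in the text above."""
    n_std = sum(1 for res in chain.sequence if res in STANDARD)
    n_nonstd = len(chain.sequence) - n_std
    return (chain.method == "X-RAY"
            and chain.resolution < max_resolution   # better than 3.5 angstroms
            and len(chain.sequence) > min_length     # longer than 30 aa
            and n_nonstd <= max_nonstd               # no more than 10 non-standard
            and n_std >= min_std)                    # at least 30 standard


def crude_identity(a, b):
    """ASSUMPTION: ungapped identity over the shorter sequence.

    A real tool would align the sequences first; this is only a placeholder.
    """
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return sum(1 for x, y in zip(a, b) if x == y) / n


def pick_representatives(chains, threshold=0.95):
    """Greedy redundancy removal: keep a chain only if it is below the
    identity threshold to every representative kept so far.
    Chains with better (lower) resolution are considered first."""
    reps = []
    for chain in sorted(chains, key=lambda c: c.resolution):
        if all(crude_identity(chain.sequence, r.sequence) < threshold
               for r in reps):
            reps.append(chain)
    return reps
```

With real data you would populate the `Chain` records from your own PDB mirror, call `passes_filter` on each, and pass the survivors to `pick_representatives` with a threshold of 0.30 or 0.95. For actual modelling work, let MAKE_CHAINS and SEQFILTER do this.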
Ben has put these files on the web at http://salilab.org/modeller/supplemental.html. These are the representative sequences derived from PDB files at 30% and 95% sequence identity. All X-ray and NMR PDB chains, with no limit on resolution, that are at least 30 aa long, have more than 30 standard residues, and have no more than 10 non-standard residues were used to generate these files. This is just the output of SEQFILTER on last week's release (09-28-04) of PDB.
>
> 2) Can't I make a chains_all.seq alike with MY criteria without making
> my own script? ie, is there a "right way"(TM) to do it?
See the comments above.
>
> 3) I can use MAKE_CHAINS and then load the .chn as a database, but
> that involves having me first finding the good PDBs that aren't on the
> modeller's DB, which is kind of misses the whole point. I was using
> modeller to try to find the good ones in the first place.
>
> Thanks for the tip on seqfilter, but my problem was the sequence
> missing in the modeller's default database in the first place ;-)
The reviews listed on the modeller web-site (http://salilab.org/modeller/documentation.html) will help you understand the process of identifying a useful template for modelling.
---
Eswar Narayanan, Ph.D
Mission Bay Genentech Hall
600 16th Street, Suite N474Q
University of California - San Francisco
San Francisco, CA 94143-2240 (CA 94158 for courier)
Tel +1 (415) 514-4233; Fax +1 (415) 514-4231
http://www.salilab.org/~eashwar