Procter is right. BUILD_PROFILE can be seen as a command that supersedes SEQUENCE_SEARCH for identifying potential templates and obtaining a reliable alignment for modeling.
Eswar.
On Oct 4, 2004, at 6:37 AM, J B Procter wrote:
>
> It is possible to build new sequence databases for MODELLER and, as
> Eswar said, there are two relevant commands. Writing a script to do
> this is unavoidable, though, unless the caretaker has one ready for
> everyone to download!
>
> As a very quick fix, you could get the current PDB sequence list from
> here:
> ftp://ftp.rcsb.org/pub/pdb/derived_data/pdb_seqres.txt
>
> Then follow the script in
> modeller7v7/examples/commands/build_profile.top. It shows how you can
> read in a simple FASTA sequence flatfile database, like the one from
> the PDB website, and then use it to align against your sequence in
> order to build a sequence profile (and thereby retrieve all homologous
> sequences from the PDB).
>
> To do the job properly, you need to apply the MAKE_CHAINS command
> (modeller7v7/examples/commands/make_chains.top). It generates the extra
> information that is written into the PIR information fields and that
> MODELLER uses to fetch the correct PDB file for each sequence in the
> database.
>
> If you have a mirror of the PDB, then this script (for Unix) might
> work:
>
> #!/bin/bash
> # Makes chain records and appends them to pdb_seq.chn in the current
> # working directory.
> # You need to change PDBDIR to point to your local copy of the PDB.
>
> PDBDIR="/projects/biodata/pdb/data/structures/all/pdb"
>
> for p in `ls -1 $PDBDIR`
> do
>   y=`basename $p .ent.Z`;
>   if [[ $p != $y ]]; then
>     # Double quotes let $PDBDIR/$p expand while keeping the single
>     # quotes that the TOP script needs around string values.
>     echo "READ_MODEL FILE = '$PDBDIR/$p'" > make_chains_.top
>     echo "MAKE_CHAINS MINIMAL_CHAIN_LENGTH = 30, \
>       MINIMAL_RESOLUTION = 2.0, MINIMAL_STDRES = 30, \
>       CHOP_NONSTD_TERMINII = on, \
>       STRUCTURE_TYPES = 'structureN structureX'" >> make_chains_.top
>     mod7v7 make_chains_.top
>     cat ${y/pdb/./}.*.chn >> pdb_seq.chn
>     rm ${y/pdb/./}.*.chn
>   fi
> done;
>
> This will take some time to run. Afterwards, pdb_seq.chn will contain
> a subset of all the PDB chains, in a form similar to the
> CHAINS_all.seq file.
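The TOP quoting in that loop is easy to get wrong. In an unquoted echo, '$PDBDIR/$p' is not expanded and the single quotes themselves are stripped by the shell. The sketch below writes the same two-line TOP file with a here-document instead. It assumes bash and reuses the keyword values from the script unchanged. The function name write_make_chains_top is hypothetical, not part of MODELLER.

```shell
# Hypothetical helper: write the two-line MAKE_CHAINS TOP script for a
# single PDB file (keyword values copied from the loop above).
#   $1 = path to a (possibly compressed) PDB file
#   $2 = name of the TOP file to create
write_make_chains_top() {
  local pdbfile="$1" topfile="$2"
  # Unquoted EOF delimiter: $pdbfile is expanded, while the single quotes
  # that the TOP script needs around string values are written literally.
  cat > "$topfile" <<EOF
READ_MODEL FILE = '$pdbfile'
MAKE_CHAINS MINIMAL_CHAIN_LENGTH = 30, MINIMAL_RESOLUTION = 2.0, MINIMAL_STDRES = 30, CHOP_NONSTD_TERMINII = on, STRUCTURE_TYPES = 'structureN structureX'
EOF
}
```

Inside the loop, `write_make_chains_top "$PDBDIR/$p" make_chains_.top` would stand in for the two echo lines.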
>
> You should then be able to read this new database in, apply SEQFILTER
> (see modeller7v7/examples/commands/seqfilter.top), and write out the
> list of chain representatives (at 95% sequence identity, for
> instance). For best use, you should rewrite the database (via
> READ_SEQUENCE_DB and WRITE_SEQUENCE_DB) in binary format and limit it
> to just the representative sequences generated by SEQFILTER (by
> specifying the CHAINS_LIST option on READ_SEQUENCE_DB).
>
> Enjoy!
> j.
>
> _______________________________________________________________________
> Dr JB Procter: Biomolecular Modelling at ZBH - Center for Bioinformatics
> Hamburg http://www.zbh.uni-hamburg.de/staff.php
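You can also check the reduced set outside MODELLER with the awk filter sketched below. This is a swapped-in plain-text technique, not the CHAINS_LIST route described above, and it rests on two assumptions that nothing here confirms. First, it assumes pdb_seq.chn uses ordinary PIR records, each starting with a `>P1;code` line. Second, it assumes the representative list has one chain code per line. The function name select_pir_entries is hypothetical.

```shell
# Hypothetical helper: print only the PIR entries whose codes appear in a
# list of representatives.
#   $1 = list of representative codes, one per line (assumed format)
#   $2 = PIR file such as pdb_seq.chn (assumed ">P1;code" headers)
select_pir_entries() {
  awk '
    # First file: remember every listed code.
    FILENAME == ARGV[1] { keep[$1] = 1; next }
    # A ">P1;" line starts a new entry; decide whether to print it.
    /^>P1;/ {
      code = substr($0, 5)          # text after the 4-character ">P1;"
      sub(/[ \t\r]+$/, "", code)    # drop trailing whitespace or CR
      sel = (code in keep)
    }
    # Print every line belonging to a selected entry.
    sel
  ' "$1" "$2"
}
```

For example, `select_pir_entries reps.txt pdb_seq.chn > pdb_seq_reps.chn` would write the selected entries to a new file.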