On 7/12/12 8:25 AM, Dimitrije Jevremovic wrote:
> 1) When I downloaded and installed modPipe, I connected it according
> to instructions to MODELLER and some other external tools such as
> BLAST. First question is about external programs. Can all ten external
> programs mentioned at the end of the introduction page at
> http://salilab.org/modpipe/doc/intro.html be installed or just the
> four external programs described with instructions at the page
> http://salilab.org/modpipe/doc/install-ext.html (MODELLER, BLAST,
> CD-HIT and PROCHECK)?
From the documentation:
"The only package required to use ModPipe is Modeller, but BLAST is also very useful for some of the fold assignment methods. Other packages are only rarely used by ModPipe, so it is probably not necessary to install them."
> Should the NRSEQDB tag point to uniprot90.hdf5 file obtained by
> compressing the Uniprot90 FASTA file, or is there some more compact
> and smaller representation for the non-redundant sequence database
> which could be used to get the results faster?
NRSEQDB should point to a binary version of uniprot90, or of another non-redundant sequence database. Binary files are not compressed, but they are read into memory faster than PIR or FASTA files. However, the rate-limiting step is the dynamic programming itself, whose cost scales with the number of sequences in the database. The only way to make that faster is to use a smaller database, but that way you lose possible hits, of course. Alternatively, you can use PSI-BLAST for your template search. Since BLAST uses approximate dynamic programming, it is much faster than Modeller's rigorous dynamic programming (but is less sensitive).
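To make the scaling argument concrete, here is a toy cost model in Python. It is only a sketch, not Modeller's actual implementation: it assumes that a full pairwise dynamic programming alignment fills one matrix cell per (query residue, database residue) pair, and the function name and sequence lengths are made up for illustration.

```python
def dp_cells(query_length, database_lengths):
    """Count the alignment-matrix cells filled when one query is aligned
    against every database sequence with full dynamic programming."""
    return sum(query_length * n for n in database_lengths)


# Hypothetical numbers: a 300-residue query and databases whose
# sequences are all 350 residues long.
full_db = [350] * 1000
half_db = [350] * 500

full_cost = dp_cells(300, full_db)
half_cost = dp_cells(300, half_db)

# Halving the number of database sequences halves the work.
print(full_cost, half_cost, full_cost / half_cost)
```

Under this model the format of the database file (binary, PIR or FASTA) only changes how long it takes to load, while the search time is set by how many sequences there are to align against.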
> 3) This question is partially in connection with question 2). I see
> that the NRSEQDB and some other databases given in modpipe.conf tags
> during execution are copied in local tmp/sequence_id/ directory for
> every protein sequence separately. In the case of request for large
> number of proteins to be processed this may quickly eat up disk space
> (uniprot90 binary file is larger than 2GB).
Yes. Typically ModPipe is run on a compute cluster, so most of the runtime files are copied to local storage to avoid overloading the network.
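For planning local scratch space, a rough back-of-the-envelope estimate can be sketched in Python. This assumes one database copy per sequence directory that exists at the same time on a node; how many such directories coexist (and when they are cleaned up) depends on your setup and is not something this sketch knows. The function name and job counts are hypothetical, and the 2 GiB figure simply takes the "larger than 2GB" from the question as a lower bound.

```python
GIB = 1024 ** 3


def peak_scratch_bytes(db_bytes, concurrent_sequences):
    """Estimate local disk use if each sequence directory present at the
    same time holds its own copy of the database (an assumption)."""
    return db_bytes * concurrent_sequences


db_size = 2 * GIB  # lower bound taken from the question above

for jobs in (1, 8, 32):  # hypothetical numbers of simultaneous sequences
    print(jobs, peak_scratch_bytes(db_size, jobs) / GIB, "GiB")
```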
Ben