Re: [ModPipe-users] Use of external programs and NRSEQDB tag in modPipe



On 7/12/12 8:25 AM, Dimitrije Jevremovic wrote:
1) When I downloaded and installed ModPipe, I connected it to MODELLER
and some other external tools, such as BLAST, according to the
instructions. My first question is about external programs. Do all ten
external programs mentioned at the end of the introduction page at
http://salilab.org/modpipe/doc/intro.html need to be installed, or just
the four external programs described with instructions on the page
http://salilab.org/modpipe/doc/install-ext.html (MODELLER, BLAST,
CD-HIT and PROCHECK)?

From the documentation:

"The only package required to use ModPipe is Modeller, but BLAST is also very useful for some of the fold assignment methods. Other packages are only rarely used by ModPipe, so it is probably not necessary to install them."

2) Should the NRSEQDB tag point to the uniprot90.hdf5 file obtained by
compressing the Uniprot90 FASTA file, or is there a more compact
representation of the non-redundant sequence database that could be
used to get results faster?

NRSEQDB should point to a binary uniprot90, or another non-redundant sequence database. Binary files are not compressed, but they are read into memory faster than PIR or FASTA format. However, the rate-limiting step is the dynamic programming itself, which scales with the number of sequences in the database. The only way to make that faster is to use a smaller database, but that way you lose possible hits, of course. Alternatively, you can use PSI-BLAST for your template search. Since BLAST uses approximate dynamic programming, it is much faster than Modeller's rigorous dynamic programming (but is less sensitive).
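
To make the scaling argument concrete, here is a rough back-of-the-envelope sketch in Python. It is not ModPipe or Modeller code; the function name dp_cells and all sequence lengths and counts are made up for illustration. It assumes that rigorous dynamic programming fills one matrix cell per (query residue, database residue) pair, so the work grows in proportion to the number of database sequences.

```python
def dp_cells(query_len, db_lengths):
    """Count the dynamic programming matrix cells filled when one query
    is aligned against every sequence in a database.

    Assumption: a full (rigorous) alignment fills query_len * seq_len
    cells per database sequence.
    """
    return sum(query_len * seq_len for seq_len in db_lengths)


# Hypothetical numbers, chosen only to show the trend.
query_len = 300
full_db = [350] * 1000      # 1000 database sequences of 350 residues each
half_db = full_db[:500]     # the same database cut in half

full_cost = dp_cells(query_len, full_db)
half_cost = dp_cells(query_len, half_db)

# Halving the database halves the work, regardless of file format.
print(full_cost, half_cost, full_cost / half_cost)
```

Under these assumptions the file format only affects how fast the database is loaded; the alignment cost itself changes only when the number (or length) of database sequences changes.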

3) This question is partly connected to question 2). I see that the
NRSEQDB and some other databases given in the modpipe.conf tags are
copied during execution into a local tmp/sequence_id/ directory for
every protein sequence separately. When a large number of proteins is
to be processed, this may quickly eat up disk space (the uniprot90
binary file is larger than 2GB).

Yes. Typically ModPipe is run on a compute cluster, so most of the runtime files are copied to local storage to avoid overloading the network.

	Ben
--
                      http://salilab.org/~ben/
"It is a capital mistake to theorize before one has data."
	- Sir Arthur Conan Doyle