Greetings,
I have been using modPipe for the past 2 weeks, and I have three specific points to raise (two questions and one comment):
1) After downloading and installing modPipe, I followed the instructions to connect it to MODELLER and to some other external tools, such as BLAST. My first question is about these external programs: can all ten external programs listed at the end of the introduction page at http://salilab.org/modpipe/doc/intro.html be installed, or only the four for which installation instructions are given at http://salilab.org/modpipe/doc/install-ext.html (MODELLER, BLAST, CD-HIT and PROCHECK)?
2) My introduction to modPipe was through the demo example described at http://salilab.org/modpipe/doc/running.html. With the default settings in its modpipe.conf, this example ran relatively quickly. However, when I compare that modpipe.conf with the one given at http://salilab.org/modpipe/doc/modpipe.conf, I see that the NRSEQDB tag differs. In demo/modpipe.conf, NRSEQDB points to demo/db/testdb.hdf5 (~4.5 MB), while in http://salilab.org/modpipe/doc/modpipe.conf it points to the binary form of uniprot90 (more than 2 GB when downloaded from http://salilab.org/modeller/downloads/uniprot90.gz). When the non-redundant sequence database is the entire uniprot90 binary file, getting results for a single protein in 1100 or 1000 hits mode takes a very long time. Should the NRSEQDB tag point to a uniprot90.hdf5 file obtained by compressing the UniProt90 FASTA file, or is there a more compact, smaller representation of the non-redundant sequence database that would give results faster?
3) This point is partly related to question 2). I see that, during execution, NRSEQDB and some of the other databases named in modpipe.conf tags are copied into a local tmp/sequence_id/ directory, separately for every protein sequence. When a large number of proteins is to be processed, this may quickly eat up disk space (the uniprot90 binary file is larger than 2 GB).
Other than the above three issues, I think I have set up modPipe correctly.
Thank you and my best regards,
-----------------
Dimitrije Jevremovic
Intern at Bioinformatics & Computational Biology Department
On 7/12/12 8:25 AM, Dimitrije Jevremovic wrote:
> 1) When I downloaded and installed modPipe, I connected it according
> to instructions to MODELLER and some other external tools such as
> BLAST. First question is about external programs. Can all ten external
> programs mentioned at the end of the introduction page at
> http://salilab.org/modpipe/doc/intro.html be installed or just the
> four external programs described with instructions at the page
> http://salilab.org/modpipe/doc/install-ext.html (MODELLER, BLAST,
> CD-HIT and PROCHECK) ?
From the documentation:
"The only package required to use ModPipe is Modeller, but BLAST is also very useful for some of the fold assignment methods. Other packages are only rarely used by ModPipe, so it is probably not necessary to install them."
> Should the NRSEQDB tag point to uniprot90.hdf5 file obtained by
> compressing the Uniprot90 FASTA file, or is there some more compact
> and smaller representation for the non-redundant sequence database
> which could be used to get the results faster?
NRSEQDB should point to a binary uniprot90, or another non-redundant sequence database. Binary files are not compressed, but they are read into memory faster than PIR or FASTA files. However, the rate-limiting step is the dynamic programming itself, whose cost scales with the number of sequences in the database. The only way to make that faster is to use a smaller database, but then, of course, you may lose hits. Alternatively, you can use PSI-BLAST for your template search. Since BLAST uses approximate dynamic programming, it is much faster than Modeller's rigorous dynamic programming, but it is less sensitive.
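To illustrate why the database size dominates, here is a toy sketch in Python. It is not Modeller's implementation or scoring scheme: it is a plain Smith-Waterman local alignment with a linear gap penalty and hypothetical match/mismatch/gap values. It counts the dynamic-programming cells filled per query/target pair (query length times target length), so doubling the number of database sequences doubles the total work. The function names `smith_waterman` and `search` are made up for this example.

```python
from typing import List, Tuple


def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> Tuple[int, int]:
    """Toy local alignment with a linear gap penalty (hypothetical scores).

    Returns (best local score, number of DP cells filled).
    """
    prev = [0] * (len(b) + 1)  # previous row of the DP matrix
    best = 0
    cells = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores never drop below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
            cells += 1
        prev = curr
    return best, cells


def search(query: str, database: List[str]) -> Tuple[List[int], int]:
    """Align the query against every database sequence.

    Returns the per-target scores and the total number of DP cells filled.
    """
    scores = []
    total_cells = 0
    for target in database:
        score, cells = smith_waterman(query, target)
        scores.append(score)
        total_cells += cells
    return scores, total_cells


if __name__ == "__main__":
    query = "ACDEFG"
    small_db = ["ACDEFGHIK"] * 100
    large_db = small_db * 2  # twice as many sequences
    _, small_cells = search(query, small_db)
    _, large_cells = search(query, large_db)
    print(small_cells, large_cells)  # 5400 10800
```

Each pair here costs 6 x 9 = 54 cells, so the cost grows linearly with the number of database sequences. Heuristic searches such as BLAST avoid filling most of these cells, which is where their speed (and their loss of sensitivity) comes from.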
> 3) This question is partially in connection with question 2). I see
> that the NRSEQDB and some other databases given in modpipe.conf tags
> during execution are copied in local tmp/sequence_id/ directory for
> every protein sequence separately. In the case of request for large
> number of proteins to be processed this may quickly eat up disk space
> (uniprot90 binary file is larger than 2GB).
Yes. Typically ModPipe is run on a compute cluster, so most of the runtime files are copied to local storage to avoid overloading the network.
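A rough way to budget local scratch space is sketched below in Python. It assumes, as described in question 3), that each sequence job copies the database into its own tmp/sequence_id/ directory. Whether ModPipe removes that directory when a job finishes is not stated in this thread, so both cases are modelled. The function `scratch_space_gb` and its parameters are hypothetical.

```python
def scratch_space_gb(db_size_gb: float, n_sequences: int,
                     concurrent_jobs: int, removed_after_job: bool) -> float:
    """Hypothetical estimate of local disk used by per-sequence DB copies.

    Assumption: every sequence job copies the database into its own
    tmp/<sequence_id>/ directory.
    """
    if removed_after_job:
        # Only jobs running at the same time hold a copy simultaneously.
        return db_size_gb * min(concurrent_jobs, n_sequences)
    # Every processed sequence leaves its copy behind.
    return db_size_gb * n_sequences


if __name__ == "__main__":
    # A 2 GB database, 1000 sequences, 4 jobs per node at a time.
    print(scratch_space_gb(2.0, 1000, 4, removed_after_job=True))   # 8.0
    print(scratch_space_gb(2.0, 1000, 4, removed_after_job=False))  # 2000.0
```

On a cluster node, the first case is usually the relevant one: the per-node requirement depends on how many jobs run there at once, not on the total number of sequences.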
Ben