[ModPipe-users] Use of external programs and NRSEQDB tag in modPipe



Greetings,

I have been using modPipe for the past two weeks and I have two
specific questions and one comment:

1) When I downloaded and installed modPipe, I connected it, following
the instructions, to MODELLER and some other external tools such as
BLAST. My first question is about external programs. Should all ten
external programs mentioned at the end of the introduction page at
http://salilab.org/modpipe/doc/intro.html be installed, or just the
four external programs described with instructions on the page
http://salilab.org/modpipe/doc/install-ext.html (MODELLER, BLAST,
CD-HIT and PROCHECK)?

2) My introduction to modPipe was through the demo example described
at http://salilab.org/modpipe/doc/running.html. This example executed
relatively quickly with the default settings in demo/modpipe.conf.
However, when I compare that file with the modpipe.conf file given at
http://salilab.org/modpipe/doc/modpipe.conf, I see that the value of
the NRSEQDB tag is different. The NRSEQDB tag in demo/modpipe.conf
points to demo/db/testdb.hdf5 (~4.5MB), while the NRSEQDB tag in
http://salilab.org/modpipe/doc/modpipe.conf points to a binary form
of the uniprot90 file (which is > 2GB if downloaded from
http://salilab.org/modeller/downloads/uniprot90.gz). It takes a very
long time to get the results for a single protein in 1100 or 1000
hits mode if the non-redundant sequence database is set to the entire
uniprot90 binary database file.
Should the NRSEQDB tag point to the uniprot90.hdf5 file obtained by
compressing the Uniprot90 FASTA file, or is there a more compact
representation of the non-redundant sequence database that could be
used to get the results faster?

3) This comment is partially connected to question 2). I see that,
during execution, the NRSEQDB and some other databases given in the
modpipe.conf tags are copied into a local tmp/sequence_id/ directory
for every protein sequence separately. When a large number of
proteins is to be processed, this may quickly eat up disk space (the
uniprot90 binary file is larger than 2GB).

Other than the above three issues, I think that I have set up modPipe
correctly for execution.

Thank you and my best regards,
-----------------
Dimitrije Jevremovic
intern at Bioinformatics & Computational Biology Department