Trying to use MultiFit, but I have problems installing/running IMP
Dear IMP users,
I followed the directions on
http://salilab.org/imp/1.0/doc/html/installation.html
to install IMP (I even installed the optional dependencies/prerequisites).
I got MultiFit from
http://modbase.compbio.ucsf.edu/multifit/download.cgi
I am trying to go through the tutorial for MultiFit on
http://www.integrativemodeling.org/1.0/tutorial/multifit.html
but when I enter the first command
/opt/multifit/utils/run_anchor_points_detection.py assembly.input 700
It returns
Traceback (most recent call last):
  File "/opt/multifit/utils/run_anchor_points_detection.py", line 7, in <module>
    import IMP
ImportError: No module named IMP
I looked at
http://salilab.org/imp/archives/imp-users/msg00075.html
but trying the suggestions there did not solve this problem.
I would appreciate any suggestions. Though I can (barely) get by, I am not by any means a Linux guru, nor am I a coder, so please keep this in mind in your replies. I am using Ubuntu 10.04 LTS (Lucid Lynx) with all updates to date installed.
Thank you.
Gökhan
On 7/8/11 7:24 AM, Tolun, Gökhan wrote:
...
> but when I enter the first command
>
> /opt/multifit/utils/run_anchor_points_detection.py assembly.input 700
>
> It returns
> ImportError: No module named IMP
Once IMP is installed, you should be able to run a Python interpreter ("python" from a command line) and type "import IMP". If this doesn't work, then the Python modules must have been installed somewhere that isn't in the Python path.
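For example, a quick check in an interactive session (a minimal sketch; on success the second line tells you where the package actually lives):

    >>> import IMP          # raises "ImportError: No module named IMP"
    >>>                     # if IMP is not on the Python path
    >>> print(IMP.__file__) # on success, shows where IMP was loaded from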
> I looked at
> http://salilab.org/imp/archives/imp-users/msg00075.html
>
> but trying the suggestions there did not solve this problem.
What did you try? You have two options: either change the Python path to match the location where the IMP Python extensions were installed (set the PYTHONPATH environment variable), or change the IMP installation location (set the pythondir scons variable) so that it puts the Python extensions in a standard location. To see which locations are standard, run "import sys; print sys.path" from a Python interpreter.
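For instance, from a Python interpreter (the site-packages path below is only a guess; substitute wherever your IMP build actually put its Python modules):

    import sys
    print(sys.path)   # the directories Python searches for modules

    # Quick test: prepend the IMP install location and retry the import.
    # (Permanently, you would instead export PYTHONPATH in your shell.)
    sys.path.insert(0, "/opt/IMP/lib/python2.6/site-packages")  # example location
    import IMP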
> I am using Ubuntu 10.04 LTS - the Lucid Lynx with
> all updates to date installed.
Note that MultiFit is currently available in binary form only, and only supported in combination with the binary downloads of IMP. So it may not work on non-RedHat/non-Mac systems.
Ben
> Note that MultiFit is currently available in binary form only, and only
> supported in combination with the binary downloads of IMP. So it may not
> work on non-RedHat/non-Mac systems.
I just installed and ran MultiFit on Mac OS X. I then went through the MultiFit tutorial provided at http://www.integrativemodeling.org/1.0/tutorial/multifit.html, and here I am with (the usual) heap of comments and questions. I hope they can help improve the tool, and help me understand things in the process.
A.] Concerning input and output files
a.) As a general comment, I think a summary table listing all input and output files, each with a short description, would aid comprehension.
b.) The assembly.jt file is not documented. I understand it is an input file that contains the junction tree as described in the paper "Lasker K, Topf M, Sali A, Wolfson HJ. Inferential optimization for simultaneous fitting of multiple components into a CryoEM map of their assembly. J Mol Biol. 2009;388(1):180-194." I guess this file has to be crafted by hand by the user, based on the region definitions in the 1tyq_20.fine.gmm.pdb file produced at step 2. The indices in the node descriptions refer to the ordering in the above-mentioned file, and the indices in the edge list refer to the index of a line in the node list.
c.) Concerning the final output (from run_multifit):

ARP3,0|ARP2,14|ARC1,3|ARC2,24|ARC3,19|ARC4,11|ARC5,13|(17.5593729019)(rmsd:29.2637996674)(conf.0.pdb)

If I am correct, subunits are given in a pipe-separated list, sorted according to the region index to which they have been assigned in that particular configuration (here, ARP3 is in region 0, ARP2 in region 1, etc.). The integer appearing after each subunit name refers to the index of the solution for this subunit as it was fitted alone into the (finer simplified representation of the) EM map, in step 3.
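To illustrate that reading, here is a throwaway Python sketch that splits one such line (the field meanings are my interpretation above, not documented behaviour):

    line = ("ARP3,0|ARP2,14|ARC1,3|ARC2,24|ARC3,19|ARC4,11|ARC5,13|"
            "(17.5593729019)(rmsd:29.2637996674)(conf.0.pdb)")
    fields = line.split("|")
    # The last field holds the score, RMSD and output file name; the rest
    # are "subunit,solution-index" pairs, ordered by region index.
    for region, entry in enumerate(fields[:-1]):
        name, solution = entry.split(",")
        print("region %d: %s (fitting solution %s)" % (region, name, solution))
    print(fields[-1])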
B.] Small "issues" in the process
a) I don't know if it has something to do with my particular installation, but the first scoring script complained that it could not write its output files, and I had to manually mkdir the "scoring" subdirectory.
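(For anyone hitting the same thing, the workaround before launching the scoring script, sketched in Python:)

    import os
    # Create the "scoring" subdirectory the script expects, if it is missing.
    if not os.path.isdir("scoring"):
        os.mkdir("scoring")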
b) All conf-XXX.pdb files are dumped in the root directory, which is quite messy; maybe it would be better if they were written to the results subdirectory.
C.] Questions
a) Concerning the definition of regions in the map: if I understand correctly, regions are defined at step 2 and output as fake CA atoms in the coarse model in the 1tyq_20.fine.gmm.pdb file. It is the user's responsibility to infer the region connectivity from this coarse representation and the input map, and then to create the junction tree. Am I correct?

b) Interactomics data: I have the feeling that no interaction data are considered in the process (I mean information such as "ARC2 is known to interact with ARP2"). Is that true?
c) I don't understand what run_multifit() exactly does. More precisely:
1. It appears a preliminary filter is applied to configurations. By "configuration" I mean the assignment of each subunit to one and only one region (I think it corresponds to a mapping, in the code). If this is the case, I don't think the filter is based on interactomics data, so what is it based upon?
2. I have the feeling that one and only one solution (the best score) is output per retained configuration. Am I right?
d) Am I correct in saying that the cross-correlation computations are only used in scoring (step 3), and not in the pre-fitting of subunits (step 2)? Hence, if I am correct, the FFT-based cross-correlation approach has been replaced by the neural/GMM approach for that particular step?
Thanks for any answers you can provide, as well as for the incredible job you are doing.
--Ben
Ben - First, the version you have installed is quite old (from Jan 2010). The current version is not ready for installation yet. For symmetric complexes the MultiFit web server is functional, and for non-symmetric complexes it should be functional soon. We will integrate your valuable comments into the new version. See below for answers to your questions. Best regards, Keren.

On Jul 12, 2011, at 6:23 AM, Benjamin SCHWARZ wrote:
>> Note that MultiFit is currently available in binary form only, and only
>> supported in combination with the binary downloads of IMP. So it may not
>> work on non-RedHat/non-Mac systems.
>
> I just installed and ran MultiFit on Mac OS X, and went through the MultiFit tutorial provided at http://www.integrativemodeling.org/1.0/tutorial/multifit.html.
>
> [...]
>
> a) Concerning the definition of regions in the map: if I understand correctly, regions are defined at step 2 and output as fake CA atoms in the coarse model in the 1tyq_20.fine.gmm.pdb file. It is the user's responsibility to infer the region connectivity from this coarse representation and the input map, and then to create the junction tree. Am I correct?

The user does not need to infer connected regions. This is done by the program.

> b) Interactomics data: I have the feeling that no interaction data are considered in the process (I mean information such as "ARC2 is known to interact with ARP2"). Is that true?

At least in the new version, all interaction data is definitely being considered. The old version predates the incorporation of interaction data into MultiFit.

> c) I don't understand what run_multifit() exactly does. More precisely:
> 1. It appears a preliminary filter is applied to configurations. By "configuration" I mean the assignment of each subunit to one and only one region (I think it corresponds to a mapping, in the code). If this is the case, I don't think the filter is based on interactomics data, so what is it based upon?
> 2. I have the feeling that one and only one solution (the best score) is output per retained configuration. Am I right?

run_multifit tests all possible configurations according to the sampled ones.

> d) Am I correct in saying that the cross-correlation computations are only used in scoring (step 3), and not in the pre-fitting of subunits (step 2)? Hence, if I am correct, the FFT-based cross-correlation approach has been replaced by the neural/GMM approach for that particular step?

FFT-based fitting was a new addition following the point-based matching used in the version we use. We allow for both options, depending on the complexity and resolution of the complex you have.
Hi Keren,
and many thanks for such a fast answer.
> The user does not need to infer connected regions. This is done by the program.

Oops... It seems I missed something here; I'll dig into it. And it's great if the program does it on its own.
>> b) Interactomics data: I have the feeling that no interaction data are considered in the process (I mean information such as "ARC2 is known to interact with ARP2"). Is that true?
> At least in the new version, all interaction data is definitely being considered. The old version predates the incorporation of interaction data into MultiFit.

OK, so it wasn't something I missed here.
>> c) I don't understand what run_multifit() exactly does. More precisely:
>> 1. It appears a preliminary filter is applied to configurations. By "configuration" I mean the assignment of each subunit to one and only one region (I think it corresponds to a mapping, in the code). If this is the case, I don't think the filter is based on interactomics data, so what is it based upon?
>> 2. I have the feeling that one and only one solution (the best score) is output per retained configuration. Am I right?
> run_multifit tests all possible configurations according to the sampled ones.

I am not sure I understand that well... So, just for confirmation: in the tutorial there are 7 subunits, hence 7! = 5040 possible configurations. Do you mean all 5040 configurations are tested and the solution with the best score extracted? ...Unless a first pass is done to check, for each subunit, which regions are populated or deserted by the 30 pre-fitted solutions?
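(For the arithmetic, a one-off Python check that 7 subunits give exactly 5040 assignments:)

    import itertools
    subunits = ["ARP2", "ARP3", "ARC1", "ARC2", "ARC3", "ARC4", "ARC5"]
    # One configuration = one one-to-one assignment of subunits to regions.
    print(len(list(itertools.permutations(subunits))))  # 7! = 5040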
>> d) Am I correct in saying that the cross-correlation computations are only used in scoring (step 3), and not in the pre-fitting of subunits (step 2)? Hence, if I am correct, the FFT-based cross-correlation approach has been replaced by the neural/GMM approach for that particular step?
> FFT-based fitting was a new addition following the point-based matching used in the version we use. We allow for both options, depending on the complexity and resolution of the complex you have.

Just great!
--Ben
Answers below.

On Jul 12, 2011, at 12:21 PM, Benjamin SCHWARZ wrote:
> I am not sure I understand that well... So, just for confirmation: in the tutorial there are 7 subunits, hence 7! = 5040 possible configurations.
> Do you mean all 5040 configurations are tested and the solution with the best score extracted?
> ...Unless a first pass is done to check, for each subunit, which regions are populated or deserted by the 30 pre-fitted solutions?

Many configurations are removed initially because there are too many overlaps. This is done better in the new versions, using the updated domino code.
Participants (4):
- Ben Webb
- Benjamin SCHWARZ
- Keren Lasker
- Tolun, Gökhan