1) Some people are getting weird results from gperftools/pprof (and I'm
happy to help unweird them :). I attach profiling.pdf, a sample gperftools
output, so you know what it should look like when it works. In this case it
is for a Python IMP script that spends most of its time in the C++ section
of the code. You will see that 60% of the running time is spent in
BrownianDynamics::do_step, of which 45% is in score evaluation: 16% is
spent preparing for score evaluation, 22% in the score evaluation itself,
and 6% in post-evaluation tasks. Another 33% is spent updating optimizer
states (this particular example is optimizer-state intensive).
2) If most of your running time is not in the C++ parts of the code, you
could profile the Python code itself. Many tools are available, including
profilers for line-by-line profiling of Python scripts or for parallel
runs. One easy option for profiling a Python script at the function level
is cProfile (documentation: https://docs.python.org/2/library/profile.html).
If you just want to get started quickly, here is a sample file that uses
cProfile:
import cProfile

def ho():
    for i in range(10):
        pass

def hi():
    for i in range(1000):
        ho()

def he():
    for i in range(10000):
        hi()

if __name__ == '__main__':
    cProfile.run('he()')
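If the default cProfile report is too noisy, a variant along these lines
might help. It is only a sketch, assuming Python 3; inner(), outer() and
profile_call() are made-up names for illustration, not part of IMP. It
collects the statistics with cProfile.Profile and prints only the few most
expensive entries, sorted by cumulative time, using the standard pstats
module.

```python
import cProfile
import io
import pstats


def inner():
    # Hypothetical stand-in for a cheap helper function.
    return sum(range(10))


def outer():
    # Hypothetical stand-in for the code you actually want to profile.
    return [inner() for _ in range(1000)]


def profile_call(func, top=5):
    """Run func under cProfile; return (result, text report)."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func()
    finally:
        profiler.disable()
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time and keep only the `top` most expensive entries.
    stats.sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()


if __name__ == "__main__":
    _, report = profile_call(outer)
    print(report)
```

For a whole script, "python -m cProfile -s cumulative your_script.py" gives
a similar sorted report without editing the script.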
A general note - my experience is with profiling C++-heavy runs (even if
called from Python) that are serial (one CPU each). There are good
solutions for profiling the Python script itself (the time spent performing
actual Python operations, not just calling IMP binary libraries). I can
elaborate if relevant (but test what I sent first!). Also, profiling
parallel programs might work, but I'm not sure - if possible, start with
serial runs before trying parallel ones.
--
Barak
---------- Forwarded message ----------
From: Barak Raveh <barak.raveh(a)gmail.com>
Date: Tue, Jun 19, 2018 at 12:52 PM
Subject: profiling IMP files on Unix
To: Sali Lab <salilab(a)salilab.org>
This is for binary files or Python files on Unix machines; examples are
given in bash.
I haven't profiled on a Mac, though I think Charles has. Only profile
fast-mode builds - there is no point in profiling debug builds!
*Profiling python IMP files:*
1) Use the CPUPROFILE environment variable to specify an output .prof file
(on some systems you might also need to add
LD_PRELOAD=/usr/lib64/libprofiler.so - probably not):
$ CPUPROFILE=./barak.prof /home/barak/imp_git/fast/setup_environment.sh \
    python expensive_test_statistics_from_simulation_data.py &> LOG
2) Use "pprof --pdf <path-to-python-binary> <path-to-.prof-file>" to print
to PDF format. The process ID may be appended to the name of the output
profile file, as in the example below. Use "--gv", "--text" or some of the
other pprof options if you like.
pprof --pdf /usr/bin/python barak.prof_31784 > 31784.pdf
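If you prefer to drive these two steps from a script, something like the
following might work. It is only a sketch: profile_environment() and
pprof_command() are hypothetical helper names, and the paths are
placeholders. It only builds the environment and the pprof command line,
so you can inspect them before running anything.

```python
import os
import shlex


def profile_environment(prof_path, preload=None, base_env=None):
    """Return a copy of the environment with CPUPROFILE (and LD_PRELOAD) set."""
    env = dict(os.environ if base_env is None else base_env)
    # gperftools writes the CPU profile to the file named by CPUPROFILE.
    env["CPUPROFILE"] = prof_path
    if preload is not None:
        # Only needed on systems where libprofiler is not loaded otherwise.
        env["LD_PRELOAD"] = preload
    return env


def pprof_command(binary, prof_file, output_format="--pdf"):
    """Return the pprof command line as a list of arguments."""
    return ["pprof", output_format, binary, prof_file]


if __name__ == "__main__":
    env = profile_environment("./run.prof")
    print("CPUPROFILE=" + env["CPUPROFILE"])
    # Placeholder file name; the real one may carry a process ID suffix.
    print(shlex.join(pprof_command("/usr/bin/python", "./run.prof_31784")))
```

The returned environment can be passed to subprocess.run(..., env=env) when
launching the profiled script, and the pprof output redirected to a .pdf
file as in the shell example above.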
*Profiling binary IMP files:*
1) Run your binary IMP program as you usually would, but with the
--cpu_profile flag. (This is a generic IMP flag that creates a .prof file
with profiling information. By default the .prof file is created in the
current folder; edit CPU_PROFILE to change the path of the output .prof
file.)
2) Use "pprof --pdf <path-to-binary-IMP-file> <path-to-.prof-file>" to
print to PDF format. Use "--gv", "--text" or some of the other pprof options
if you like.
*Comments:*
1) Documentation of gperftools: https://github.com/gperftools/gperftools
2) See also the IMP page on profiling (possibly only partially up to date):
https://github.com/salilab/imp/blob/4941f4e11403bac687a64b273f80f1ccc70b3522/doc/manual/profiling.md
3) Try "pprof --help" for some interesting options
--
Barak
Hi all,
We're going to try and meet in the first half of August to go over the IMP
profiling results.
Please fill in the doodle poll!
https://doodle.com/poll/zwuggidx2fxegpni
Best,
-Daniel
--
Daniel Saltzberg
Post-doctoral Scholar
University of California at San Francisco
Lab of Andrej Sali (www.salilab.org)
T: 415.514.4258
*Mailing Address:*
UCSF MC 2552,
Mission Bay, Byers Hall
1700 4th Street, Suite 503B
San Francisco, CA 94158-2330
saltzberg(a)salilab.org
ds229(a)bu.edu
Hi all,
Just a reminder that we will have a meeting to discuss strategies for
making IMP more efficient at 11AM Pacific tomorrow. For those in the
office, we will be in MH 1405.
You can join virtually using the salilab Zoom link:
https://ucsf.zoom.us/my/salilab
*Agenda:*
1) Introduction. Scope of these meetings.
2) IMP efficiency projects. - Discuss status of and plans for completing:
a) IncrementalScoringFunction (Issue 959
<https://github.com/salilab/imp/issues/959>)
b) Parallelization of Replicas
c) GPU acceleration
3) Additional projects to consider.
4) Schedule next/recurring meetings.
Cheers,
-Daniel
On Sun, Jun 17, 2018 at 5:27 PM, Chemmama, Ilan <Ilan.Chemmama(a)ucsf.edu>
wrote:
> Next Tuesday at 11AM PST works for me as well.
> Best,
> Ilan
>
> On Jun 15, 2018, at 10:16 AM, Ignacia Echeverria <iecheverria(a)gmail.com>
> wrote:
>
> Next Tuesday works for me.
> Best,
> Ignacia
>
> On Thu, Jun 14, 2018 at 9:25 AM Daniel Saltzberg <ds229(a)bu.edu> wrote:
>
>> I like the idea of involving the entire IMP community, however, my
>> initial goals for this meeting were not quite as large as Barak suggested!
>> In that case, let's push the meeting off until next week so that we give
>> people enough time to schedule and myself enough time to prepare a more
>> focused agenda.
>>
>> I'll suggest next *Tuesday, June 19th at 11AM Pacific Time.* The meeting
>> will go no longer than an hour and a half.
>>
>> For those new to this email thread, we are interested in identifying
>> specific areas in IMP where efficiency and functionality can be
>> significantly improved with a moderate amount of effort and task these
>> projects to individuals or small groups. Some issues we discussed at the
>> retreat are:
>>
>> * Scoring efficiency - Make IncrementalScoringFunction work with current
>> restraints (EM, EV)
>> * Parallelization of individual replicas
>> * Parallelizing propagation of moves within coarse grained representation
>> of rigid bodies.
>> * Identification of GPU-friendly computations
>>
>> The purpose of this meeting and subsequent ones will be to discuss the
>> root causes of these issues, discuss general strategies for solving them,
>> document them/improve the documentation in github issues and assign people
>> to work on them outside of the meeting.
>>
>> Cheers,
>> -Daniel
>>
>> On Tue, Jun 12, 2018 at 11:33 PM, Dina Schneidman <duhovka(a)gmail.com>
>> wrote:
>>
>>> agree with Barak. morning is best
>>>
>>> On Jun 12, 2018 11:52 PM, "Barak Raveh" <barak.raveh(a)gmail.com> wrote:
>>>
>>> I think it's great if we can get all IMP developers including in Europe
>>> and Asia on board, at least for the first meeting. IMP development could
>>> benefit a lot from collaboration among the various IMP labs, and this could
>>> also support future collaboration among IMP labs.
>>>
>>> The timezone is an issue - my recollection is that we have developers in
>>> Spain (+8 hours), France (+9 hours), Israel (+10 hours), possibly in
>>> Germany (+9 hours - Jan Kosinski at EMBL, Heidelberg may be interested). If
>>> I didn't forget anyone (Ben?) - a morning meeting here (say 10 or 11 am)
>>> could allow them to join in the European/ME evening.
>>>
>>>
>>> On Mon, Jun 11, 2018 at 10:36 AM, Ben Webb <ben(a)salilab.org> wrote:
>>>
>>>> On 6/8/18 4:44 PM, Daniel Saltzberg wrote:
>>>>
>>>>> We had a good discussion at the retreat about how improving the
>>>>> reliability and efficiency in IMP is critical for our current and future
>>>>> goals in structural modeling. To do this, we proposed a periodic meeting to
>>>>> discuss problems/inefficiencies in IMP, identify the root causes and
>>>>> collectively design the best ways to fix them.
>>>>>
>>>>
>>>> The best place for such discussion is the existing IMP developers'
>>>> mailing list (cc'd):
>>>> https://salilab.org/mailman/listinfo/imp-dev
>>>>
>>>> There are several people outside of the lab who are IMP developers and
>>>> so may be interested in joining these meetings. They're not on the lab
>>>> mailing list but are (or can be) on imp-dev.
>>>>
>>>> I suggest we have our first meeting next week...say Thursday 2PM? (or
>>>>> Ben, whatever day you plan to be in). We can decide on a recurring time
>>>>> then.
>>>>>
>>>>
>>>> If you want to have physical meetings as well, that's fine, although
>>>> Fridays are better for me. 2pm's probably not a great time if anybody from
>>>> Europe or Asia will be joining though.
>>>>
>>>> Ben
>>>> --
>>>> ben(a)salilab.org https://salilab.org/~ben/
>>>> "It is a capital mistake to theorize before one has data."
>>>> - Sir Arthur Conan Doyle
>>>>
>>>
>>>
>>>
>>> --
>>> Barak
>>> _______________________________________________
>>> IMP-dev mailing list
>>> IMP-dev(a)salilab.org
>>> https://salilab.org/mailman/listinfo/imp-dev
>>>
>>>
>>>
>
>
> --
> ---------------------------------------
> Ignacia Echeverria
> Postdoctoral Scholar
> Department of Bioengineering and Therapeutic Sciences
> University of California, San Francisco
> http://salilab.org/~ignacia
> ----------------------------------------
>
>
>
-------- Forwarded Message --------
Subject: Re: [IMP-dev] IMP Efficiency Comm
Date: Thu, 14 Jun 2018 09:24:54 -0700
From: Daniel Saltzberg <ds229(a)bu.edu>
To: Dina Schneidman <duhovka(a)gmail.com>
CC: Barak Raveh <barak.raveh(a)gmail.com>, List for IMP development
<imp-dev(a)salilab.org>, Ben Webb <ben(a)salilab.org>, SaliLab Mail
<salilab(a)salilab.org>
I like the idea of involving the entire IMP community; however, my
initial goals for this meeting were not quite as large as Barak
suggested! In that case, let's push the meeting off until next week to
give people enough time to schedule and me enough time to prepare a more
focused agenda.
I'll suggest next *Tuesday, June 19th at 11AM Pacific Time.* The meeting
will go no longer than an hour and a half.
For those new to this email thread, we are interested in identifying
specific areas in IMP where efficiency and functionality can be
significantly improved with a moderate amount of effort and task these
projects to individuals or small groups. Some issues we discussed at
the retreat are:
* Scoring efficiency - Make IncrementalScoringFunction work with current
restraints (EM, EV)
* Parallelization of individual replicas
* Parallelizing propagation of moves within coarse-grained
representation of rigid bodies.
* Identification of GPU-friendly computations
The purpose of this meeting and subsequent ones will be to discuss the
root causes of these issues, discuss general strategies for solving
them, document them/improve the documentation in github issues and
assign people to work on them outside of the meeting.
Cheers,
-Daniel
Technically, this was Ben's (awesome) idea to include everyone :)
Just to gauge the level of interest of IMP developers outside of UCSF - who
wants to join besides Dina?
--
Barak
On 6/8/18 4:44 PM, Daniel Saltzberg wrote:
> We had a good discussion at the retreat about how improving the
> reliability and efficiency in IMP is critical for our current and future
> goals in structural modeling. To do this, we proposed a periodic meeting
> to discuss problems/inefficiencies in IMP, identify the root causes and
> collectively design the best ways to fix them.
The best place for such discussion is the existing IMP developers'
mailing list (cc'd):
https://salilab.org/mailman/listinfo/imp-dev
There are several people outside of the lab who are IMP developers and
so may be interested in joining these meetings. They're not on the lab
mailing list but are (or can be) on imp-dev.
> I suggest we have our first meeting next week...say Thursday 2PM? (or
> Ben, whatever day you plan to be in). We can decide on a recurring time
> then.
If you want to have physical meetings as well, that's fine, although
Fridays are better for me. 2pm's probably not a great time if anybody
from Europe or Asia will be joining though.
Ben
--
ben(a)salilab.org https://salilab.org/~ben/
"It is a capital mistake to theorize before one has data."
- Sir Arthur Conan Doyle
We're a little overdue for a new IMP stable release. Since there have
been several applications of IMP published since the last stable release
that currently only work with the nightly builds (e.g. NPC) and IMP 2.8
has some compilation issues with the latest CGAL and gcc, I'll be making
a new release shortly - probably branching within a week or two and
releasing at the end of this month. So now is a good time for any
polishing, bug fixing, documentation fixes etc. (and a really bad time
for any sweeping changes!)
Ben