I have various helper functions that are probably generally useful. I
am ambivalent about what to do with them.
One set is utilities for getting and setting attributes which look like
/**
Set the value of the string attribute name to val, whether or not it
was already there.
*/
bool set_string(ModelData *m,
Particle *p,
const String& name,
std::string val)
and
/**
Get the string attribute name or return default_value if the
particle does not have any such attribute.
*/
const String& get_string(ModelData *m,
Particle *p,
const String& name,
const String& default_value=String())
Obviously there are int and float versions.
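To make the intended behaviour concrete, here is a minimal sketch of the two string helpers. ParticleAttributes is a hypothetical stand-in for whatever ModelData and Particle actually store, and the meaning of the bool return value (true if the attribute already existed) is an assumption, since the declaration above does not say. The getter returns by value in this sketch, because returning a const reference to a defaulted temporary argument could dangle.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for the real attribute storage; the actual
// ModelData/Particle interfaces are not shown in this message.
struct ParticleAttributes {
  std::map<std::string, std::string> strings;
};

// Set the string attribute `name` to `val`, whether or not it was already
// there. Returns true if the attribute already existed (assumed meaning).
inline bool set_string(ParticleAttributes &p, const std::string &name,
                       const std::string &val) {
  auto it = p.strings.find(name);
  if (it != p.strings.end()) {
    it->second = val;  // overwrite the existing value
    return true;
  }
  p.strings.emplace(name, val);  // create the attribute
  return false;
}

// Get the string attribute `name`, or default_value if the particle has no
// such attribute. Returned by value so the default can safely be a temporary.
inline std::string get_string(const ParticleAttributes &p,
                              const std::string &name,
                              const std::string &default_value = std::string()) {
  auto it = p.strings.find(name);
  return it == p.strings.end() ? default_value : it->second;
}
```

The int and float versions would presumably differ only in the value type, so a single template over the value type might cover all three.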
Should such things go into IMP proper? On one hand they are
extremely useful. On the other, they increase the size of the
interface, and while they make it more convenient, they don't add any
power.
Next I have some helpers for manipulating hierarchy and bond
particles (getting the number of children, getting specific children
etc). These should go somewhere to establish norms for hierarchies.
OK - my thoughts on the relationship between particles/restraints and
optimizers.
I may be missing something, since I have only looked at about half of
the IMP classes - not all.
To my understanding (please correct me if I am wrong here), in the
current design:
1 - Attributes are used for two purposes: to store the particles'
information (x, y, z, radii, weight, ...) and to manage the
optimization (is-opt).
Likewise, ModelData has two responsibilities: to hold the particles and
to participate in the optimization process.
2 - The restraints hold state (i.e. they know at construction time
which particle list they are going to work on).
I think that we should separate the two.
Have a ModelData which does not care at all about optimization; its
only responsibility is to maintain a valid set of particles.
We should have an additional class, OptimizationData, which holds all
optimization header data (is optimizable, should be included in the
scoring). It can implement optimization logic such as the nonbonded
list; we may come up with other optimization logic as
well. So the Statistics class, for example, should live within the
OptimizationData class.
The restraints should be stateless: at each evaluation they
will get the relevant attribute list and derivative list (which
might be different from the attribute list) from the OptimizationData.
For example, for the nonbonded list case:
1 - ModelData holds all the particles.
2 - The user sets its attributes for optimization (which sets up the
OptimizationData class).
3 - At each optimization stage:
a - the restraints are given a pointer to the OptimizationData
class, where all the optimization header information about the
attributes, and a reference to them, is stored.
b - at the end of the optimization step the OptimizationData
class updates the header information in case some optimization
properties should be changed.
4 - When adding or removing particles, the OptimizationData objects
are updated accordingly (there might be more than one, when dealing
with a hierarchy for example).
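A rough sketch of how this split could look is given below. All of the names and members here (the vector-based ModelData, the fields of OptimizationData, the evaluate_harmonic function) are assumptions made for illustration only and are not existing IMP interfaces. The point is just that the restraint keeps no particle list of its own and reads everything it needs from the OptimizationData it is handed at evaluation time.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Holds attribute values only; knows nothing about optimization.
// (Hypothetical simplification: one flat list of float attributes.)
struct ModelData {
  std::vector<double> values;
};

// Hypothetical optimization "header" data: which attributes are optimizable,
// plus a derivative list that the restraints accumulate into.
struct OptimizationData {
  ModelData *model;
  std::vector<bool> optimizable;
  std::vector<double> derivatives;

  explicit OptimizationData(ModelData *m)
      : model(m),
        optimizable(m->values.size(), false),
        derivatives(m->values.size(), 0.0) {}

  void set_optimizable(std::size_t i, bool on) { optimizable[i] = on; }
};

// A stateless restraint: a harmonic pull of every optimizable attribute
// toward `target`. It stores nothing between calls.
inline double evaluate_harmonic(OptimizationData &od, double target) {
  double score = 0.0;
  for (std::size_t i = 0; i < od.model->values.size(); ++i) {
    if (!od.optimizable[i]) continue;  // skip attributes not being optimized
    const double d = od.model->values[i] - target;
    score += 0.5 * d * d;
    od.derivatives[i] += d;  // derivative of 0.5*d^2 with respect to the value
  }
  return score;
}
```

Whether the derivative list belongs in OptimizationData or elsewhere is exactly the kind of question the use cases should settle.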
Anyway, as I discussed earlier with Ben, I think that we cannot
really have a serious discussion without having a few detailed use
cases (there are others besides the nonbonded list one).
I will write a Python unit test for fitting and docking of multiple
components; we should probably have another one for single-protein
refinement, and maybe also one for ligand docking, and then we can assess
solutions more realistically.
Daniel Russel wrote:
> Can we have an include directory (new_imp/include) and a
> (new_imp/include/IMP) directory for the headers? Now I have to do
> -Inew_imp/src which looks funny and would presumably change for any
> distribution of IMP (since almost all of the headers in src need to be
> installed in an include directory for anyone to write their own
> restraints or use IMP through C++ and their names are too generic to
> just put into /usr/include). Thanks.
How does this differ from what we talked about last week? If it's the
same, then the same deal applies: I'm waiting on the scons guys to
resolve some build issues before making any big changes. But the good
news there is that we resolved those today, so I can do this tomorrow.
Does anybody object to Daniel's suggestion?
If it's not the same, then perhaps you can point out the difference,
because it looks the same to me. ;)
P.S. What is this mystery project which needs to include the IMP
headers? In my opinion, it'd be great to have at least some version of
this in the IMP SVN repository with Keren's impEM stuff, because it's
really the applications which define what we do (and need to do) with IMP.
Ben
--
ben(a)salilab.org http://salilab.org/~ben/
"It is a capital mistake to theorize before one has data."
- Sir Arthur Conan Doyle
Daniel Russel wrote:
> I have written some code to use BALL to
> read PDB files and turn them in to particles and a hierarchy (and to
> evaluate MD energy for such a hierarchy). Keren wanted to use this too.
> Do you have a preferred SVN structure/location for this sort of thing?
> Something like trunk/IMP_BALL with include and src directories? And
> pyext too I guess.
Sure, my policy is, roughly speaking: if it doesn't break the core unit
tests, put it in. Better to have it in SVN where we can poke holes in it
than to have it live on your laptop forever.
But this may be an opportune point to discuss the layout of the IMP
repository. Right now it looks basically like:
imp
  libsaxs
    src
    doc
  mdt
    bin
    src
    doc
    pyext
    test
  new_imp
    bin
    IMP
      tests
    impEM
    rsr
    doc
    pyext
  od_dope
    src
    doc
    pyext
    test
  tnt
    python
    test
  tools
The top-level directories are independent modules which are built
separately but use a shared set of build scripts (in the tools
directory). libsaxs is Frido's SAXS module, which plugs in to Modeller;
mdt uses the Modeller C and Python interfaces; od_dope uses the Modeller
C interface; tnt and new_imp use the Modeller Python interface. new_imp
is essentially what Bret turned in before he left, and which Daniel,
Keren and I are currently working on.
The projects have similar sub-directories: bin for generated binaries;
src for C/C++ source code; doc for documentation; pyext for Python
extensions; python for pure Python code; test for testcases. However,
new_imp uses 'IMP' as its source code directory, and the tests live
under that. new_imp also contains some sub-projects, such as Keren's
impEM interface and Bret's RSR (Restrainer web interface).
As Daniel pointed out, new_imp's naming is a little inconsistent (e.g.
imp/new_imp/IMP to get to the sourcecode) so what about renaming the IMP
subdirectory to src, and the new_imp top-level directory to kernel (or
perhaps base) ? Dependent projects such as impEM and rsr would then
become top-level directories:
imp
  libsaxs
  mdt
  kernel
    bin
    src
    test
    doc
    pyext
  rsr
  impEM
  od_dope
  tnt
  tools
This scheme would, however, require you to manually handle dependencies
between the projects (e.g. impEM and rsr would be independent projects)
but that's not a huge hurdle - just run 'scons' in the kernel directory
before running 'scons' in the impEM directory. A slightly more radical
rearrangement could look like
imp
  libsaxs
  mdt
  bin
  kernel
    src
    test
    doc
    pyext
  rsr
  impEM
  od_dope
  tnt
  tools
and use a top-level build system which would track dependencies between
projects, staging binaries and libraries for all IMP projects to the bin
directory (but the downside is the targets are a bit more wordy - e.g.
'scons kernel', 'scons kernel-test', 'scons impEM', 'scons impEM-test'
rather than just 'scons' and 'scons test' in the kernel and impEM
directories).
Thoughts?
There is, of course, also the issue over whether libsaxs, mdt, od_dope
and tnt belong here or in their own repository. My understanding is that
the Grand Plan is that they'll become part of IMP eventually, which is
why they're there. But if we want to distribute them under a non-free
license for any reason, it might make sense to put them in a separate
repository in the future.
Ben
Daniel Russel wrote:
> We had discussed but never implemented some mechanism for having shared
> state (such as non-bonded lists). It seems to me the right way to do it
> is to have a class State (or some better name) which has a single
> virtual method "void update()" which is called in Model::evaluate before
> the restraints are evaluated. As with Particle, Restraint it stores a
> ModelData pointer and Model would have add_state and get_state methods.
Well, I think we need to discuss this further, so let's drag in others here:
I agree that a shared state class is needed, and could be used by
nonbonded lists. But what do you need it for right now, i.e. what else
would people use it for?
Calling State::update() in Model::evaluate() would not be sufficient, at
least for nonbonded lists, because you can also call
Restraint::evaluate() on individual restraints from the Python interface.
How would a nonbonded list know that it needs to do an update? I don't
like the Statistics class proposed in ModelData.h. (The idea of that
class is to automatically keep min/max/change statistics on all float
variables.) Why: because 1. if you don't want nonbonded lists,
maintaining these statistics is inefficient, and 2. because it's part of
the ModelData class, it can't easily be extended by other classes. Two
suggestions:
1. Classes can register callbacks/actions with ModelData::set_float, or
this could trigger a State::set_float method, to be notified whenever
the model is changed. The advantage of the former is that the callback
can go away after a nonbond update is triggered, saving the overhead of
a function call for subsequent set_float()s.
2. Classes to allow the get/set of the 'optimizable state' (right now,
this is just all optimizable floats) could have similar methods, useful
for optimizers such as CG and steepest descent which change all
attributes simultaneously.
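As a sketch of the first suggestion (and only a sketch: the Callback type, add_callback, and the NonbondedList class are all hypothetical names, and this ModelData is a toy stand-in), a callback could report whether it wants to stay registered. A nonbonded list can then drop its callback as soon as an update has been triggered, so later set_float() calls pay nothing for it.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Toy stand-in for ModelData, holding only float attributes.
class ModelData {
 public:
  // A callback returns true to stay registered, false to be removed.
  using Callback = std::function<bool(std::size_t index, double value)>;

  explicit ModelData(std::size_t n) : floats_(n, 0.0) {}

  void add_callback(Callback cb) { callbacks_.push_back(std::move(cb)); }

  void set_float(std::size_t index, double value) {
    floats_[index] = value;
    // Notify every callback and keep only those that ask to stay.
    // Simplification: callbacks must not register new callbacks from here.
    std::vector<Callback> keep;
    for (auto &cb : callbacks_) {
      if (cb(index, value)) keep.push_back(std::move(cb));
    }
    callbacks_.swap(keep);
  }

  double get_float(std::size_t index) const { return floats_[index]; }
  std::size_t callback_count() const { return callbacks_.size(); }

 private:
  std::vector<double> floats_;
  std::vector<Callback> callbacks_;
};

// Hypothetical nonbonded list: marks itself stale on the first change and
// then unregisters, so later set_float() calls skip it entirely.
struct NonbondedList {
  bool needs_update = false;

  void watch(ModelData &m) {
    m.add_callback([this](std::size_t, double) {
      needs_update = true;
      return false;  // one-shot: remove this callback after it fires
    });
  }
};
```

After rebuilding its list, the nonbonded list would call watch() again to be notified of the next change. Because the notification hangs off set_float rather than Model::evaluate, it would also work when Restraint::evaluate() is called directly from Python.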
Ben
3
but it seems that a subset of the categories on the modeller web page
would apply.
andrej
On Oct 15, 2007, at 12:44 AM, Ben Webb wrote:
> Andrej Sali wrote:
>> where are they? a
>
> https://salilab.org/internal/imp/index.html
> https://salilab.org/internal/imp/imp2.html
> https://salilab.org/internal/imp/imp3.html
>
> Ben
> --
> ben(a)salilab.org http://salilab.org/~ben/
> "It is a capital mistake to theorize before one has data."
> - Sir Arthur Conan Doyle
--
Andrej Sali, Ph.D.
Professor and Vice Chair, Department of Biopharmaceutical Sciences
Department of Pharmaceutical Chemistry
California Institute for Quantitative Biosciences
University of California at San Francisco
UCSF MC 2552
Byers Hall Room 503B
1700 4th Street
San Francisco, CA 94158-2330, USA
Tel +1 (415) 514-4227; Fax +1 (415) 514-4231
Assistant, Ms. Karin Asensio, Tel +1 (415)514-4228; Lab +1 (415)
514-4232, 4233, 4258
Email sali(a)salilab.org; Web http://salilab.org
I'd like to get a consensus on naming for IMP modules and classes (well,
OK, I'm going to say what I think it should be, and people can agree or
disagree).
Currently, on the C++ side, all classes use a
Capitalized_Words_With_Underscores naming scheme (e.g. Restraint_Set)
and live in the 'imp' namespace. This is translated to the 'imp2' module
on the Python side, with the same class names. This has a few problems:
1. The Python 'imp2' name is ugly - Bret presumably had to call it that
because Python has a built-in module called 'imp' already. The Python
guys (http://www.python.org/dev/peps/pep-0008/) prefer "short,
lower-case" names for modules, but I don't think this really makes sense
for initialisms anyway - for example, there are already EMAN and CORBA
Python libraries out there (the BALL guys also use "BALL"). So I propose
"IMP".
2. Python pretty much mandates CamelCase for class names (e.g.
RestraintSet). Since the Python class names match the C++ names, we
either have to do lots of ugly renaming in the Python interface, or just
rename the C++ classes to match. Lots of C++ people use CamelCase anyway
(e.g. BALL at
http://www.bioinf.uni-sb.de/OK/BALL/Documentation/1.1.1/V1.1.1/hierarchy_ht…)
So I propose CamelCase names for IMP C++ classes.
3. There are a bunch of utility Python modules in tests/python_libs:
IMP_Modeller_Intf.py IMP_Test.py IMP_Utils.py load_imp_xml_model.py
I propose renaming these to IMP.modeller, IMP.test, IMP.utils, IMP.xml.
4. I propose renaming derived classes such as imp::RSR_Exclusion_Volume
(exclusion volume restraint, although what the second R in RSR stands
for, I don't know ;) as either imp::ExclusionVolumeRestraint or
imp::restraint::ExclusionVolume. I prefer the latter for three reasons:
it would translate more easily into the Python
IMP.restraint.ExclusionVolume class; changing from a class name prefix
(RSR) to a namespace allows for more readable Python scripts (e.g.
"import IMP.restraint as r; r.ExclusionVolume()" or even
"from IMP.restraint import *"); and it would greatly simplify moving
restraints to their own C++ module.
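For illustration, the second option might look roughly like this on the C++ side. The Restraint base class shown here is a simplified placeholder with a made-up name() method, not the real interface:

```cpp
#include <cassert>
#include <string>

namespace imp {

// Simplified placeholder for the restraint base class.
class Restraint {
 public:
  virtual ~Restraint() {}
  virtual std::string name() const = 0;
};

namespace restraint {

// Would replace imp::RSR_Exclusion_Volume under this proposal.
class ExclusionVolume : public Restraint {
 public:
  std::string name() const override { return "ExclusionVolume"; }
};

}  // namespace restraint
}  // namespace imp
```

C++ users could then write "namespace r = imp::restraint;" followed by "r::ExclusionVolume ev;", which mirrors the "import IMP.restraint as r" idiom on the Python side.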
What do others think?
Ben