Dina has run into the need to add data files to IMP (specifically for bond topology). There are a number of options that I see for this, but none of them is ideal. The important properties that I see are:
- users (and their scripts/config files) should only ever have to specify paths to files that are theirs (no paths to things that are part of IMP)
- it should be possible to debug IMP without installing it, so that you don't have to worry about which copy of the header file you are editing
The two choices that I see which satisfy these criteria are:
- embed all data into the data section of the libraries. This presents the simplest view to users: standard data that are part of IMP just work. Most data files can be embedded as a string variable (perhaps with some escaping, which could be done by a script) and then read using a string stream. The disadvantage is that the data files are not accessible to users, so they can't easily create patched versions. While this is generally not a problem, the API design would have to take this into account (and allow the user to specify multiple data sources at once or apply the operation successively with different data sources).
- IMP is built with the path to the data stored internally and it must be installed before use. The build dir could be a special install target which then doesn't copy the headers or libs since the links are already there. imppy.sh could go away on most platforms, as we could safely embed the paths to the build/lib dir since the library will be rebuilt before being installed.
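A rough sketch of the first option, to make the idea concrete. This is not IMP code: the record format, the names (kDefaultBonds, BondTable, read_bonds, load_default_bonds) and the numbers are all made up for illustration. It assumes the data file is embedded as a string literal and parsed through a string stream, and that later sources overwrite earlier ones so a user can patch the defaults.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical embedded data: one "atom_a atom_b length" record per line.
// The values are placeholders, not IMP's topology data. A build script
// could generate this literal from a plain-text data file.
static const char kDefaultBonds[] =
    "N CA 1.46\n"
    "CA C 1.52\n"
    "C O 1.23\n";

typedef std::map<std::string, double> BondTable;

// Read records from any stream. Existing keys are overwritten, so calling
// this again with a user-supplied stream patches the table in place.
inline void read_bonds(std::istream &in, BondTable &table) {
  std::string a, b;
  double length;
  while (in >> a >> b >> length) {
    table[a + "-" + b] = length;
  }
}

// Build the default table from the embedded string.
inline BondTable load_default_bonds() {
  std::istringstream in(kDefaultBonds);
  BondTable table;
  read_bonds(in, table);
  assert(!table.empty());  // the embedded data is fixed at build time
  return table;
}
```

A user would call load_default_bonds() and then read_bonds() on their own file stream to apply changes on top; the same parser serves both the built-in and the user data.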
Daniel Russel wrote:
> The two choices that I see which satisfy these criteria are:
> - embed all data into the data section of the libraries.
I don't like this option so much, since it makes it harder to modify the libraries (you'd have to rebuild IMP each time). Users that get prebuilt binaries would never be able to inspect these data files. Plus, all of the libraries would get read into memory whether you need them or not, increasing the load time and the memory footprint. Finally, if you stored them as strings and streamed them, you'd end up with two copies of everything: one as the string and the other as the actual populated data structure. So I don't think this is a good idea for anything but very small data files that are expected to never change (and for those it may make more sense to store them as static structs anyway).
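For the small, never-changing case, a static struct table might look something like the sketch below. The names (BondEntry, kBonds, find_bond) and the values are hypothetical, not anything in IMP. The point is only that the data is compiled in directly in its final form, so there is no parsing step and no second copy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical record type for a small, fixed table.
struct BondEntry {
  const char *atom_a;
  const char *atom_b;
  double length;
};

// Placeholder values for illustration only. The table lives in the
// library's read-only data and needs no parsing at load time.
static const BondEntry kBonds[] = {
    {"N", "CA", 1.46},
    {"CA", "C", 1.52},
    {"C", "O", 1.23},
};
static const std::size_t kNumBonds = sizeof(kBonds) / sizeof(kBonds[0]);

// Linear search is fine for a table this small. Returns nullptr if the
// (ordered) pair is not present.
inline const BondEntry *find_bond(const char *a, const char *b) {
  assert(a != nullptr && b != nullptr);
  for (std::size_t i = 0; i < kNumBonds; ++i) {
    if (std::strcmp(kBonds[i].atom_a, a) == 0 &&
        std::strcmp(kBonds[i].atom_b, b) == 0) {
      return &kBonds[i];
    }
  }
  return nullptr;
}
```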
> - IMP is built with the path to the data stored internally and it must
> be installed before use.
That's the usual Unix way of doing things, so I think we should go for that. Each module can have a data directory at the same level as the existing src, include, pyext, etc. directories, and at install time these files can be placed in /usr/share/imp/modulename/ or similar.
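One way the lookup could work, as a sketch only: the macro IMP_DATA_DIR, the environment variable IMP_BUILD_DATA_DIR, and the functions join_data_path and find_data_file are names invented here, not an existing IMP interface. It assumes the build system bakes the install location into the library, and that something like imppy.sh can point the lookup at the build directory for debugging.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical compile-time default. The build system would define this
// to match the install prefix, e.g. -DIMP_DATA_DIR='"/usr/share/imp"'.
#ifndef IMP_DATA_DIR
#define IMP_DATA_DIR "/usr/share/imp"
#endif

// Build "<root>/<module>/<file_name>".
inline std::string join_data_path(const std::string &root,
                                  const std::string &module,
                                  const std::string &file_name) {
  assert(!module.empty() && !file_name.empty());
  return root + "/" + module + "/" + file_name;
}

// Resolve a data file for a module. A hypothetical environment variable
// lets an uninstalled build tree override the compiled-in location.
inline std::string find_data_file(const std::string &module,
                                  const std::string &file_name) {
  const char *build_dir = std::getenv("IMP_BUILD_DATA_DIR");
  const std::string root =
      (build_dir != nullptr && *build_dir != '\0') ? build_dir : IMP_DATA_DIR;
  return join_data_path(root, module, file_name);
}
```

With this, a call such as find_data_file("atom", "top.lib") (module and file name made up) gives a path under the installed data directory unless the override is set, so user scripts never have to spell out a path into IMP itself.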
> imppy.sh could go away on most platforms as we could safely
> embed the paths to the build/lib dir as the library will be rebuilt
> before being installed
You'd still need it - or something equivalent - for the Python path, and on platforms other than Linux and Mac where rpath doesn't work.
Ben
On Apr 25, 2009, at 8:39 AM, Ben Webb wrote:
> Daniel Russel wrote:
>> The two choices that I see which satisfy these criteria are:
>> - embed all data into the data section of the libraries.
>
> I don't like this option so much, since it makes it harder to modify the
> libraries (you'd have to rebuild IMP each time). Users that get prebuilt
> binaries would never be able to inspect these data files.
I would hope they would never need to inspect them :-) If so, something is poorly designed/documented. The bigger issue with such a plan is that binary data support is a pain (although I have run into standard ways of doing this, especially for the common case of bitmap images).
> Plus, all of the libraries would get read in to memory whether you need
> them or not, increasing the load time and the memory footprint.
Anything really large can easily go in a separate library. And I wouldn't imagine that unused data in a library is ever read off disk (that is what memory-mapped I/O is for). But I haven't specifically looked into its handling on our various platforms.
> (and for those it may make more sense to store them as static structs
> anyway).
This is just a mechanism for doing that which can be reused for parsing user data.
>> - IMP is built with the path to the data stored internally and it must
>> be installed before use.
>
> That's the usual Unix way of doing things, so I think we should go for
> that. Each module can have a data directory at the same level as the
> existing src, include, pyext, etc. directories, and at install time
> these files can be placed in /usr/share/imp/modulename/ or similar.
Again, it is pretty important for debugging that we support installation into the build directory or somewhere which doesn't involve copying headers. I don't think that involves too much messing with the install scripts.
> You'd still need it - or something equivalent - for the Python path, and
> on platforms other than Linux and Mac where rpath doesn't work.
Typically you install things somewhere where those things are already set up (otherwise it is a useless installation). Plus, no one uses anything other than those platforms, so not requiring it for them would be a nice step forward.
Daniel Russel wrote:
>> That's the usual Unix way of doing things, so I think we should go for
>> that. Each module can have a data directory at the same level as the
>> existing src, include, pyext, etc. directories, and at install time
>> these files can be placed in /usr/share/imp/modulename/ or similar.
> Again, it is pretty important for debugging that we support installation
> into the build directory or somewhere which doesn't involve copying
> headers. I don't think that involves too much messing with the install
> scripts.
Of course - I agree, and will set something up to do just this.
>> You'd still need [imppy.sh] - or something equivalent - for the Python path, and
>> on platforms other than Linux and Mac where rpath doesn't work.
> Typically you install things somewhere where those things are already
> set up (otherwise it is a useless installation).
Of course - you don't need imppy.sh to run an installed version of IMP. That's never been the case, and the install target does not install imppy.sh. I was referring to the use of IMP in the build directory, which in most cases is not in the Python path.
Ben