self-organizing maps

Yannick Spill

18 Jun 2012

5:31 a.m.

Hi list,

In light of my recent discovery of the statistics module in IMP, I was wondering if some of you would be interested in an implementation of self-organizing maps (SOM). They are a generalization of principal component analysis to nonlinear models.
http://en.wikipedia.org/wiki/Self-organizing_map
There's a shortcut for us to have it in IMP. It's called GPLVM, and it is a probabilistic SOM based on Gaussian processes. Is that something that would interest anyone?

Yannick

Barak Raveh

18 Jun 2012

9:39 a.m.

I say Aye. BUT if we put a lot of tools in the same folder, we will have a total mess with hundreds of files in the same place. We cannot make subfolders because of the automated scripts. How about instead we start a convention of naming folders like "statistics_clustering", "statistics_som", "statistics_misc", etc.? This would replace subfolders such as "statistics/clustering" and "statistics/som", so the automated script would work as is. It would also improve build times, since each module would have fewer dependency constraints.

On Mon, Jun 18, 2012 at 5:31 AM, Yannick Spill yannick@salilab.org wrote:

> Hi list,
>
> In the light of my recent discovery of the statistics module in imp, I was
> wondering if some of you would be interested in an implementation of
> self-organizing maps (SOM). They are a generalization of principal
> components analysis to nonlinear models.
> http://en.wikipedia.org/wiki/Self-organizing_map
> There's a shortcut for us to have it in IMP. It's called GPLVM, and is a
> probabilistic SOM based on gaussian processes. Is that something that would
> interest anyone?
>
> Yannick

-- Barak

Daniel Russel

11:30 a.m.

On Mon, Jun 18, 2012 at 9:39 AM, Barak Raveh barak.raveh@gmail.com wrote:

> I say Aye.
> BUT - if we put a lot of tools in same folder, we will have a total mess
> with hundreds of files in the same place. We cannot make subfolders due to
> the automated scripts, how about instead we'll start a convention of naming
> folders like "statistics_clustering", "statistics_som", "statistics_misc",
> etc. This is instead of having subfolders "statistics/clustering"
> "statistics/som", so the automated script would work as is. It would also
> improve build times due to smaller dependencies constraints.

Just to add a caveat: having more modules can (if the module dependencies are sparse) decrease incremental build times. However, it does increase the amount of time it takes scons to start up each time you run it (since it has to scan all the files), and it increases the amount of time a clean build takes, since there is significant overhead for each swig wrapper. Just something to be aware of; I'm not sure where the optimal tradeoff lies (or that there is an optimal value over all the various use cases).

Using scons in interactive mode gets rid of the startup time, which is one of these problems.

Ben Webb

10:30 a.m.

On 06/18/2012 05:31 AM, Yannick Spill wrote:
> In the light of my recent discovery of the statistics module in imp, I
> was wondering if some of you would be interested in an implementation of
> self-organizing maps (SOM). They are a generalization of principal
> components analysis to nonlinear models.
> http://en.wikipedia.org/wiki/Self-organizing_map
> There's a shortcut for us to have it in IMP. It's called GPLVM, and is a
> probabilistic SOM based on gaussian processes. Is that something that
> would interest anyone?

If you're talking about GP-LVM from http://www.cs.man.ac.uk/~neill/gplvmcpp/ then we can't include that in IMP, unfortunately, because it is not open source software.

Ben

--
ben@salilab.org http://salilab.org/~ben/
"It is a capital mistake to theorize before one has data."
    - Sir Arthur Conan Doyle

Yannick Spill

2:31 p.m.

On 18/06/12 19:30, Ben Webb wrote:
> On 06/18/2012 05:31 AM, Yannick Spill wrote:
>> In the light of my recent discovery of the statistics module in imp, I
>> was wondering if some of you would be interested in an implementation of
>> self-organizing maps (SOM). They are a generalization of principal
>> components analysis to nonlinear models.
>> http://en.wikipedia.org/wiki/Self-organizing_map
>> There's a shortcut for us to have it in IMP. It's called GPLVM, and is a
>> probabilistic SOM based on gaussian processes. Is that something that
>> would interest anyone?
>
> If you're talking about GP-LVM from
> http://www.cs.man.ac.uk/~neill/gplvmcpp/ then we can't include that in
> IMP, unfortunately, because it is not open source software.
>
> Ben

Sure, but the paper describes how it is made. I already wrote an implementation of the Gaussian process, and one would just need to adapt it to this case by writing the covariance function and its derivatives. I'm willing to do it, although I'd really appreciate some programming help to speed things up, but I wasn't going to do it without anyone being interested. As for where to put it, the core of the Gaussian process is in the isd module, so there is no need to rename anything for the moment.
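
For readers wondering what "writing the covariance function and its derivatives" might involve, here is a minimal sketch in plain Python. It assumes a squared-exponential (RBF) covariance, which is a common choice for Gaussian processes but is not confirmed by this thread to be the one intended here. The function names and parameters (`rbf_covariance`, `rbf_covariance_grad_x`, `variance`, `lengthscale`) are hypothetical and are not part of IMP or its isd module.

```python
import math


def rbf_covariance(x, y, variance=1.0, lengthscale=1.0):
    """Assumed kernel: k(x, y) = variance * exp(-|x - y|^2 / (2 * lengthscale^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return variance * math.exp(-0.5 * sq_dist / lengthscale ** 2)


def rbf_covariance_grad_x(x, y, variance=1.0, lengthscale=1.0):
    """Analytic gradient of k(x, y) with respect to the first argument x."""
    k = rbf_covariance(x, y, variance, lengthscale)
    # d/dx_i exp(-|x - y|^2 / (2 l^2)) = -(x_i - y_i) / l^2 * exp(...)
    return [-(a - b) / lengthscale ** 2 * k for a, b in zip(x, y)]


def finite_difference_grad_x(x, y, eps=1e-6, **params):
    """Central-difference estimate of the same gradient, used only as a check."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += eps
        x_minus[i] -= eps
        diff = rbf_covariance(x_plus, y, **params) - rbf_covariance(x_minus, y, **params)
        grad.append(diff / (2.0 * eps))
    return grad


if __name__ == "__main__":
    x = [0.3, -1.2]
    y = [1.0, 0.5]
    print(rbf_covariance_grad_x(x, y, variance=2.0, lengthscale=0.7))
    print(finite_difference_grad_x(x, y, variance=2.0, lengthscale=0.7))
```

Comparing the analytic gradient against a finite-difference estimate, as in the last two lines, is a cheap way to catch sign or scaling mistakes before the derivative is used in an optimizer.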
