Re: [IMP-dev] clustering - code consolidation

14 Jun 2012


      good points. Here is the proposed interface.
class Clusterer{
// use explicit vector embedding to initialize the data
  Clusterer(Embedding data, params);
// use implicit conversion of data from e.g. FloatsList to data
  Clusterer(EmbeddingAdaptor data, params);
// run the clustering on the stored data
  virtual PartitioningClusteringResults do_clustering() = 0;
};
class XXXXIncrementalClusterer : Clusterer{
  // use explicit vector embedding to initialize the data
  XXXXIncrementalClusterer(Embedding data, params);
// use implicit conversion of data from e.g. FloatsList to data
  XXXXIncremenralClusterer(EmbeddingAdaptor data, params);
//
  PartitioningClusteringResults add_data(XXXX data);
};
On Thu, Jun 14, 2012 at 1:34 PM, Daniel Russel drussel@gmail.com wrote:
> Sorry, I didn't see this before. I think my previous comments still stand.
> As a couple additional ones:
> - having "execute" style methods on objects isn't a very nice practice. It
> destroys type safety, since for class A you really have two different types
> of A, the pre-execute A and the post-execute A and some contexts require
> one and some the other (and the compiler can't check). And doesn't really
> give you anything that you can't get from producing a new object as the
> result. Ideally, classes should have what I have seen called the "no
> protocols" property: you can call any function of the class in any order.
>
> - I kind of prefer Clusterer to Clustering when referring to the algorithm
> as the latter very much means the result of running a clustering algorithm
> to me as opposed to something that does clustering. But that may be my
> eccentricity (but if you google "clusterer" the results seem consistent
> with that usage). In any case, we need to make sure that distinct terms are
> used for distinct things.
>
>
> On Thu, Jun 14, 2012 at 12:38 PM, Barak Raveh barak.raveh@gmail.comwrote:
>
>> Now the full version...
>> Daniel and I discussed a little bit consolidation of clustering things in
>> statistics and kmeans modules. Please tell me if it is agreed that things
>> will work with the following interface:
>> * The Embedding family of classes (used to embed data in vector form)
>> will remain as is
>> * There will be a "Clustering" class from which all clustering algorithms
>> will derive, with a constructor that takes either Embedding class, or
>> EmbeddingAdaptor for implicit conversions from e.g., FloatsList
>> * The Clustering classes will also have a *void ::execute()* method and a
>> * ::get_clustering results() *method, that will return the clustering
>> results (using the exisiting PartitioningClustering class, perhaps we can
>> change its name to PartitioningClusteringResults).
>>
>> So bottom line, if you will want to cluster some data, you will do
>> something like
>> *FloatsList data; // or create an Embedding object*
>> *KMeansClustering kmeans(data, params);*
>> *kmeans.execute();*
>> *PartitioningClustering clustering_results =
>> kmeans.get_clustering_results()*
>>
>> Makes sense? For backward compatibility, I will add a DEPRECATED warning
>> to existing clustering methods, so they will be removed within a few months
>> completely.
>>
>> Barak
>>
>> _______________________________________________
>> IMP-dev mailing list
>> IMP-dev@salilab.org
>> https://salilab.org/mailman/listinfo/imp-dev
>>
>>
>
-- 
Barak