Re: [IMP-dev] clustering - code consolidation

14 Jun 2012


P.S. We might not really need IncrementalClusterer; we can just include
add_data() in Clusterer. Incremental algorithms would then be an
implementation detail.
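A minimal C++ sketch of this P.S. follows. It assumes hypothetical stand-in types (Point, and a PartitioningClusteringResults that holds only per-point cluster indices) and a toy threshold "algorithm" in place of a real one. The base Clusterer provides add_data(), which stores the new points and reclusters everything. An incremental subclass overrides it without changing what callers see.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-ins, not the real IMP types.
using Point = std::vector<double>;
struct PartitioningClusteringResults {
  std::vector<int> assignments;  // one cluster index per point
};

class Clusterer {
 public:
  explicit Clusterer(std::vector<Point> data) : data_(std::move(data)) {}
  virtual ~Clusterer() = default;

  // Default: store the new points and recluster everything from scratch.
  virtual PartitioningClusteringResults add_data(const std::vector<Point>& more) {
    data_.insert(data_.end(), more.begin(), more.end());
    return do_clustering();
  }

  virtual PartitioningClusteringResults do_clustering() const = 0;

 protected:
  std::vector<Point> data_;
};

// Toy algorithm (hypothetical): cluster 0 if the first coordinate is below
// the threshold, cluster 1 otherwise. Uses the default add_data().
class ThresholdClusterer : public Clusterer {
 public:
  ThresholdClusterer(std::vector<Point> data, double threshold)
      : Clusterer(std::move(data)), threshold_(threshold) {}

  PartitioningClusteringResults do_clustering() const override {
    PartitioningClusteringResults r;
    for (const Point& p : data_) r.assignments.push_back(label(p));
    return r;
  }

 protected:
  int label(const Point& p) const { return p[0] < threshold_ ? 0 : 1; }

 private:
  double threshold_;
};

// Incremental variant (hypothetical): keeps the previous results and labels
// only the new points. Callers still just call add_data().
class IncrementalThresholdClusterer : public ThresholdClusterer {
 public:
  using ThresholdClusterer::ThresholdClusterer;

  PartitioningClusteringResults add_data(const std::vector<Point>& more) override {
    // Bring the cache up to date once, then extend it point by point.
    if (cache_.assignments.size() != data_.size()) cache_ = do_clustering();
    for (const Point& p : more) {
      data_.push_back(p);
      cache_.assignments.push_back(label(p));
    }
    return cache_;
  }

 private:
  PartitioningClusteringResults cache_;
};
```

Code holding a Clusterer& gets the same results from either class; whether the work is done incrementally stays internal.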
On Thu, Jun 14, 2012 at 1:58 PM, Barak Raveh barak.raveh@gmail.com wrote:
> Good points. Here is the proposed interface:
>
> class Clusterer {
>  public:
>   // use an explicit vector embedding to initialize the data
>   Clusterer(Embedding data, params);
>
>   // implicitly convert other data, e.g. a FloatsList, via the adaptor
>   Clusterer(EmbeddingAdaptor data, params);
>
>   virtual ~Clusterer() {}
>
>   // run the clustering on the stored data
>   virtual PartitioningClusteringResults do_clustering() = 0;
> };
>
> class XXXXIncrementalClusterer : public Clusterer {
>  public:
>   // use an explicit vector embedding to initialize the data
>   XXXXIncrementalClusterer(Embedding data, params);
>
>   // implicitly convert other data, e.g. a FloatsList, via the adaptor
>   XXXXIncrementalClusterer(EmbeddingAdaptor data, params);
>
>   // add new data and return the updated clustering results
>   PartitioningClusteringResults add_data(XXXX data);
> };
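For illustration, here is a self-contained C++ sketch of the proposed interface with one concrete algorithm. Point, Embedding, KMeansParams and the fields of PartitioningClusteringResults are hypothetical stand-ins, not the real IMP classes. The k-means shown is plain Lloyd's iteration seeded with the first k points, assuming 0 < k <= number of points.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical stand-ins, not the real IMP classes.
using Point = std::vector<double>;
struct Embedding {
  std::vector<Point> points;  // data already embedded as vectors
};
struct PartitioningClusteringResults {
  std::vector<int> assignments;  // cluster index of each point
  std::vector<Point> centers;    // one center per cluster
};

class Clusterer {
 public:
  explicit Clusterer(Embedding data) : data_(std::move(data)) {}
  virtual ~Clusterer() = default;
  // Runs the algorithm on the stored data and returns a fresh results object.
  virtual PartitioningClusteringResults do_clustering() const = 0;

 protected:
  Embedding data_;
};

struct KMeansParams {  // hypothetical parameter bundle
  std::size_t k;
  int iterations;
};

class KMeansClusterer : public Clusterer {
 public:
  KMeansClusterer(Embedding data, KMeansParams params)
      : Clusterer(std::move(data)), params_(params) {}

  // Lloyd's algorithm: seed centers with the first k points, then
  // alternate assignment and update steps a fixed number of times.
  PartitioningClusteringResults do_clustering() const override {
    const std::vector<Point>& pts = data_.points;
    PartitioningClusteringResults r;
    r.centers.assign(pts.begin(),
                     pts.begin() + static_cast<std::ptrdiff_t>(params_.k));
    r.assignments.assign(pts.size(), 0);
    for (int it = 0; it < params_.iterations; ++it) {
      // Assignment step: nearest center by squared Euclidean distance.
      for (std::size_t i = 0; i < pts.size(); ++i) {
        double best = std::numeric_limits<double>::max();
        for (std::size_t c = 0; c < r.centers.size(); ++c) {
          double d = 0.0;
          for (std::size_t j = 0; j < pts[i].size(); ++j) {
            const double diff = pts[i][j] - r.centers[c][j];
            d += diff * diff;
          }
          if (d < best) {
            best = d;
            r.assignments[i] = static_cast<int>(c);
          }
        }
      }
      // Update step: move each center to the mean of its assigned points.
      for (std::size_t c = 0; c < r.centers.size(); ++c) {
        Point sum(pts[0].size(), 0.0);
        std::size_t n = 0;
        for (std::size_t i = 0; i < pts.size(); ++i) {
          if (r.assignments[i] != static_cast<int>(c)) continue;
          for (std::size_t j = 0; j < sum.size(); ++j) sum[j] += pts[i][j];
          ++n;
        }
        if (n == 0) continue;  // leave an empty cluster's center in place
        for (double& s : sum) s /= static_cast<double>(n);
        r.centers[c] = sum;
      }
    }
    return r;
  }

 private:
  KMeansParams params_;
};
```

Because do_clustering() returns a new results object and is const here, the clusterer can be asked for results at any time without a separate execute step.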
>
> On Thu, Jun 14, 2012 at 1:34 PM, Daniel Russel drussel@gmail.com wrote:
>
>> Sorry, I didn't see this before. I think my previous comments still
>> stand. A couple of additional ones:
>> - Having "execute"-style methods on objects isn't a very nice practice.
>> It destroys type safety: for a class A you really have two different
>> types of A, the pre-execute A and the post-execute A. Some contexts
>> require one and some the other, and the compiler can't check which. It
>> also doesn't give you anything that you can't get by producing a new
>> object as the result. Ideally, classes should have what I have seen called
>> the "no protocols" property: you can call any function of the class in any
>> order.
>>
>> - I kind of prefer Clusterer to Clustering when referring to the
>> algorithm, as the latter very much means, to me, the result of running a
>> clustering algorithm rather than something that does clustering. That may
>> be my eccentricity, but if you google "clusterer" the results seem
>> consistent with that usage. In any case, we need to make sure that
>> distinct terms are used for distinct things.
>>
>>
>> On Thu, Jun 14, 2012 at 12:38 PM, Barak Raveh barak.raveh@gmail.com wrote:
>>
>>> Now the full version...
>>> Daniel and I briefly discussed consolidating the clustering code in the
>>> statistics and kmeans modules. Please let me know whether we agree that
>>> things will work with the following interface:
>>> * The Embedding family of classes (used to embed data in vector form)
>>> will remain as is.
>>> * There will be a "Clustering" class from which all clustering
>>> algorithms will derive, with a constructor that takes either an Embedding
>>> or an EmbeddingAdaptor (for implicit conversion from, e.g., FloatsList).
>>> * The Clustering classes will also have a *void execute()* method and a
>>> *get_clustering_results()* method that returns the clustering results
>>> (using the existing PartitioningClustering class; perhaps we can rename
>>> it PartitioningClusteringResults).
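A minimal C++ sketch of the implicit-conversion idea in the second bullet, using the Clusterer name. FloatsList, Embedding and EmbeddingAdaptor here are simplified hypothetical stand-ins (the real IMP classes are richer), and get_number_of_points() is made up for the example. Because the adaptor's constructor is not explicit, a FloatsList passed to the Clusterer constructor is converted automatically, while an Embedding goes to its own overload.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified, hypothetical stand-ins for the IMP types.
using Floats = std::vector<double>;
using FloatsList = std::vector<Floats>;

struct Embedding {
  FloatsList points;  // data already embedded as vectors
};

// Deliberately non-explicit constructor: a FloatsList converts to an
// EmbeddingAdaptor implicitly wherever one is expected.
class EmbeddingAdaptor {
 public:
  EmbeddingAdaptor(const FloatsList& data) : embedding_{data} {}
  const Embedding& get_embedding() const { return embedding_; }

 private:
  Embedding embedding_;
};

class Clusterer {
 public:
  // Explicit vector embedding.
  explicit Clusterer(const Embedding& data) : data_(data) {}
  // Anything convertible to EmbeddingAdaptor, e.g. a FloatsList.
  explicit Clusterer(const EmbeddingAdaptor& data)
      : data_(data.get_embedding()) {}

  std::size_t get_number_of_points() const { return data_.points.size(); }

 private:
  Embedding data_;
};
```

The adaptor keeps the number of Clusterer constructors fixed at two, no matter how many input types the adaptor learns to accept.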
>>>
>>> So, bottom line: to cluster some data, you would do something like
>>>
>>>     FloatsList data;  // or create an Embedding object
>>>     KMeansClustering kmeans(data, params);
>>>     kmeans.execute();
>>>     PartitioningClustering clustering_results =
>>>         kmeans.get_clustering_results();
>>>
>>> Makes sense? For backward compatibility, I will add a DEPRECATED warning
>>> to the existing clustering methods, and they will be removed completely
>>> within a few months.
>>>
>>> Barak
>>>
>>> _______________________________________________
>>> IMP-dev mailing list
>>> IMP-dev@salilab.org
>>> https://salilab.org/mailman/listinfo/imp-dev
>>>
>>>
>>
>
>
> --
> Barak
>
-- 
Barak