p.s. We might not really need IncrementalClustering, we can just include add_data() in Clusterer. Incremental algorithms would be an implementation detail.
good points. Here is the proposed interface.class Clusterer{// use explicit vector embedding to initialize the dataClusterer(Embedding data, params);// use implicit conversion of data from e.g. FloatsList to dataClusterer(EmbeddingAdaptor data, params);// run the clustering on the stored datavirtual PartitioningClusteringResults do_clustering() = 0;
};
class XXXXIncrementalClusterer : Clusterer{// use explicit vector embedding to initialize the dataXXXXIncrementalClusterer(Embedding data, params);// use implicit conversion of data from e.g. FloatsList to dataXXXXIncremenralClusterer(EmbeddingAdaptor data, params);//PartitioningClusteringResults add_data(XXXX data);
};--On Thu, Jun 14, 2012 at 1:34 PM, Daniel Russel <drussel@gmail.com> wrote:Sorry, I didn't see this before. I think my previous comments still stand. As a couple additional ones:
- having "execute" style methods on objects isn't a very nice practice. It destroys type safety, since for class A you really have two different types of A, the pre-execute A and the post-execute A and some contexts require one and some the other (and the compiler can't check). And doesn't really give you anything that you can't get from producing a new object as the result. Ideally, classes should have what I have seen called the "no protocols" property: you can call any function of the class in any order.
- I kind of prefer Clusterer to Clustering when referring to the algorithm as the latter very much means the result of running a clustering algorithm to me as opposed to something that does clustering. But that may be my eccentricity (but if you google "clusterer" the results seem consistent with that usage). In any case, we need to make sure that distinct terms are used for distinct things.
On Thu, Jun 14, 2012 at 12:38 PM, Barak Raveh <barak.raveh@gmail.com> wrote:
_______________________________________________Now the full version...Daniel and I discussed a little bit consolidation of clustering things in statistics and kmeans modules. Please tell me if it is agreed that things will work with the following interface:* The Embedding family of classes (used to embed data in vector form) will remain as is* There will be a "Clustering" class from which all clustering algorithms will derive, with a constructor that takes either Embedding class, or EmbeddingAdaptor for implicit conversions from e.g., FloatsList* The Clustering classes will also have a void ::execute() method and a ::get_clustering results() method, that will return the clustering results (using the exisiting PartitioningClustering class, perhaps we can change its name to PartitioningClusteringResults).So bottom line, if you will want to cluster some data, you will do something likeFloatsList data; // or create an Embedding objectKMeansClustering kmeans(data, params);kmeans.execute();PartitioningClustering clustering_results = kmeans.get_clustering_results()Makes sense? For backward compatibility, I will add a DEPRECATED warning to existing clustering methods, so they will be removed within a few months completely.Barak
IMP-dev mailing list
IMP-dev@salilab.org
https://salilab.org/mailman/listinfo/imp-dev
Barak