p.s. We might not really need IncrementalClustering, we can just include add_data() in Clusterer. Incremental algorithms would be an implementation detail.
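To make the p.s. concrete, here is a rough sketch of what folding add_data() into Clusterer could look like. It is only an illustration: the Point and Embedding aliases, the one-field PartitioningClusteringResults struct and the toy ThresholdClusterer are hypothetical stand-ins rather than the real IMP classes, and making do_clustering() const is my own choice, not something from the proposal.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the types named in this thread; the real
// Embedding and PartitioningClusteringResults classes look different.
using Point = std::vector<double>;
using Embedding = std::vector<Point>;

struct PartitioningClusteringResults {
  std::vector<std::vector<std::size_t>> clusters;  // point indices per cluster
};

class Clusterer {
 public:
  explicit Clusterer(Embedding data) : data_(std::move(data)) {}
  virtual ~Clusterer() = default;

  // Run the clustering on the stored data.
  virtual PartitioningClusteringResults do_clustering() const = 0;

  // Append points and return the clustering of the enlarged data set.
  // The default re-clusters from scratch; an incremental algorithm can
  // override this to update its internal state instead.
  virtual PartitioningClusteringResults add_data(const Embedding& more) {
    data_.insert(data_.end(), more.begin(), more.end());
    return do_clustering();
  }

 protected:
  Embedding data_;
};

// Toy algorithm for illustration only: split points on their first
// coordinate at a fixed threshold.
class ThresholdClusterer : public Clusterer {
 public:
  ThresholdClusterer(Embedding data, double threshold)
      : Clusterer(std::move(data)), threshold_(threshold) {}

  PartitioningClusteringResults do_clustering() const override {
    PartitioningClusteringResults r;
    r.clusters.resize(2);
    for (std::size_t i = 0; i < data_.size(); ++i) {
      assert(!data_[i].empty());  // every point needs a first coordinate
      r.clusters[data_[i][0] < threshold_ ? 0 : 1].push_back(i);
    }
    return r;
  }

 private:
  double threshold_;
};
```

With this shape, callers see one add_data() on every Clusterer, and whether a given algorithm updates incrementally or just re-runs stays hidden behind the override.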
On Thu, Jun 14, 2012 at 1:58 PM, Barak Raveh barak.raveh@gmail.com wrote:
> Good points. Here is the proposed interface.
>
> class Clusterer{
> public:
>   // use explicit vector embedding to initialize the data
>   Clusterer(Embedding data, params);
>
>   // use implicit conversion of data from e.g. FloatsList to data
>   Clusterer(EmbeddingAdaptor data, params);
>
>   // run the clustering on the stored data
>   virtual PartitioningClusteringResults do_clustering() = 0;
> };
>
> class XXXXIncrementalClusterer : public Clusterer{
> public:
>   // use explicit vector embedding to initialize the data
>   XXXXIncrementalClusterer(Embedding data, params);
>
>   // use implicit conversion of data from e.g. FloatsList to data
>   XXXXIncrementalClusterer(EmbeddingAdaptor data, params);
>
>   // add data to the stored data and return the updated clustering
>   PartitioningClusteringResults add_data(XXXX data);
> };
>
> On Thu, Jun 14, 2012 at 1:34 PM, Daniel Russel drussel@gmail.com wrote:
>
>> Sorry, I didn't see this before. I think my previous comments still
>> stand. A couple of additional ones:
>> - Having "execute"-style methods on objects isn't a very nice practice.
>> It destroys type safety, since for class A you really have two different
>> types of A, the pre-execute A and the post-execute A, and some contexts
>> require one and some the other (and the compiler can't check). It also
>> doesn't really give you anything that you can't get from producing a new
>> object as the result. Ideally, classes should have what I have seen
>> called the "no protocols" property: you can call any function of the
>> class in any order.
>>
>> - I kind of prefer Clusterer to Clustering when referring to the
>> algorithm, as the latter very much means the result of running a
>> clustering algorithm to me, as opposed to something that does clustering.
>> But that may be my eccentricity (although if you google "clusterer" the
>> results seem consistent with that usage). In any case, we need to make
>> sure that distinct terms are used for distinct things.
>>
>>
>> On Thu, Jun 14, 2012 at 12:38 PM, Barak Raveh barak.raveh@gmail.com wrote:
>>
>>> Now the full version...
>>> Daniel and I briefly discussed consolidating the clustering code in the
>>> statistics and kmeans modules. Please tell me whether you agree that
>>> things should work with the following interface:
>>> * The Embedding family of classes (used to embed data in vector form)
>>> will remain as is.
>>> * There will be a "Clustering" class from which all clustering
>>> algorithms will derive, with a constructor that takes either an
>>> Embedding class, or an EmbeddingAdaptor for implicit conversions from,
>>> e.g., FloatsList.
>>> * The Clustering classes will also have a *void ::execute()* method and
>>> a *::get_clustering_results()* method that will return the clustering
>>> results (using the existing PartitioningClustering class; perhaps we can
>>> change its name to PartitioningClusteringResults).
>>>
>>> So bottom line, if you want to cluster some data, you will do
>>> something like
>>>     FloatsList data; // or create an Embedding object
>>>     KMeansClustering kmeans(data, params);
>>>     kmeans.execute();
>>>     PartitioningClustering clustering_results =
>>>         kmeans.get_clustering_results();
>>>
>>> Makes sense? For backward compatibility, I will add a DEPRECATED warning
>>> to the existing clustering methods, so that they can be removed
>>> completely within a few months.
>>>
>>> Barak
>>>
>>> _______________________________________________
>>> IMP-dev mailing list
>>> IMP-dev@salilab.org
>>> https://salilab.org/mailman/listinfo/imp-dev
>>>
>>>
>>
>
>
> --
> Barak
>
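
For reference, the "no protocols" point quoted above can be sketched side by side in C++. This is a toy under stated assumptions: the FloatsList alias, the one-field PartitioningClusteringResults struct, the toy_cluster() helper and both class names are hypothetical and are not the IMP API. The first class needs a run-time check because the compiler cannot tell a pre-execute object from a post-execute one; the second has a single state and hands back a new result object.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the types named in the thread.
using Floats = std::vector<double>;
using FloatsList = std::vector<Floats>;

struct PartitioningClusteringResults {
  std::vector<int> assignment;  // cluster index for each input point
};

// Toy "algorithm" shared by both styles: cluster 0 for a negative first
// coordinate, cluster 1 otherwise.
inline PartitioningClusteringResults toy_cluster(const FloatsList& data) {
  PartitioningClusteringResults r;
  for (const Floats& p : data) {
    assert(!p.empty());
    r.assignment.push_back(p[0] < 0.0 ? 0 : 1);
  }
  return r;
}

// Style 1: execute() followed by a getter. The object has two states, and
// calling the getter in the wrong one can only be caught at run time.
class ExecuteStyleClustering {
 public:
  explicit ExecuteStyleClustering(FloatsList data) : data_(std::move(data)) {}

  void execute() {
    results_ = toy_cluster(data_);
    executed_ = true;
  }

  const PartitioningClusteringResults& get_clustering_results() const {
    if (!executed_) throw std::logic_error("execute() has not been called");
    return results_;
  }

 private:
  FloatsList data_;
  PartitioningClusteringResults results_;
  bool executed_ = false;
};

// Style 2: the method returns a new result object, so every member
// function can be called at any time and in any order.
class ReturnStyleClusterer {
 public:
  explicit ReturnStyleClusterer(FloatsList data) : data_(std::move(data)) {}

  PartitioningClusteringResults do_clustering() const {
    return toy_cluster(data_);
  }

 private:
  FloatsList data_;
};
```

In the second style the results live in their own type, which also keeps the Clusterer (the algorithm) and the clustering (its output) as distinct terms.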