Suppose I have a one-dimensional data set that contains many repeated values, for example
S = c(rep(4, times = 1000), rep(5, times = 808), rep(9, times = 990)). Is there an efficient way to run k-means on it in R? My actual data has only about 20 distinct values, but each one appears about 100,000 times, so k-means runs very slowly. Is there a more efficient way?
K-means can be implemented with weights, and doing so is straightforward: treat each distinct value as a single point weighted by its count, so that each center becomes the weighted mean of its members.
But IIRC the kmeans function included with R does not support weights. The version in
the flexclust package may, but it is pure R and much, much slower.
Either way, for speed you will want an implementation in Fortran or C, like the regular kmeans. Maybe you can find a package that already has a good implementation.
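As a rough illustration of the weighted idea, here is a minimal base R sketch of Lloyd's algorithm run on the distinct values only, with their counts as weights. The function name weighted_kmeans_1d and its arguments are hypothetical (they do not come from any package), and the initialisation at evenly spaced distinct values is just one simple assumption. It is not tuned or tested against stats::kmeans.

```r
# Hypothetical helper: weighted Lloyd's algorithm for 1-D data.
# `values` must be distinct; `weights` are their counts.
weighted_kmeans_1d <- function(values, weights, k, max_iter = 100) {
  stopifnot(length(values) == length(weights), k <= length(values))
  # Assumed initialisation: k evenly spaced distinct values.
  centers <- sort(values)[round(seq(1, length(values), length.out = k))]
  assignment <- rep(0L, length(values))
  for (iter in seq_len(max_iter)) {
    # Assign each distinct value to its nearest center.
    dists <- abs(outer(values, centers, "-"))
    new_assignment <- max.col(-dists, ties.method = "first")
    if (all(new_assignment == assignment)) break  # converged
    assignment <- new_assignment
    # Move each center to the weighted mean of its members.
    for (j in seq_len(k)) {
      members <- assignment == j
      if (any(members)) {
        centers[j] <- weighted.mean(values[members], weights[members])
      }
    }
  }
  list(centers = centers, cluster = assignment)
}

# Collapse the data to distinct values and counts, then cluster those.
S <- c(rep(4, times = 1000), rep(5, times = 808), rep(9, times = 990))
counts <- table(S)
values <- as.numeric(names(counts))
weights <- as.numeric(counts)
fit <- weighted_kmeans_1d(values, weights, k = 2)

# Map the cluster labels back onto the full data set.
full_cluster <- fit$cluster[match(S, values)]
```

Because each iteration touches only the distinct values (about 20 in the question) rather than every observation, even a pure R loop like this should be cheap. A compiled implementation would matter more if the number of distinct values were large.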