k-means for many identical points in R

Suppose I have a one-dimensional data set that contains many identical numbers, for example S = c(rep(4, times = 1000), rep(5, times = 808), rep(9, times = 990)). Is there an efficient way to run k-means on it in R? My actual data has only about 20 distinct values, but each of them appears around 100,000 times, and k-means runs very slowly. So I wonder whether there is a more efficient way.


1 answer



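K-means can be implemented with weights, and it's straightforward to do so: collapse the data to its distinct values and use the number of times each value occurs as its weight.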


But IIRC the kmeans function included with R is not implemented this way. The version in the flexclust package may be, but it's pure R and much, much slower.


Either way, you will want to implement this in Fortran or C, like the regular kmeans version. Maybe you can find some package that has a good implementation already.
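
For only about 20 distinct values, a plain R loop over the distinct values may already be fast enough, since the cost no longer depends on how often each value repeats. The following is a minimal sketch of weighted Lloyd iterations for one-dimensional data, not a library API: the function name weighted_kmeans_1d, its arguments, and the evenly spaced initialisation are all assumptions made for illustration.

```r
# Weighted Lloyd's algorithm for 1-D data (illustrative sketch; the name and
# arguments are hypothetical, not part of any package).
# x: distinct values, w: how often each value occurs, k: number of clusters.
weighted_kmeans_1d <- function(x, w, k, max_iter = 100) {
  stopifnot(length(x) == length(w), k >= 1, k <= length(x))
  # Assumed initialisation: evenly spaced picks from the sorted distinct values.
  centers <- sort(x)[round(seq(1, length(x), length.out = k))]
  cluster <- integer(length(x))
  for (iter in seq_len(max_iter)) {
    # Assignment step: index of the nearest center for each distinct value.
    d <- abs(outer(x, centers, "-"))
    new_cluster <- max.col(-d, ties.method = "first")
    if (identical(new_cluster, cluster)) break  # assignments stable: converged
    cluster <- new_cluster
    # Update step: each center becomes the weighted mean of its members.
    for (j in seq_len(k)) {
      idx <- cluster == j
      if (any(idx)) centers[j] <- sum(w[idx] * x[idx]) / sum(w[idx])
    }
  }
  list(centers = centers, cluster = cluster)
}

# Collapse the raw data from the question into distinct values and counts.
S <- c(rep(4, times = 1000), rep(5, times = 808), rep(9, times = 990))
counts <- table(S)
x <- as.numeric(names(counts))
w <- as.numeric(counts)
fit <- weighted_kmeans_1d(x, w, k = 2)
```

Here fit$cluster gives the cluster of each distinct value, so the label of every original point can be recovered with fit$cluster[match(S, x)]. Like ordinary k-means, the result depends on the initial centers, so a different initialisation may be preferable for real data.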



