k-means for many identical points in R


Suppose I have a one-dimensional data set that contains many repeated values, for example S = c(rep(4, times = 1000), rep(5, times = 808), rep(9, times = 990)). Is there an efficient way to run k-means on it in R? My real data has only about 20 distinct values, but each of them appears about 100,000 times, so kmeans runs very slowly. Is there a more efficient approach?


1 answer

#1


0  

K-means can be implemented with weights. It's straightforward to do so.
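To illustrate the idea, here is a minimal sketch of weighted k-means (Lloyd's algorithm) for one-dimensional data in plain R. It collapses the data to its distinct values and uses their counts as weights, so each iteration touches about 20 points instead of millions. The function name weighted_kmeans_1d, its arguments, and the evenly spaced initialisation are assumptions made for this example; they are not part of base R or of any package, and the sketch assumes the values in x are distinct.

```r
# Weighted Lloyd's algorithm for one-dimensional data.
# x: distinct values; w: how often each value occurs; k: number of clusters.
# weighted_kmeans_1d is a hypothetical helper, not part of base R.
weighted_kmeans_1d <- function(x, w, k, max_iter = 100) {
  stopifnot(length(x) == length(w), k >= 1, k <= length(x))
  # Initialise centers at evenly spaced positions among the sorted values.
  centers <- sort(x)[round(seq(1, length(x), length.out = k))]
  cluster <- integer(length(x))
  for (iter in seq_len(max_iter)) {
    # Assignment step: index of the nearest center for each distinct value.
    d <- abs(outer(x, centers, "-"))
    new_cluster <- max.col(-d, ties.method = "first")
    # Stop once the assignment no longer changes.
    if (identical(new_cluster, cluster)) break
    cluster <- new_cluster
    # Update step: each center becomes the weighted mean of its cluster.
    for (j in seq_len(k)) {
      idx <- cluster == j
      if (any(idx)) centers[j] <- sum(w[idx] * x[idx]) / sum(w[idx])
    }
  }
  list(centers = centers, cluster = cluster)
}

# Collapse the example data from the question to distinct values and counts.
S <- c(rep(4, times = 1000), rep(5, times = 808), rep(9, times = 990))
counts <- table(S)
x <- as.numeric(names(counts))
w <- as.numeric(counts)

fit <- weighted_kmeans_1d(x, w, k = 2)

# Map the cluster labels back onto the full data set.
labels <- fit$cluster[match(S, x)]
```

Because the weighted mean of the distinct values equals the plain mean of the expanded data, the update step gives the same centers as unweighted k-means would for the same assignment. Like ordinary k-means, the result can depend on the initial centers.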


But IIRC the version included with R is not implemented this way. The version in the flexclust package may be, but it is pure R and much, much slower.


Either way, you will want to implement this in Fortran or C, like the regular kmeans version. Maybe you can find some package that has a good implementation already.


