Weighted Gaussian kernel density estimation in `python`


It is currently not possible to use scipy.stats.gaussian_kde to estimate the density of a random variable based on weighted samples. What methods are available to estimate densities of continuous random variables based on weighted samples?


3 answers

#1


21  

Neither sklearn.neighbors.KernelDensity nor statsmodels.nonparametric seems to support weighted samples. I modified scipy.stats.gaussian_kde to allow for heterogeneous sampling weights and thought the results might be useful for others. An example is shown below.


[Figure: example]

An ipython notebook can be found here: http://nbviewer.ipython.org/gist/tillahoffmann/f844bce2ec264c1c8cb5


Implementation details

With weights $w_i$ normalized so that $\sum_i w_i = 1$, the weighted arithmetic mean is

$$\bar{x} = \sum_i w_i x_i$$

The unbiased data covariance matrix is then given by

$$\Sigma = \frac{1}{1 - \sum_i w_i^2} \sum_i w_i (x_i - \bar{x})(x_i - \bar{x})^T$$

The bandwidth can be chosen by the scott or silverman rules, as in scipy. However, the number of samples used to calculate the bandwidth is replaced by Kish's approximation for the effective sample size, $n_\mathrm{eff} = (\sum_i w_i)^2 / \sum_i w_i^2$.
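The steps above can be sketched in plain NumPy. This is only an illustration, not the modified gaussian_kde itself: the function name `weighted_gaussian_kde` and its argument layout are made up for this example, and it assumes strictly positive weights spread over at least two distinct samples. The scott and silverman factors follow the formulas scipy uses, with the sample count replaced by the effective sample size.

```python
import numpy as np


def weighted_gaussian_kde(samples, weights, points, rule="scott"):
    """Evaluate a weighted Gaussian KDE (hypothetical helper, illustration only).

    samples: shape (d, n), weights: shape (n,), points: shape (d, m).
    Returns the estimated density at each of the m points.
    """
    samples = np.atleast_2d(np.asarray(samples, dtype=float))
    points = np.atleast_2d(np.asarray(points, dtype=float))
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize so the weights sum to one
    d = samples.shape[0]

    # Weighted mean and unbiased weighted covariance.
    mean = samples @ w
    centered = samples - mean[:, None]
    cov = (centered * w) @ centered.T / (1.0 - np.sum(w ** 2))

    # Kish's effective sample size for normalized weights.
    neff = 1.0 / np.sum(w ** 2)
    if rule == "scott":
        factor = neff ** (-1.0 / (d + 4))
    elif rule == "silverman":
        factor = (neff * (d + 2) / 4.0) ** (-1.0 / (d + 4))
    else:
        raise ValueError("rule must be 'scott' or 'silverman'")

    # Each kernel is a Gaussian with the scaled data covariance.
    kernel_cov = cov * factor ** 2
    inv = np.linalg.inv(kernel_cov)
    norm = np.sqrt(np.linalg.det(2.0 * np.pi * kernel_cov))

    diff = points[:, :, None] - samples[:, None, :]  # shape (d, m, n)
    energy = np.einsum("imn,ij,jmn->mn", diff, inv, diff)
    return np.exp(-0.5 * energy) @ w / norm


samples = np.array([[10.0, 5.0, 7.0]])
grid = np.linspace(0.0, 15.0, 151)[None, :]
density = weighted_gaussian_kde(samples, [3.0, 1.0, 2.0], grid)
```

Because the weights are normalized first, rescaling all of them by the same constant leaves the estimate unchanged.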


#2


1  

Check out the packages PyQt-Fit and Statistics for Python. They seem to have kernel density estimation with weighted observations.


#3


1  

For univariate distributions you can use KDEUnivariate from statsmodels. It is not well documented, but the fit method accepts a weights argument. With weights you cannot use the FFT, so pass fft=False. Here is an example:


    import numpy as np
    import matplotlib.pyplot as plt
    from statsmodels.nonparametric.kde import KDEUnivariate

    # Unweighted fit: the value 10 is repeated three times.
    kde1 = KDEUnivariate(np.array([10., 10., 10., 5.]))
    kde1.fit(bw=0.5)
    plt.plot(kde1.support, [kde1.evaluate(xi) for xi in kde1.support], 'x-')

    # Weighted fit: one sample at 10 with weight 3, one at 5 with weight 1.
    kde1 = KDEUnivariate(np.array([10., 5.]))
    kde1.fit(weights=np.array([3., 1.]),
             bw=0.5,
             fft=False)
    plt.plot(kde1.support, [kde1.evaluate(xi) for xi in kde1.support], 'o-')
    plt.show()

which produces this figure: [figure]
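To see why the two fits in that example are comparable, here is a small NumPy sketch of a fixed-bandwidth weighted Gaussian KDE. It is not statsmodels' implementation; `fixed_bandwidth_kde` is a hypothetical helper that assumes the estimate is a weighted sum of Gaussian kernels with normalized weights. Under that assumption, weighting a sample by 3 is the same as repeating it three times.

```python
import numpy as np


def fixed_bandwidth_kde(x, samples, weights=None, bw=0.5):
    """Weighted sum of Gaussian kernels with a fixed bandwidth (illustration only)."""
    x = np.asarray(x, dtype=float)
    samples = np.asarray(samples, dtype=float)
    if weights is None:
        weights = np.ones_like(samples)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize so the weights sum to one

    # One Gaussian kernel per sample, evaluated at every grid point.
    z = (x[:, None] - samples[None, :]) / bw
    kernels = np.exp(-0.5 * z ** 2) / (bw * np.sqrt(2.0 * np.pi))
    return kernels @ w


grid = np.linspace(0.0, 15.0, 301)
repeated = fixed_bandwidth_kde(grid, [10.0, 10.0, 10.0, 5.0])
weighted = fixed_bandwidth_kde(grid, [10.0, 5.0], weights=[3.0, 1.0])

# With a fixed bandwidth the two estimates should agree.
print(np.allclose(repeated, weighted))
```

This equivalence only holds when the bandwidth is fixed. A data-driven bandwidth rule based on an effective sample size would treat the two inputs differently.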


