### Weighted Gaussian kernel density estimation in `python`

It is currently not possible to use `scipy.stats.gaussian_kde` to estimate the density of a random variable based on weighted samples. What methods are available to estimate densities of continuous random variables based on weighted samples?

## 3 Answers

### #1


Neither `sklearn.neighbors.KernelDensity` nor `statsmodels.nonparametric` seem to support weighted samples. I modified `scipy.stats.gaussian_kde` to allow for heterogeneous sampling weights and thought the results might be useful for others. An example is shown below.


An `ipython` notebook can be found here: http://nbviewer.ipython.org/gist/tillahoffmann/f844bce2ec264c1c8cb5
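The idea behind the modification can be sketched in a few lines. The helper `weighted_kde` below is a hypothetical, one-dimensional illustration (not the notebook's actual multivariate class): each sample contributes a Gaussian bump scaled by its normalised weight.

```python
import numpy as np

def weighted_kde(samples, weights, grid, bandwidth):
    """Evaluate a 1-D weighted Gaussian KDE on `grid`.

    Minimal sketch of the idea; the linked notebook implements the
    full multivariate version inside a modified gaussian_kde.
    """
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalise weights to sum to 1
    # One Gaussian kernel per sample, centred at that sample.
    diffs = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * diffs**2) / (bandwidth * np.sqrt(2 * np.pi))
    # Weighted mixture of the per-sample kernels.
    return kernels @ weights

samples = np.array([10., 10., 10., 5.])
grid = np.linspace(0, 15, 151)
density = weighted_kde(samples, np.ones(4), grid, bandwidth=0.5)
```

With equal weights this reduces to the ordinary KDE, and a repeated sample is equivalent to a proportionally larger weight.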

#### Implementation details

The weighted arithmetic mean is

$$\bar{x} = \frac{\sum_i w_i x_i}{\sum_i w_i}.$$

The unbiased data covariance matrix is then given by

$$\Sigma = \frac{\sum_i w_i \left(x_i - \bar{x}\right)\left(x_i - \bar{x}\right)^\top}{\sum_i w_i - \frac{\sum_i w_i^2}{\sum_i w_i}}.$$
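These two formulas can be checked directly in `numpy` (the helper name `weighted_mean_cov` is invented for this sketch); the normalisation is the same one `np.cov` applies with `aweights` and the default `ddof=1`.

```python
import numpy as np

def weighted_mean_cov(x, w):
    """Weighted mean and unbiased weighted covariance.

    x: (n, d) array of samples, w: (n,) nonnegative weights.
    """
    w = np.asarray(w, dtype=float)
    mean = (w[:, None] * x).sum(axis=0) / w.sum()
    d = x - mean
    # Unbiased normalisation: sum(w) - sum(w**2) / sum(w)
    denom = w.sum() - (w**2).sum() / w.sum()
    cov = (w[:, None] * d).T @ d / denom
    return mean, cov

x = np.array([[1., 2.], [3., 4.], [5., 0.], [2., 2.]])
w = np.array([1., 2., 3., 4.])
mean, cov = weighted_mean_cov(x, w)
```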

The bandwidth can be chosen by the `scott` or `silverman` rules, as in `scipy`. However, the number of samples used to compute the bandwidth is Kish's approximation of the effective sample size.

### #2


Check out the packages PyQt-Fit and statistics for Python. They seem to support kernel density estimation with weighted observations.

### #3


For univariate distributions you can use `KDEUnivariate` from statsmodels. It is not well documented, but its `fit` method accepts a `weights` argument; note that when weights are supplied you cannot use the FFT (`fft=False`). Here is an example:

```python
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.nonparametric.kde import KDEUnivariate

# Unweighted KDE with a repeated sample
kde1 = KDEUnivariate(np.array([10., 10., 10., 5.]))
kde1.fit(bw=0.5)
plt.plot(kde1.support, [kde1.evaluate(xi) for xi in kde1.support], 'x-')

# Equivalent weighted KDE; weights require fft=False
kde2 = KDEUnivariate(np.array([10., 5.]))
kde2.fit(weights=np.array([3., 1.]), bw=0.5, fft=False)
plt.plot(kde2.support, [kde2.evaluate(xi) for xi in kde2.support], 'o-')
```

which produces a figure in which the weighted and unweighted estimates coincide.