How to calculate moving average using NumPy?

There seems to be no function in numpy/scipy that simply calculates the moving average, which leads to convoluted solutions.

My question is two-fold:


  • What's the easiest way to (correctly) implement a moving average with numpy?
  • Since this seems non-trivial and error prone, is there a good reason not to have the batteries included in this case?

2 Answers



If you just want a straightforward non-weighted moving average, you can easily implement it with np.cumsum, which may be faster than FFT-based methods:


EDIT: Corrected an off-by-one indexing error in the code, spotted by Bean.


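import numpy as np

def moving_average(a, n=3):
    # Running totals of the input, accumulated as floats.
    ret = np.cumsum(a, dtype=float)
    # Subtracting the total from n steps back leaves the sum of each n-wide window.
    ret[n:] = ret[n:] - ret[:-n]
    return ret[n - 1:] / n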

>>> a = np.arange(20)
>>> moving_average(a)
array([  1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.,  10.,  11.,
        12.,  13.,  14.,  15.,  16.,  17.,  18.])
>>> moving_average(a, n=4)
array([  1.5,   2.5,   3.5,   4.5,   5.5,   6.5,   7.5,   8.5,   9.5,
        10.5,  11.5,  12.5,  13.5,  14.5,  15.5,  16.5,  17.5])
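
As a cross-check on the cumulative-sum trick, the same unweighted average can also be written as a convolution with a box kernel. This is only a sketch: the names moving_average_cumsum and moving_average_convolve are chosen here for illustration, and it assumes you want only the windows that fit entirely inside the data (np.convolve with mode='valid').

```python
import numpy as np

def moving_average_cumsum(a, n=3):
    # Same cumulative-sum approach as the function above.
    ret = np.cumsum(a, dtype=float)
    ret[n:] = ret[n:] - ret[:-n]
    return ret[n - 1:] / n

def moving_average_convolve(a, n=3):
    # Convolve with a length-n kernel of equal weights 1/n;
    # mode='valid' keeps only positions where the window fully overlaps the data.
    return np.convolve(a, np.ones(n) / n, mode='valid')

a = np.arange(20)
# Both approaches should agree up to floating-point rounding.
print(np.allclose(moving_average_cumsum(a, 4), moving_average_convolve(a, 4)))
```

Both versions return len(a) - n + 1 values. The convolution form extends naturally to weighted averages, since only the kernel needs to change.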

So I guess the answer is: it is really easy to implement, and maybe numpy is already a little bloated with specialized functionality.




NumPy's lack of a particular domain-specific function is perhaps due to the Core Team's discipline and fidelity to NumPy's prime directive: provide an N-dimensional array type, as well as functions for creating and indexing those arrays. Like many foundational objectives, this one is not small, and NumPy does it brilliantly.

The (much) larger SciPy contains a much larger collection of domain-specific libraries (called subpackages by SciPy devs), for instance numerical optimization (optimize), signal processing (signal), and integral calculus (integrate).

My guess is that the function you are after is in at least one of the SciPy subpackages (scipy.signal perhaps); however, I would look first in the collection of SciPy scikits, identify the relevant scikit(s), and look for the function of interest there.

Scikits are independently developed packages based on NumPy/SciPy and directed to a particular technical discipline (e.g., scikits-image, scikits-learn, etc.). Several of these (in particular, the awesome OpenOpt for numerical optimization) were highly regarded, mature projects long before choosing to reside under the relatively new scikits rubric. The Scikits homepage linked to above lists about 30 such scikits, though at least several of those are no longer under active development.

Following this advice would lead you to scikits-timeseries; however, that package is no longer under active development. In effect, Pandas has become, AFAIK, the de facto NumPy-based time series library.


Pandas has several functions that can be used to calculate a moving average; the simplest of these is probably rolling_mean, which you use like so:


>>> # the recommended syntax to import pandas
>>> import pandas as PD
>>> import numpy as NP

>>> # prepare some fake data:
>>> # the date-time indices:
>>> t = PD.date_range('1/1/2010', '12/31/2012', freq='D')

>>> # the data:
>>> x = NP.arange(0, t.shape[0])

>>> # combine the data & index into a Pandas 'Series' object
>>> D = PD.Series(x, t)

Now, just call the function rolling_mean, passing in the Series object and a window size, which in my example below is 10 days.


>>> d_mva = PD.rolling_mean(D, 10)

>>> # d_mva is the same size as the original Series
>>> d_mva.shape

>>> # though the first w - 1 values are NaN, where w is the window size
>>> d_mva[:3]
    2010-01-01         NaN
    2010-01-02         NaN
    2010-01-03         NaN

Verify that it worked, e.g., by comparing values 10 to 15 in the original Series with the new Series smoothed by the rolling mean:

>>> D[10:15]
     2010-01-11    10
     2010-01-12    11
     2010-01-13    12
     2010-01-14    13
     2010-01-15    14
     Freq: D

>>> d_mva[10:15]
      2010-01-11    5.5
      2010-01-12    6.5
      2010-01-13    7.5
      2010-01-14    8.5
      2010-01-15    9.5
      Freq: D
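
Note that the top-level rolling_mean function belongs to older Pandas releases; newer releases express the same operation through the .rolling() method on a Series. A minimal sketch, assuming a Pandas version that provides Series.rolling (the lowercase aliases pd and np and the variable names series and rolled are choices made here, not part of the example above):

```python
import numpy as np
import pandas as pd

# Same fake data as above: one integer per day over three years.
t = pd.date_range('2010-01-01', '2012-12-31', freq='D')
series = pd.Series(np.arange(t.shape[0]), index=t)

# A 10-observation rolling window followed by the mean of each window.
rolled = series.rolling(window=10).mean()

# The result has the same length as the input; the leading entries,
# where fewer than 10 observations are available, are NaN.
print(rolled.iloc[8:12])
```

If NaN padding at the start is not wanted, the min_periods argument of rolling() lets shorter leading windows produce a value.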

The function rolling_mean, along with a dozen or so other functions, is informally grouped in the Pandas documentation under the rubric moving window functions; a second, related group of functions in Pandas is referred to as exponentially-weighted functions (e.g., ewma, which calculates an exponentially weighted moving average). The fact that this second group is not included in the first (moving window functions) is perhaps because the exponentially-weighted transforms don't rely on a fixed-length window.
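
To make the fixed-window distinction concrete, here is a minimal sketch of an exponentially weighted moving average in plain NumPy. It assumes the simple recursive form y[0] = x[0], y[t] = alpha * x[t] + (1 - alpha) * y[t-1]; the function name ewma_recursive and the parameter alpha are illustrative, not the Pandas API, and Pandas' own defaults may weight the early samples differently.

```python
import numpy as np

def ewma_recursive(x, alpha=0.5):
    # Hypothetical helper: every past sample contributes, with a weight
    # that decays geometrically by (1 - alpha) per step, so there is no
    # fixed window length.
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    if x.size == 0:
        return out
    out[0] = x[0]
    for t in range(1, x.size):
        out[t] = alpha * x[t] + (1.0 - alpha) * out[t - 1]
    return out

# With alpha = 0.5 the values are 0.0, 0.5, 0.75.
print(ewma_recursive([0.0, 1.0, 1.0], alpha=0.5))
```

Unlike the rolling mean, the output has no NaN prefix and no sample ever drops out of the average completely; its influence only shrinks.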
