How to emulate a 2-sample t-test in scipy

I'm trying to emulate MS Excel's t-test function (TTEST) in Python. I need to do this because I have to automate some calculations that were previously done in Excel. Here is my test program:

``````
import scipy.stats

a = [5, 0.9, -0.4, -0.9, 0.5, 0.8, 0.2, 0.2, 0, -0.8]
b = [1.1, 0.9, -0.5, -0.7, 0.6, 0.7, 0.3, 0.1, -0.1, -0.7]

# Two-sample t-test for independent samples, assuming equal variances
print(scipy.stats.ttest_ind(a, b, equal_var=True))
``````

This is the result:

``````
(array(0.6661542796363409), 0.51376033318001801)
``````

However, Excel gives this value for the same input: 0.35844407

I noticed that they used the tails=2 parameter (see http://office.microsoft.com/en-us/excel-help/ttest-HP005209325.aspx ). Unfortunately, I have no idea how to calculate a two-tailed t-test with scipy. (In fact, I don't know what it is.)

Another very strange thing is that in scipy, I get a slightly different result when I change the order of the samples. E.g. if I move -0.7 to the head of b, then I get 0.51376033318001824 instead of 0.51376033318001801. Not a big difference, but still.

For Excel, it is a whole different story: it looks like the two-tailed t-test gives a significantly different result when the order of the samples changes.

The question is: how can I emulate Excel's version of the two-tailed t-test in scipy?

1 Answer

It looks like Excel is computing a paired t-test, which scipy provides as `ttest_rel`:

``````
In [15]: import scipy.stats as stats

In [20]: stats.ttest_rel(a, b)
Out[20]: (array(0.9677712267394081), 0.35844406902161985)
``````
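For context, Excel's `TTEST(array1, array2, tails, type)` selects the test through its `type` argument: 1 is paired, 2 is two-sample with equal variances, 3 is two-sample with unequal variances. Below is a minimal sketch of a wrapper built on that mapping. The names `excel_ttest` and `type_` are hypothetical (they are not part of scipy), and halving the two-tailed p-value for `tails=1` is an assumption about how Excel reports one-tailed results, so treat this as a starting point rather than an exact replica.

```python
from scipy import stats

a = [5, 0.9, -0.4, -0.9, 0.5, 0.8, 0.2, 0.2, 0, -0.8]
b = [1.1, 0.9, -0.5, -0.7, 0.6, 0.7, 0.3, 0.1, -0.1, -0.7]


def excel_ttest(a, b, tails=2, type_=1):
    """Hypothetical helper: approximate Excel's TTEST(array1, array2, tails, type).

    type_ = 1 -> paired test (scipy.stats.ttest_rel)
    type_ = 2 -> two-sample, equal variances (ttest_ind, equal_var=True)
    type_ = 3 -> two-sample, unequal variances (ttest_ind, equal_var=False)
    """
    if type_ == 1:
        _, p_two_tailed = stats.ttest_rel(a, b)
    elif type_ == 2:
        _, p_two_tailed = stats.ttest_ind(a, b, equal_var=True)
    elif type_ == 3:
        _, p_two_tailed = stats.ttest_ind(a, b, equal_var=False)
    else:
        raise ValueError("type_ must be 1, 2, or 3")

    if tails == 2:
        return float(p_two_tailed)
    if tails == 1:
        # Assumption: the one-tailed p-value is half of the two-tailed one.
        return float(p_two_tailed) / 2
    raise ValueError("tails must be 1 or 2")


print(excel_ttest(a, b, tails=2, type_=1))  # paired test
print(excel_ttest(a, b, tails=2, type_=2))  # independent samples, equal variances
```

With the data from the question, the paired call should reproduce Excel's 0.35844407, while `type_=2` corresponds to the `ttest_ind` call from the question.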

Use `stats.ttest_rel` when `a` and `b` are related. The docs say:

> Examples for the use [of ttest_rel] are scores of the same set of student in different exams, or repeated sampling from the same units.
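To see what "related" means in practice, the paired test can be written out by hand: it is a one-sample t-test on the element-wise differences `a[i] - b[i]`. The sketch below uses the data from the question; the variable names `d`, `t_manual`, and `p_manual` are only illustrative.

```python
import numpy as np
from scipy import stats

a = np.array([5, 0.9, -0.4, -0.9, 0.5, 0.8, 0.2, 0.2, 0, -0.8])
b = np.array([1.1, 0.9, -0.5, -0.7, 0.6, 0.7, 0.3, 0.1, -0.1, -0.7])

# Pair the observations: one difference per (a[i], b[i]) pair.
d = a - b
n = len(d)

# t statistic: mean difference divided by its standard error.
t_manual = d.mean() / (d.std(ddof=1) / np.sqrt(n))

# Two-tailed p-value from the t distribution with n - 1 degrees of freedom.
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

t_rel, p_rel = stats.ttest_rel(a, b)
print(t_manual, p_manual)
print(t_rel, p_rel)
```

Both lines should print the same statistic and p-value, which is also why the pairing (and therefore the order of the values) matters for this test.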

Use `stats.ttest_ind` when `a` and `b` are independent.

> We can use [ttest_ind], if we observe two independent samples from the same or different population, e.g. exam scores of boys and girls or of two ethnic groups.
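This distinction also explains the order sensitivity described in the question. A small sketch, assuming the reordering meant moving the first -0.7 of `b` to the front (`b_moved` is an illustrative name): the independent-samples test only changes by floating-point rounding, while the paired test changes substantially because different values are now paired.

```python
from scipy import stats

a = [5, 0.9, -0.4, -0.9, 0.5, 0.8, 0.2, 0.2, 0, -0.8]
b = [1.1, 0.9, -0.5, -0.7, 0.6, 0.7, 0.3, 0.1, -0.1, -0.7]

# Same values as b, with one -0.7 moved to the head of the list.
b_moved = [-0.7, 1.1, 0.9, -0.5, 0.6, 0.7, 0.3, 0.1, -0.1, -0.7]

# Independent samples: order within each sample is irrelevant
# (apart from tiny floating-point differences in the sums).
_, p_ind = stats.ttest_ind(a, b, equal_var=True)
_, p_ind_moved = stats.ttest_ind(a, b_moved, equal_var=True)

# Paired samples: reordering changes which values are compared.
_, p_rel = stats.ttest_rel(a, b)
_, p_rel_moved = stats.ttest_rel(a, b_moved)

print("ttest_ind:", p_ind, p_ind_moved)
print("ttest_rel:", p_rel, p_rel_moved)
```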