Python pandas: exclude rows below a certain frequency count


So I have a pandas DataFrame that looks like this:

r vals    positions
1.2       1
1.8       2
2.3       1
1.8       1
2.1       3
2.0       3
1.9       1
...       ...

I would like to filter out all rows whose position does not appear at least 20 times. I have seen something like this

g=df.groupby('positions')
g.filter(lambda x: len(x) > 20)

but this does not seem to work and I do not understand how to get the original dataframe back from this. Thanks in advance for the help.

2 answers

#1


18  

On your limited dataset the following works:

In [125]:
df.groupby('positions')['r vals'].filter(lambda x: len(x) >= 3)

Out[125]:
0    1.2
2    2.3
3    1.8
6    1.9
Name: r vals, dtype: float64

You can assign the result of this filter and use its index to select the matching rows from your orig df. (Matching on the 'r vals' values with isin would also keep rows from other groups that happen to share a value, such as row 1 here.)

In [129]:
filtered = df.groupby('positions')['r vals'].filter(lambda x: len(x) >= 3)
df.loc[filtered.index]

Out[129]:
   r vals  positions
0     1.2          1
2     2.3          1
3     1.8          1
6     1.9          1
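
A small self-contained check of this step, using the seven sample rows from the question. It is only a sketch: the variable names by_index and by_value are made up for illustration. It compares selecting by the filtered index with matching on the 'r vals' values, which can also keep rows from a rare position when they share a value with a kept row.

```python
import pandas as pd

# Sample data copied from the question (first seven rows only).
df = pd.DataFrame({
    "r vals": [1.2, 1.8, 2.3, 1.8, 2.1, 2.0, 1.9],
    "positions": [1, 2, 1, 1, 3, 3, 1],
})

# Keep the 'r vals' entries whose position occurs at least 3 times.
filtered = df.groupby("positions")["r vals"].filter(lambda x: len(x) >= 3)

# Selecting by index label keeps exactly the rows that passed the filter.
by_index = df.loc[filtered.index]

# Matching on values also keeps row 1 (position 2), because its value 1.8
# equals the value of row 3 (position 1).
by_value = df[df["r vals"].isin(filtered)]

print(by_index.index.tolist())
print(by_value.index.tolist())
```

Selecting by index is the safer of the two whenever values in 'r vals' can repeat across positions.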

You just need to change 3 to 20 in your case

Another approach would be to use value_counts to create an aggregate series of counts per position. We can then use this to filter your df:

In [136]:
counts = df['positions'].value_counts()
counts

Out[136]:
1    4
3    2
2    1
dtype: int64

In [137]:
counts[counts > 3]

Out[137]:
1    4
dtype: int64

In [135]:
df[df['positions'].isin(counts[counts > 3].index)]

Out[135]:
   r vals  positions
0     1.2          1
2     2.3          1
3     1.8          1
6     1.9          1
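
A closely related variant, sketched here as an assumption rather than as part of the original answer: instead of going through isin, the counts can be mapped back onto each row to build a boolean mask directly. The names min_count and per_row_count are hypothetical.

```python
import pandas as pd

# Sample data copied from the question (first seven rows only).
df = pd.DataFrame({
    "r vals": [1.2, 1.8, 2.3, 1.8, 2.1, 2.0, 1.9],
    "positions": [1, 2, 1, 1, 3, 3, 1],
})

min_count = 3  # hypothetical threshold; use 20 for the case in the question

# Number of rows per position, indexed by position value.
counts = df["positions"].value_counts()

# Look up each row's position in counts, giving one count per row,
# aligned with the index of df.
per_row_count = df["positions"].map(counts)

# Keep only rows whose position occurs at least min_count times.
result = df[per_row_count >= min_count]
print(result)
```

Because the mask is aligned with the original index, the surviving rows keep their original order and labels.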

EDIT

If you want to filter the groupby object on the dataframe rather than a Series then you can call filter on the groupby object directly:

In [139]:
filtered = df.groupby('positions').filter(lambda x: len(x) >= 3)
filtered

Out[139]:
   r vals  positions
0     1.2          1
2     2.3          1
3     1.8          1
6     1.9          1
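
For the threshold of 20 from the question, the same idea can be wrapped in a small helper. This is a minimal sketch on made-up data: the function name keep_frequent and the generated values are assumptions for illustration. Two details matter: filter returns a new DataFrame and leaves df unchanged, so the result has to be assigned, and "at least 20" needs >= 20 (with > 20 a position would need 21 rows to survive).

```python
import pandas as pd


def keep_frequent(df, column, min_count):
    """Return the rows whose value in `column` occurs at least `min_count` times."""
    return df.groupby(column).filter(lambda g: len(g) >= min_count)


# Made-up data: position 1 occurs 25 times, position 2 exactly 20 times,
# position 3 only 5 times.
df = pd.DataFrame({
    "r vals": [1.0 + 0.01 * i for i in range(50)],
    "positions": [1] * 25 + [2] * 20 + [3] * 5,
})

# The result is a new DataFrame; df itself still holds all 50 rows.
result = keep_frequent(df, "positions", 20)

print(sorted(result["positions"].unique().tolist()))
print(len(result), len(df))
```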

#2


0  

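How about selecting all rows whose position value is >= 20?

# Note: this selects on the value in 'positions', not on how often it occurs.
mask = df['positions'] >= 20
# .ix was removed in pandas 1.0; .loc accepts a boolean mask.
sel = df.loc[mask, :]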
