I have the following code, which I think is highly inefficient. Is there a better way to do this kind of common recoding in pandas?
df['F'] = 0
df['F'][(df['B'] >=3) & (df['C'] >=4.35)] = 1
df['F'][(df['B'] >=3) & (df['C'] < 4.35)] = 2
df['F'][(df['B'] < 3) & (df['C'] >=4.35)] = 3
df['F'][(df['B'] < 3) & (df['C'] < 4.35)] = 4
Use numpy.select and cache the boolean masks in variables for better performance:
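import numpy as np

# Compute each comparison once and reuse it
m1 = df['B'] >= 3
m2 = df['C'] >= 4.35
m3 = df['C'] < 4.35
m4 = df['B'] < 3
# The first matching condition wins; rows matching none get the default 0
df['F'] = np.select([m1 & m2, m1 & m3, m4 & m2, m4 & m3], [1, 2, 3, 4], default=0)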
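To try the np.select approach end to end, here is a minimal sketch on a small made-up frame. The sample values, the `recode` helper name, and its `b_limit`/`c_limit` parameters are my own for illustration and are not part of the question:

```python
import numpy as np
import pandas as pd


def recode(df, b_limit=3, c_limit=4.35):
    """Return the F codes for df via np.select (hypothetical helper)."""
    # Compute each comparison once so it is not re-evaluated per condition
    b_high = df['B'] >= b_limit
    c_high = df['C'] >= c_limit
    b_low = df['B'] < b_limit
    c_low = df['C'] < c_limit
    conditions = [b_high & c_high, b_high & c_low, b_low & c_high, b_low & c_low]
    # Rows that match no condition (e.g. NaN in B or C) get the default 0
    return np.select(conditions, [1, 2, 3, 4], default=0)


# Made-up sample data, one row per combination
df = pd.DataFrame({'B': [3, 5, 1, 2], 'C': [4.35, 1.0, 9.9, 0.5]})
df['F'] = recode(df)
print(df['F'].tolist())  # [1, 2, 3, 4]
```

Because every comparison with NaN is False, a row with a missing B or C matches none of the four conditions and keeps the default of 0, just as in the original code.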
In your specific case, you can make use of the fact that booleans are actually integers (False == 0, True == 1) and use simple arithmetic:
df['F'] = 1 + (df['C'] < 4.35) + 2 * (df['B'] < 3)
Note that this does not handle NaN values in your B and C columns specially: any comparison with NaN is False, so those rows are assigned as if they were at or above your limit.
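The NaN behaviour can be checked on a small made-up frame. The sketch below also shows one possible guard, using `Series.mask` to reset rows with missing values to 0. That guard is my own addition rather than part of the answer, and the sample data is invented:

```python
import numpy as np
import pandas as pd

# Made-up sample: the last two rows each contain a missing value
df = pd.DataFrame({'B': [3, 1, np.nan, 1], 'C': [5.0, 1.0, 1.0, np.nan]})

# Arithmetic version: NaN < limit is False, so NaN counts as "at or above"
arith = 1 + (df['C'] < 4.35) + 2 * (df['B'] < 3)

# Guarded version (assumption: rows with a missing B or C should be 0)
has_nan = df[['B', 'C']].isna().any(axis=1)
guarded = arith.mask(has_nan, 0)

print(arith.tolist())    # [1, 4, 2, 3]
print(guarded.tolist())  # [1, 4, 0, 0]
```

With the guard, the result agrees with the question's original code, where rows containing NaN keep the initial value of 0.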