Pandas assigning random string to each group as new column


We have a dataframe like this:

Out[90]: 
   customer_id                 created_at
0     11492288 2017-03-15 10:20:18.280437
1      8953727 2017-03-16 12:51:00.145629
2     11492288 2017-03-15 10:20:18.284974
3     11473213 2017-03-09 14:15:22.712369
4      9526296 2017-03-14 18:56:04.665410
5      9526296 2017-03-14 18:56:04.662082

I would like to add a new column based on groups of customer_id, with one random 8-character string assigned to each group.

For example, the output would then look like this:

Out[90]: 
   customer_id                 created_at     code
0     11492288 2017-03-15 10:20:18.280437 nKAILfyV
1      8953727 2017-03-16 12:51:00.145629 785Vsw0b
2     11492288 2017-03-15 10:20:18.284974 nKAILfyV
3     11473213 2017-03-09 14:15:22.712369 dk6JXq3u
4      9526296 2017-03-14 18:56:04.665410 1WESdAsD
5      9526296 2017-03-14 18:56:04.662082 1WESdAsD

I am used to R and dplyr, where this transformation is very easy to write. I am looking for something in Pandas similar to this:

library(dplyr)
library(stringi)

df %>%
  group_by(customer_id) %>%
  mutate(code = stri_rand_strings(1, 8))

I can figure out the random-string part myself. I am mainly curious how Pandas groupby works in this case.

Thanks!

2 Answers

#1


7  

The pandas equivalent of R's mutate is transform:

df['code'] = df.groupby('customer_id').transform(lambda x: pd.util.testing.rands_array(8, 1))
df
Out[314]: 
   customer_id  created_at      code
0     11492288  2017-03-15  L6Odf65d
1      8953727  2017-03-16  fwLpgLnt
2     11492288  2017-03-15  L6Odf65d
3     11473213  2017-03-09  AuSUPnJ9
4      9526296  2017-03-14  U1AiLyx0
5      9526296  2017-03-14  U1AiLyx0

EDIT (from cᴏʟᴅsᴘᴇᴇᴅ):

df.groupby('customer_id').customer_id.transform(lambda x: pd.util.testing.rands_array(8, 1))
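Because pd.util.testing is a testing helper that was deprecated in later pandas releases, here is a minimal sketch of the same transform idea using only the standard library. The helper name random_code and the small sample frame are assumptions made for illustration; they are not part of the original answer.

```python
import random
import string

import pandas as pd

# Sample frame mirroring the customer_id column from the question.
df = pd.DataFrame({
    "customer_id": [11492288, 8953727, 11492288, 11473213, 9526296, 9526296],
})

ALPHABET = string.ascii_letters + string.digits


def random_code(group, k=8):
    # `group` is the Series of rows for one customer_id; its values are ignored.
    # Returning a scalar lets transform broadcast it to every row of the group.
    return "".join(random.choices(ALPHABET, k=k))


# Select a single column so transform returns a Series aligned with df's index.
df["code"] = df.groupby("customer_id")["customer_id"].transform(random_code)
```

The function is evaluated per group and the scalar it returns is repeated for each row of that group, which is what mutate does after group_by in dplyr. Random codes are not guaranteed to differ between groups, although a collision is very unlikely with 8 characters.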

Also, a possible improvement to your R code:

Match=data.frame(A=unique(df$customer_id),B=replicate(length(unique(df$customer_id)), stri_rand_strings(1, 8)))
df$Code=Match$B[match(df$customer_id,Match$A)]
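The lookup-table idea in the R snippet above (one code per unique id, then match back to the rows) can be sketched in pandas roughly as follows. The names lookup and ALPHABET and the sample frame are hypothetical, chosen only for this example.

```python
import secrets
import string

import pandas as pd

# Sample frame mirroring the customer_id column from the question.
df = pd.DataFrame({
    "customer_id": [11492288, 8953727, 11492288, 11473213, 9526296, 9526296],
})

ALPHABET = string.ascii_letters + string.digits

# Lookup table: one random 8-character code per distinct customer_id.
lookup = {
    cid: "".join(secrets.choice(ALPHABET) for _ in range(8))
    for cid in df["customer_id"].unique()
}

# Map each row's customer_id to its code, much like match() in the R version.
df["code"] = df["customer_id"].map(lookup)
```

This avoids groupby entirely: the random strings are generated once per unique id, and Series.map does the row-wise lookup.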

#2


7  

import random
from string import ascii_letters, digits
chars = list(ascii_letters + digits)

choose = lambda x, k=8: ''.join(random.choices(chars, k=k))
df.assign(code=df.groupby('customer_id').transform(choose))

   customer_id                  created_at      code
0     11492288  2017-03-15 10:20:18.280437  S5HtmbeN
1      8953727  2017-03-16 12:51:00.145629  MMfFFn8U
2     11492288  2017-03-15 10:20:18.284974  S5HtmbeN
3     11473213  2017-03-09 14:15:22.712369  4VsKmDZ5
4      9526296  2017-03-14 18:56:04.665410  VhQfu2Rf
5      9526296  2017-03-14 18:56:04.662082  VhQfu2Rf

Inspired by @Wen's use of pd.util.testing.rands_array:

f, u = pd.factorize(df.customer_id.values)

df.assign(code=pd.util.testing.rands_array(8, u.size)[f])

   customer_id                  created_at      code
0     11492288  2017-03-15 10:20:18.280437  tSuQbTBm
1      8953727  2017-03-16 12:51:00.145629  qmCl6NEX
2     11492288  2017-03-15 10:20:18.284974  tSuQbTBm
3     11473213  2017-03-09 14:15:22.712369  Wsa3lNxh
4      9526296  2017-03-14 18:56:04.665410  jBfXS2Nk
5      9526296  2017-03-14 18:56:04.662082  jBfXS2Nk
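If pd.util.testing.rands_array is not available in your pandas version, the factorize approach can be sketched with NumPy's random generator instead. This is an illustrative variant under that assumption, not the original answer's code; the names rng, alphabet, and group_codes are made up for the example.

```python
import string

import numpy as np
import pandas as pd

# Sample frame mirroring the customer_id column from the question.
df = pd.DataFrame({
    "customer_id": [11492288, 8953727, 11492288, 11473213, 9526296, 9526296],
})

rng = np.random.default_rng()
alphabet = np.array(list(string.ascii_letters + string.digits))

# labels[i] is the integer group label of row i; uniques holds the distinct ids.
labels, uniques = pd.factorize(df["customer_id"])

# Draw an (n_groups, 8) grid of characters and join each row into one string.
chars = rng.choice(alphabet, size=(len(uniques), 8))
group_codes = np.array(["".join(row) for row in chars])

# Index the per-group strings with the row labels to expand them to all rows.
df = df.assign(code=group_codes[labels])
```

As with the original, the random strings are generated once per unique id and then expanded by integer indexing, so no Python function is called per group by pandas itself.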
