How to read a Whatsapp email file in R?


Whatsapp has an option to email a group conversation to yourself. I did that and now want to explore the file in R. The problem is that it seems to use several different separators, which I don't know how to handle in R.
Here is what I tried:

library(readr)
library(dplyr)
> gf <- read_delim('df.txt', col_names = F, skip = 2, delim='\t')
Warning message:
15 problems parsing 'df.txt'. See problems(...) for more details. 
> head(gf)
Source: local data frame [6 x 12]

       X1       X2                           X3 X4 X5 X6 X7 X8 X9 X10 X11 X12
1  9:14pm  Mar 31                  umair: Great       NA NA NA NA  NA  NA  NA
2  9:14pm  Mar 31              umair: I am back       NA NA NA NA  NA  NA  NA
3  9:15pm  Mar 31                     umair: ??       NA NA NA NA  NA  NA  NA
4 10:27pm  Mar 31      umair: Kon kon zinda hay       NA NA NA NA  NA  NA  NA
5 10:49pm  Mar 31   Kazim: Sab zinda hain .....       NA NA NA NA  NA  NA  NA
6 10:50pm  Mar 31              umair: Very good       NA NA NA NA  NA  NA  NA

Can you help me read this file so that the "sender:message" is separated into 2 columns? And the first 2 columns are read as separate columns as shown. Obviously I don't want columns X4 to X12.

Edit:

Here are the first few lines of the raw file:

9:14pm, Mar 31 - umair: Great
9:14pm, Mar 31 - umair: I am back
9:15pm, Mar 31 - umair: 👹
10:27pm, Mar 31 - umair: Kon kon zinda hay
10:49pm, Mar 31 - Kazim: Sab zinda hain .....
10:50pm, Mar 31 - umair: Very good
10:52pm, Mar 31 - umair: Abid agaya dobara?
10:54pm, Mar 31 - Kazim: Nai wo nai aya
10:54pm, Mar 31 - umair: Hmmmmmmmmm

1 Answer

#1


This question is old, yet when I wanted to do the same thing, my Google search led me here. I figured it out and put the solution into an R package. Install it and read in the data:

devtools::install_github("JBGruber/rwhatsapp")
library(rwhatsapp)
gf <- rwa_read("df.txt")

Or you can pass the lines in directly as a character vector:

> lines <- c(
  "9:14pm, Mar 31 - umair: Great",
  "9:14pm, Mar 31 - umair: I am back",
  "9:15pm, Mar 31 - umair:  ",
  "10:27pm, Mar 31 - umair: Kon kon zinda hay",
  "10:49pm, Mar 31 - Kazim: Sab zinda hain .....",
  "10:50pm, Mar 31 - umair: Very good",
  "10:52pm, Mar 31 - umair: Abid agaya dobara?",
  "10:54pm, Mar 31 - Kazim: Nai wo nai aya",
  "10:54pm, Mar 31 - umair: Hmmmmmmmmm"
)
> rwa_read(lines)
# A tibble: 9 x 3
  time                author text                
  <dttm>              <fct>  <chr>               
1 2018-03-31 21:14:13 umair  Great               
2 2018-03-31 21:14:13 umair  I am back           
3 2018-03-31 21:15:13 umair  " "                 
4 2018-03-31 22:27:13 umair  Kon kon zinda hay   
5 2018-03-31 22:49:13 Kazim  Sab zinda hain .....
6 2018-03-31 22:50:13 umair  Very good           
7 2018-03-31 22:52:13 umair  Abid agaya dobara?  
8 2018-03-31 22:54:13 Kazim  Nai wo nai aya      
9 2018-03-31 22:54:13 umair  Hmmmmmmmmm  
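
If you would rather not install a package, a rough base R sketch along the following lines may be enough for a small export. It assumes every message sits on a single line in the `time, date - sender: message` layout shown in the question. The pattern and the column names are my own choices for illustration, not something taken from rwhatsapp.

```r
# Sample lines copied from the question; for a real export,
# replace this vector with readLines("df.txt").
lines <- c(
  "9:14pm, Mar 31 - umair: Great",
  "10:49pm, Mar 31 - Kazim: Sab zinda hain .....",
  "10:52pm, Mar 31 - umair: Abid agaya dobara?"
)

# Four capture groups: time (up to the first comma), date (up to " - "),
# sender (up to the first colon after the date), and the rest as the message.
pattern <- "^([^,]+), ([^-]+) - ([^:]+): (.*)$"

# strcapture() (utils, R >= 3.4) returns one data frame column per group.
# Lines that do not match the pattern come back as rows of NA.
parsed <- strcapture(
  pattern, lines,
  proto = data.frame(
    time = character(),
    date = character(),
    sender = character(),
    message = character(),
    stringsAsFactors = FALSE
  )
)

print(parsed)
```

Multi-line messages and system notices (someone joining or leaving the group, for example) do not fit this pattern and would show up as `NA` rows, so check for those before analysing the result.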
