WhatsApp has an option to email a group conversation to yourself. I did that and now want to explore the export in R. The problem is that the file seems to use multiple separators, which I don't know how to handle in R.
Here is what I tried:
    library(readr)
    library(dplyr)
    > gf <- read_delim('df.txt', col_names = F, skip = 2, delim='\t')
    Warning message:
    15 problems parsing 'df.txt'. See problems(...) for more details.
    > head(gf)
    Source: local data frame [6 x 12]

    X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12
    1 9:14pm Mar 31 umair: Great NA NA NA NA NA NA NA
    2 9:14pm Mar 31 umair: I am back NA NA NA NA NA NA NA
    3 9:15pm Mar 31 umair: ?? NA NA NA NA NA NA NA
    4 10:27pm Mar 31 umair: Kon kon zinda hay NA NA NA NA NA NA NA
    5 10:49pm Mar 31 Kazim: Sab zinda hain ..... NA NA NA NA NA NA NA
    6 10:50pm Mar 31 umair: Very good NA NA NA NA NA NA NA
Can you help me read this file so that the "sender:message" is separated into 2 columns? And the first 2 columns are read as separate columns as shown. Obviously I don't want columns X4 to X12.
Here are the first few lines of the raw file:
    9:14pm, Mar 31 - umair: Great
    9:14pm, Mar 31 - umair: I am back
    9:15pm, Mar 31 - umair: 👹
    10:27pm, Mar 31 - umair: Kon kon zinda hay
    10:49pm, Mar 31 - Kazim: Sab zinda hain .....
    10:50pm, Mar 31 - umair: Very good
    10:52pm, Mar 31 - umair: Abid agaya dobara?
    10:54pm, Mar 31 - Kazim: Nai wo nai aya
    10:54pm, Mar 31 - umair: Hmmmmmmmmm
This question is old, but when I wanted to do the same thing, my Google search led me here. I figured it out and put the solution into an R package. Install it and read in the data:
    devtools::install_github("JBGruber/rwhatsapp")
    library(rwhatsapp)
    gf <- rwa_read("df.txt")
Or you can pass the lines in directly as a character vector:
    > lines <- c(
        "9:14pm, Mar 31 - umair: Great",
        "9:14pm, Mar 31 - umair: I am back",
        "9:15pm, Mar 31 - umair: ",
        "10:27pm, Mar 31 - umair: Kon kon zinda hay",
        "10:49pm, Mar 31 - Kazim: Sab zinda hain .....",
        "10:50pm, Mar 31 - umair: Very good",
        "10:52pm, Mar 31 - umair: Abid agaya dobara?",
        "10:54pm, Mar 31 - Kazim: Nai wo nai aya",
        "10:54pm, Mar 31 - umair: Hmmmmmmmmm"
      )
    > rwa_read(lines)
    # A tibble: 9 x 3
      time                author text
      <dttm>              <fct>  <chr>
    1 2018-03-31 21:14:13 umair  Great
    2 2018-03-31 21:14:13 umair  I am back
    3 2018-03-31 21:15:13 umair  " "
    4 2018-03-31 22:27:13 umair  Kon kon zinda hay
    5 2018-03-31 22:49:13 Kazim  Sab zinda hain .....
    6 2018-03-31 22:50:13 umair  Very good
    7 2018-03-31 22:52:13 umair  Abid agaya dobara?
    8 2018-03-31 22:54:13 Kazim  Nai wo nai aya
    9 2018-03-31 22:54:13 umair  Hmmmmmmmmm
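If you would rather not install a package, the layout asked for in the question (time, date, sender, message) can probably be recovered with a regular expression in base R. The following is only a minimal sketch: the pattern is my assumption based on the sample lines above (`time, date - sender: message`), the column names are made up for illustration, and this is not necessarily how `rwa_read()` works internally. Lines that do not match, such as the continuation of a multi-line message, are simply dropped here.

```r
# Sample lines copied from the question; for a real export use
# lines <- readLines("df.txt", encoding = "UTF-8")
lines <- c(
  "9:14pm, Mar 31 - umair: Great",
  "10:49pm, Mar 31 - Kazim: Sab zinda hain .....",
  "10:52pm, Mar 31 - umair: Abid agaya dobara?"
)

# Assumed layout: "<time>, <month> <day> - <sender>: <message>"
# Capture groups: 1 = time, 2 = date, 3 = sender, 4 = message
pattern <- "^([0-9:]+[ap]m), ([A-Za-z]+ [0-9]+) - ([^:]+): (.*)$"

# regexec() + regmatches() return, per line, the full match followed by
# the capture groups (or character(0) when the line does not match)
m <- regmatches(lines, regexec(pattern, lines))

# Keep only lines that matched completely (full match + 4 groups)
ok <- lengths(m) == 5
parts <- do.call(rbind, m[ok])

chat <- data.frame(
  time    = parts[, 2],
  date    = parts[, 3],
  sender  = parts[, 4],
  message = parts[, 5],
  stringsAsFactors = FALSE
)

print(chat)
```

Because the sender group stops at the first colon, a message that itself contains a colon still ends up whole in the `message` column. Other export formats (24-hour clocks, dates with years, different locales) would need a different pattern.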