Use perl = TRUE regex in dplyr select

How can I select columns with a regex that needs perl = TRUE?

library(dplyr)
data.frame(baa = 0, boo = 0, boa = 0, lol = 0, bAa = 0) %>% dplyr::select(matches("(?i)b(?!a)"))

Error in grep(needle, haystack, ...) : invalid regular expression '(?i)b(?!a)', reason 'Invalid regexp'

The regex is indeed valid when the Perl engine is used:

grep("(?i)b(?!a)", c("baa", "boo", "boa", "lol", "bAa"), perl = TRUE)

# [1] 2 3
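
To make the difference between the two engines concrete, here is a minimal base R sketch. It assumes nothing beyond base R: the default engine rejects the lookahead `(?!a)`, while the same pattern compiles once `perl = TRUE` is passed. The variable names are only illustrative.

```r
x <- c("baa", "boo", "boa", "lol", "bAa")
pattern <- "(?i)b(?!a)"

# Default engine: the negative lookahead is not supported, so grep() fails.
# The error is captured instead of stopping the script.
default_result <- tryCatch(
  suppressWarnings(grep(pattern, x)),
  error = function(e) e
)
print(inherits(default_result, "error"))  # TRUE

# Perl-compatible engine: the same pattern is accepted.
perl_result <- grep(pattern, x, perl = TRUE)
print(perl_result)  # 2 3
```

So the pattern itself is fine; the problem is only that the call made inside `matches()` does not ask for the Perl engine.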

Is there a shortcut function/way?

3 Solutions

#1


matches() in dplyr does not support perl = TRUE, at least in the version this answer was written against. However, you can write your own helper. After a bit of digging in the source code, this works:

The fast way:

library(dplyr)

# Note the three colons: grep_vars is not exported from dplyr.
# Both grep_vars and current_vars() belong to the dplyr version this was
# written against and may be missing or deprecated in other versions.
matches2 <- function(match, ignore.case = TRUE, vars = current_vars()) {
  dplyr:::grep_vars(match, vars, ignore.case = ignore.case, perl = TRUE)
}

data.frame(baa = 0, boo = 0, boa = 0, lol = 0, bAa = 0) %>%
  select(matches2("(?i)b(?!a)"))
#   boo boa
# 1   0   0

Or a more explanatory solution:

matches2 <- function(match, ignore.case = TRUE, vars = current_vars()) {
  grep_vars2(match, vars, ignore.case = ignore.case)
}

# This is pretty much my only change to the original dplyr:::grep_vars:
# it passes perl = TRUE through to grep().
grep_vars2 <- function(needle, haystack, ...) {
  grep(needle, haystack, perl = TRUE, ...)
}

data.frame(baa = 0, boo = 0, boa = 0, lol = 0, bAa = 0) %>%
  select(matches2("(?i)b(?!a)"))
#   boo boa
# 1   0   0
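
Because the wrappers above lean on dplyr internals, a variant that avoids them may be useful. The sketch below is not from the original answer. `perl_cols` is a hypothetical helper name: it resolves the column names with base `grep()` and hands the resulting character vector to `select()`. It assumes a dplyr version that exports `all_of()`. The dplyr call is guarded so the snippet still runs where dplyr is not installed. Newer tidyselect releases also document a `perl` argument on `matches()` itself, so check `?tidyselect::matches` for your installed version before adding any wrapper.

```r
# Hypothetical helper (not part of dplyr): return the names of the columns
# whose names match a Perl-compatible regular expression.
perl_cols <- function(data, pattern) {
  grep(pattern, names(data), perl = TRUE, value = TRUE)
}

df <- data.frame(baa = 0, boo = 0, boa = 0, lol = 0, bAa = 0)
cols <- perl_cols(df, "(?i)b(?!a)")
print(cols)  # "boo" "boa"

# Assumption: dplyr is installed and exports all_of().
# select() accepts a character vector of names, so no internals are needed.
if (requireNamespace("dplyr", quietly = TRUE)) {
  print(dplyr::select(df, dplyr::all_of(cols)))
}
```

The trade-off is that the data frame has to be named explicitly in `perl_cols(df, ...)`, so this reads less fluently in the middle of a long pipe than `matches2()` does.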

#2


Another approach, along the same lines as LyzandeR's suggestion but probably more dangerous, is to patch the body of matches() in place:

body(matches)[[grep("grep_vars", body(matches))]] <-
  substitute(grep_vars(match, vars, ignore.case = ignore.case, perl = TRUE))

data.frame(baa=0,boo=0,boa=0,lol=0,bAa=0) %>% dplyr::select(matches("(?i)b(?!a)"))
  boo boa
1   0   0

I would not hard-code body(matches)[[3]], because any update that changes the body of matches() would break this little patch.
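
The idea of locating the call by its text rather than by position can be tried on a toy function without touching dplyr at all. In the sketch below, `find_cols` is a made-up stand-in for a function whose body calls `grep()` without `perl = TRUE`; it is not dplyr's `matches()`.

```r
# Toy stand-in for a function whose body calls grep() without perl = TRUE.
find_cols <- function(pattern, vars) {
  vars <- as.character(vars)
  grep(pattern, vars)
}

# Deparse each top-level expression of the body and search the text,
# instead of relying on a fixed position such as [[3]].
steps <- vapply(
  as.list(body(find_cols)),
  function(e) paste(deparse(e), collapse = " "),
  character(1)
)
idx <- grep("^grep\\(", steps)
stopifnot(length(idx) == 1)  # fail loudly if the body changed shape

# Swap in the patched call.
body(find_cols)[[idx]] <- quote(grep(pattern, vars, perl = TRUE))

print(find_cols("(?i)b(?!a)", c("baa", "boo", "boa", "lol", "bAa")))  # 2 3
```

The `stopifnot()` check is the part worth keeping in practice: if an update renames or moves the call, the patch stops with an error instead of silently rewriting the wrong expression.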

#3


As an amendment or side note to LyzandeR's answer, here is a version that does not use dplyr vocabulary, only the magrittr pipe. This means you can skip writing wrapper functions, specifying arguments, and so on.

This is a bit more verbose than dplyr, but less verbose than plain base R, and it lets you use the full flexibility of any function, such as grep or stringi::stri_detect.

It is also significantly faster; see the benchmarks below. Note, however, that dplyr's fixed overhead dominates in an example this small, so timings should be re-checked on larger data. A fair speed comparison depends on the use case.

df <- data.frame(baa = 0, boo = 0, boa = 0, lol = 0, bAa = 0)

library(magrittr)
df %>%
  .[, grep("(?i)b(?!a)", names(.), perl = TRUE)]
#   boo boa
# 1   0   0

# The following is a copy of LyzandeR's approaches
library(dplyr)
matches2 <- function(match, ignore.case = TRUE, vars = current_vars()) {
  dplyr:::grep_vars(match, vars, ignore.case = ignore.case, perl = TRUE)
}

grep_vars2 <- function(needle, haystack, ...) {
  grep(needle, haystack, perl = TRUE, ...)
}

matches3 <- function(match, ignore.case = TRUE, vars = current_vars()) {
  grep_vars2(match, vars, ignore.case = ignore.case)
}

library(microbenchmark)
microbenchmark(
  df %>% select(matches2("(?i)b(?!a)")),
  df %>% select(matches3("(?i)b(?!a)")),
  df %>% .[,grep("(?i)b(?!a)", names(.), perl = T)]
)

# Unit: microseconds
# expr                                                    min       lq      mean    median        uq       max neval
# df %>% select(matches2("(?i)b(?!a)"))              3994.867 4309.877 4570.6414 4555.8065 4726.9310  6618.769   100
# df %>% select(matches3("(?i)b(?!a)"))              3981.841 4177.834 4792.2025 4396.3275 4655.6780 31812.876   100
# df %>% .[, grep("(?i)b(?!a)", names(.), perl = T)]  183.164  210.797  242.1678  237.2455  263.6935   554.624   100
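
One caveat with the bracket approach, which the answer does not mention: when the pattern matches exactly one column, `df[, idx]` simplifies the result to a plain vector. The base R sketch below shows the difference and assumes you want a data frame back in every case. The pattern `"^l"` is chosen only because it matches a single column of the example data.

```r
df <- data.frame(baa = 0, boo = 0, boa = 0, lol = 0, bAa = 0)

# Only "lol" matches, so exactly one column is selected.
idx <- grep("^l", names(df), perl = TRUE)

as_vector <- df[, idx]                # simplified to a numeric vector
as_frame  <- df[, idx, drop = FALSE]  # stays a one-column data frame

print(is.data.frame(as_vector))  # FALSE
print(is.data.frame(as_frame))   # TRUE
```

Adding `drop = FALSE` to the piped version keeps its return type consistent with what `select()` gives.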
