Rownames with parentheses: are they allowed in R?


I am trying to subset my data using the names of the work set and the test set:

ws_data <- subset(data, grepl(paste0("v*[0-9]_",ws_names, collapse="|" ),
           rownames(data))==TRUE)

It seems to work, but rownames like

"(Difluoromethoxy)trifluoromethane"

are simply skipped. Are parentheses allowed in legal names in R? How can I solve this problem without changing the row names? Thanks in advance!

Example data:

64 | v0064_(Chloro)(trifluor)omethane | -51.5 | 510.9 | 104.5 | 11.2 |
65 | v0067_(Dichloro)difluoromethane | -81.0 | 233.0 | 121.0 | 16.1 |

Regular expressions

rownames(ts)[1]
[1] "Bromotrifluoromethane"

rownames(data)[1]
[1] "v0001_Bromotrifluoromethane"

grepl("v[0-9]*_Bromotrifluoromethane", rownames(data)[1])
[1] TRUE

grepl("v*[0-9]_Bromotrifluoromethane", rownames(data)[1])
[1] TRUE

2 answers

#1


1  

I'm guessing the problem is that parentheses have a special meaning in regular expressions: they delimit a group rather than matching a literal "(" or ")". This post has a cure for that, which you can use to do something like this:

quotemeta <- function(x) gsub("([^A-Za-z_0-9])", "\\\\\\1", x)

data[grepl(paste0("^v[0-9]*_", quotemeta(ws_names), collapse="|"), rownames(data)), ]
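To see the effect on a small case, here is a minimal sketch. The data frame, its `value` column, and the contents of `ws_names` are made up for illustration; only the row-name layout follows the question.

```r
# Hypothetical data that mirrors the question's "v<digits>_<name>" row names.
data <- data.frame(
  value = c(1, 2, 3),
  row.names = c("v0001_Bromotrifluoromethane",
                "v0064_(Chloro)(trifluor)omethane",
                "v0067_(Dichloro)difluoromethane")
)
ws_names <- c("Bromotrifluoromethane", "(Dichloro)difluoromethane")

# Escape every character that is not a letter, digit, or underscore.
quotemeta <- function(x) gsub("([^A-Za-z_0-9])", "\\\\\\1", x)

# Unescaped: "(Dichloro)" is read as a group, so the pattern looks for
# "Dichlorodifluoromethane" with no literal parentheses and misses the row.
raw_hit <- grepl(paste0("^v[0-9]*_", ws_names, collapse = "|"),
                 rownames(data))

# Escaped: the parentheses are matched literally.
esc_hit <- grepl(paste0("^v[0-9]*_", quotemeta(ws_names), collapse = "|"),
                 rownames(data))

ws_data <- data[esc_hit, , drop = FALSE]
print(raw_hit)
print(esc_hit)
```

The `drop = FALSE` only matters here because the toy data frame has a single column; with several columns the result is a data frame either way.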

#2


2  

In general you can have characters like that in names and rownames; you just need to quote them when using them. I think the problem here is the subset function: it allows some unusual ways to specify the subset, which makes some things easier but others harder. It is trying to figure out what you mean by the rownames (rather than just taking them as literal strings), and the parentheses are probably confusing that process.
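As a quick check of the first point, the sketch below builds a small, made-up data frame (the `value` column is hypothetical) and shows that row names containing parentheses are stored unchanged and can be used for indexing when quoted:

```r
# Hypothetical two-row data frame; only the row names matter here.
df <- data.frame(
  value = c(1, 2),
  row.names = c("(Difluoromethoxy)trifluoromethane", "Bromotrifluoromethane")
)

# The row names are kept as given, parentheses included.
stored <- rownames(df)

# Indexing with the quoted string treats it as a plain string, not a regex.
picked <- df["(Difluoromethoxy)trifluoromethane", "value"]

print(stored)
print(picked)
```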

Try something like:

data[ grepl( paste0("v*[0-9]_",ws_names, collapse="|" ), rownames(data)), ]

You may also be able to simplify this using %in% if you can construct the list of names.
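One way the %in% idea might look, assuming every row name is a "v" plus digits plus an underscore followed by exactly one of the names in `ws_names` (the data below is made up for illustration). Because the names are compared as plain strings, parentheses need no escaping:

```r
# Hypothetical data that mirrors the question's row-name layout.
data <- data.frame(
  value = c(1, 2, 3),
  row.names = c("v0001_Bromotrifluoromethane",
                "v0064_(Chloro)(trifluor)omethane",
                "v0067_(Dichloro)difluoromethane")
)
ws_names <- c("Bromotrifluoromethane", "(Dichloro)difluoromethane")

# Strip the assumed "v<digits>_" prefix; the regex never contains the names.
bare_names <- sub("^v[0-9]+_", "", rownames(data))

# Exact string comparison against the work-set names.
ws_data <- data[bare_names %in% ws_names, , drop = FALSE]
print(rownames(ws_data))
```

Unlike the grepl version, this requires the whole remainder of the row name to equal an entry in `ws_names`, so partial matches are not picked up.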

Also see fortune(69): grepl already returns a logical vector, so the ==TRUE is redundant and slightly less useful than adding 0 or multiplying by 1.
