How to create a data frame of user-defined S4 classes in R


I want to create a data.frame whose columns hold different kinds of variables, including S4 classes. For a built-in class like "POSIXlt" (for dates) this works fine:

as.data.frame(list(id=c(1,2), 
                   date=c(as.POSIXlt('2013-01-01'),as.POSIXlt('2013-01-02'))))

But now I have a user-defined class, say a "person" class with a name and an age:

setClass("person", representation(name="character", age="numeric"))

But the following fails:

as.data.frame(list(id=c(1,2), pers=c(new("person", name="John", age=20),
                                     new("person", name="Tom", age=30))))

I also tried to overload the "[" operator for the person class using:

setMethod(
  f = "[",
  signature="person",
  definition=function(x,i,j,...,drop=TRUE){ 
    initialize(x, name=x@name[i], age = x@age[i])
  }
)

This allows for vector-like behavior:

persons = new("person", name=c("John","Tom"), age=c(20,30))
p1 = persons[1]

But still the following fails:

as.data.frame(list(id=c(1,2), pers=persons))

Perhaps I have to overload more operators to get the user-defined class into a data frame? I am sure there must be a way to do this, since POSIXlt is a class too and it works! A solution using the new R5 reference classes would also be fine!

I do not want to put all my data into the person class (you could ask why "id" is not a member of person, so that I would not need a data frame at all)! The idea is that my data.frame represents a table from a database with many columns of different types, e.g., strings and numbers, but also dates, intervals, geo-objects, etc. For dates I already have a solution (POSIXlt), but for intervals, geo-objects, etc. I probably need to specify my own S4/R5 classes.

Thanks a lot in advance.

2 Answers

#1


7  

Here's your class, interpreted as a "column" (each slot is a vector with one element per person) rather than as a single row; this will be important for performance. A date vector is included for reference:

setClass("person", representation(name="character", age="numeric"))
pers <- new("person", name=c("John", "Tom"), age=c(20, 30))
date <- as.POSIXct(c('2013-01-01', '2013-01-02'))
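
With this column interpretation, the name and age slots are parallel vectors, so they need to stay the same length. A minimal sketch of a validity check, assuming a hypothetical variant class called checkedPerson (the name and the check are illustrative, not part of the original answer):

```r
library(methods)

# Hypothetical variant of the class above that rejects mismatched slot lengths.
setClass("checkedPerson",
         representation(name = "character", age = "numeric"),
         validity = function(object) {
             if (length(object@name) != length(object@age))
                 "name and age must have the same length"
             else
                 TRUE
         })

# Two names and two ages: valid.
ok <- new("checkedPerson", name = c("John", "Tom"), age = c(20, 30))

# One name but two ages: new() signals an error, captured here as a message.
bad <- tryCatch(new("checkedPerson", name = "John", age = c(20, 30)),
                error = function(e) conditionMessage(e))
```

The check runs whenever new() is called with slot values, so a malformed column is caught before it reaches a data.frame.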

Some experimenting, including looking at methods(class="POSIXct") and paying attention to error messages, led me to implement as.data.frame.person and format.person (the latter is used for display in a data.frame) as follows:

as.data.frame.person <-
    function(x, row.names=NULL, optional=FALSE, ...)
{
    if (is.null(row.names))
        row.names <- x@name
    value <- list(x)
    attr(value, "row.names") <- row.names
    class(value) <- "data.frame"
    value
}

format.person <- function(x, ...) paste0(x@name, ", ", x@age)

This gets me my objects in a data.frame:

> lst <- list(id=1:2, date=date, pers=pers)
> as.data.frame(lst)
     id       date     pers
John  1 2013-01-01 John, 20
Tom   2 2013-01-02  Tom, 30

If I want to subset, then I need a "[" method:

setMethod("[", "person", function(x, i, j, ..., drop=TRUE) {
    initialize(x, name=x@name[i], age=x@age[i])
})
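
Because the method simply forwards i to each slot, the usual vector index styles should carry over. A short usage sketch, repeating the definitions so that it runs on its own (the third person, "Ann", is made-up illustrative data):

```r
library(methods)

setClass("person", representation(name = "character", age = "numeric"))
setMethod("[", "person", function(x, i, j, ..., drop = TRUE) {
    initialize(x, name = x@name[i], age = x@age[i])
})

pers <- new("person", name = c("John", "Tom", "Ann"), age = c(20, 30, 40))

second  <- pers[2]               # positive integer index
not_1st <- pers[-1]              # negative index drops the first element
over_25 <- pers[pers@age > 25]   # logical index
```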

I'm not sure what other methods might be required as more data.frame operations are encountered; there is no formal "data.frame interface".
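
Two methods that vector-like code commonly relies on are length and c. The following is only a sketch, under the assumption that these are among the generics that data.frame-style code may call; it is not a complete list:

```r
library(methods)

setClass("person", representation(name = "character", age = "numeric"))

# Number of persons stored in the object.
setMethod("length", "person", function(x) length(x@name))

# Concatenate several person objects slot by slot.
setMethod("c", "person", function(x, ...) {
    parts <- list(x, ...)
    new("person",
        name = unlist(lapply(parts, function(p) p@name)),
        age  = unlist(lapply(parts, function(p) p@age)))
})

both <- c(new("person", name = "John", age = 20),
          new("person", name = "Tom",  age = 30))
```

With a c method in place, combining two single-person objects yields one vectorized person object instead of a plain list, which is the shape the "column" interpretation expects.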

Using the vectorized class in data.table seems to require a length method for construction:

> library(data.table)
> data.table(id=1:2, pers=pers)
Error in data.table(id = 1:2, pers = pers) : 
  problem recycling column 2, try a simpler type
> setMethod(length, "person", function(x) length(x@name))
[1] "length"
> data.table(id=1:2, pers=pers)
   id     pers
1:  1 John, 20
2:  2  Tom, 30

Maybe there's a data.table interface?

#2


2  

Judging by this thread on the mailing list:

http://tolstoy.newcastle.edu.au/R/e2/devel/06/11/1013.html

...John Chambers was thinking about this in 2006. And we still can't directly put S4 objects in columns of data frames. We can't put complex S3 classes in columns of data frames either.

There are some other tabular data structures that might do it - data.table perhaps:

require(data.table)
setClass("geezer", representation(name="character", age="numeric"))
tom=new("geezer",name="Tom",age=20)
dick=new("geezer",name="Dick",age=23)
harry=new("geezer",name="Harry",age=25)
gt = data.table(geezers=c(tom,dick,harry),weapons=c("Gun","Gun","Knife"))
gt
    geezers weapons
1: <geezer>     Gun
2: <geezer>     Gun
3: <geezer>   Knife

The semantics of data.table are a bit different from those of data.frame, so don't expect to be able to plug a data.table into any code that uses a data.frame and have it work (for example, I suspect lm and glm will go wobbly). But it seems the data.table authors allow compound classes in columns...
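
A similar row-wise layout may also be possible in a plain data.frame by storing the objects in a list column, where each cell holds one S4 object. This is a hedged sketch, not something the answer above claims. It deliberately avoids printing the data frame, since displaying such a column is not covered here:

```r
library(methods)

setClass("geezer", representation(name = "character", age = "numeric"))
tom  <- new("geezer", name = "Tom",  age = 20)
dick <- new("geezer", name = "Dick", age = 23)

# Build the ordinary columns first, then attach the list column.
gf <- data.frame(weapons = c("Gun", "Knife"))
gf$geezers <- list(tom, dick)

# Each cell is still an S4 object; extract slot values with sapply().
ages <- sapply(gf$geezers, function(g) g@age)
```

The trade-off compared with the vectorized class is that every operation on the column has to loop over the individual objects.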

