How to prevent functions from polluting the global namespace?


My R project is getting increasingly complex, and I'm starting to look for some construct that's equivalent to classes in Java/C#, or modules in Python, so that my global namespace doesn't become littered with functions that are never used outside of one particular .r file.

So, I guess my question is: to what extent is it possible to limit the scope of functions to within a specific .r file, or similar?

I think I can just make the entire .r file into one giant function, and put functions inside that, but that messes with the echoing:

myfile.r:

myfile <- function() {
    somefunction <- function(a,b,c){}
    anotherfunction <- function(a,b,c){}

    # do some stuff here...
    123
    456
    # ...
}
myfile()

Output:

> source("myfile.r",echo=T)

> myfile <- function() {
+     somefunction <- function(a,b,c){}
+     anotherfunction <- function(a,b,c){}
+ 
+     # do some stuff here...
+     # . .... [TRUNCATED] 

> myfile()
> 

You can see that "123" is not printed, even though we used echo=T in the source command.
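For what it's worth, this is expected behaviour: R only auto-prints the values of top-level expressions, and inside a function body intermediate values such as 123 are evaluated and then discarded. A minimal sketch, assuming you keep the wrapper-function approach, is to print() whatever you want to see; the helper bodies and numbers below are made up for illustration.

```r
# Hypothetical rewrite of myfile.r: helpers stay local to the wrapper,
# and any value we want to see is printed explicitly.
myfile <- function() {
  somefunction <- function(a, b, c) a + b + c     # local helper, never global
  anotherfunction <- function(a, b, c) a * b * c  # local helper, never global

  # Inside a function body values are not auto-printed, so print() them.
  print(somefunction(100, 20, 3))
  print(anotherfunction(4, 5, 6))
  invisible(NULL)  # return nothing visible
}
myfile()
```

Because print() writes its output wherever it is called, these values should also appear when the file is run with source(..., echo=TRUE).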

I'm wondering if there is some other construct which is more standard, since putting everything inside a single function doesn't sound like something that is really standard? But perhaps it is? Also, if it means that echo=T works then that is a definite bonus for me.

3 Answers

#1



Firstly, as @Spacedman has said, you'll be best served by a package, but there are other options.

S3 Methods

R's original "object orientation" is known as S3. The majority of R's code base uses this particular paradigm. It is what makes plot() work for all kinds of objects. plot() is a generic function, and the R Core Team and package developers can, and do, write their own methods for plot(). Strictly, these methods have names like plot.foo(), where foo is the class of object for which the function defines a plot() method. The beauty of S3 is that you hardly ever need to know about or call plot.foo() directly: you just use plot(bar), and R works out which plot() method to dispatch to based on the class of the object bar.

In your comments on your Question you mention that you have a function populate() that has methods (in effect) for classes "crossvalidate" and "prod" which you keep in separate .r files. The S3 way to set this up is to do:

populate <- function(x, ...) { ## add whatever args you want/need
    UseMethod("populate")
}

populate.crossvalidate <-
    function(x, y, z, ...) { ## extra args allowed, but must include those of the generic
    ## function code here
}

populate.prod <-
    function(x, y, z, ...) { ## extra args allowed, but must include those of the generic
    ## function code here
}

Then, given some object bar with class "prod", calling

populate(bar)

will result in R calling populate() (the generic), which then looks for a function named populate.prod because "prod" is the class of bar. It finds our populate.prod() and dispatches to it, passing on the arguments we initially specified.
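A self-contained sketch of the dispatch just described; the method bodies, the populate.default() fallback and the object bar are placeholders made up for illustration, not the asker's real code:

```r
# Generic: UseMethod() dispatches on the class of the first argument.
populate <- function(x, ...) UseMethod("populate")

# Methods are named <generic>.<class>; bodies are placeholders.
populate.crossvalidate <- function(x, ...) "crossvalidate method"
populate.prod <- function(x, ...) "prod method"

# Hypothetical fallback, used when no class-specific method exists.
populate.default <- function(x, ...) {
  stop("no populate() method for class ", class(x)[1])
}

bar <- structure(list(), class = "prod")
populate(bar)  # dispatches to populate.prod(), returning "prod method"
```

Note that the caller never names populate.prod(); the class attribute alone decides which method runs.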

So you see that you only ever refer to the methods using the name of the generic, not the full function name. R works out for you what method needs to be called.

The two populate() methods can have very different arguments, with the exception that, strictly, they should include the arguments of the generic function. So in the example above, all methods should have the arguments x and ... (the ellipsis). (There is an exception for methods that employ formula objects, but we don't need to worry about that here.)
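To illustrate the argument rule, here is a sketch in which each method keeps the generic's x and ... and adds its own arguments (n and folds are hypothetical); anything extra passed to populate() is forwarded to whichever method is dispatched:

```r
populate <- function(x, ...) UseMethod("populate")

# Each method keeps the generic's x and ..., then adds its own arguments.
# 'n' and 'folds' are hypothetical arguments for illustration only.
populate.prod <- function(x, n = 3, ...) seq_len(n)
populate.crossvalidate <- function(x, folds = 2, ...) paste("fold", seq_len(folds))

prod_obj <- structure(list(), class = "prod")
cv_obj <- structure(list(), class = "crossvalidate")

populate(prod_obj, n = 5)    # 'n' reaches populate.prod()
populate(cv_obj, folds = 4)  # 'folds' reaches populate.crossvalidate()
```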

Package Namespaces

Since R 2.14.0, every R package has had its own namespace, even if the package author did not provide one; namespaces themselves have been around in R for much longer than that.

In your example, we wish to register the populate() generic and its two methods with the S3 system. We also wish to export the generic function. Usually we don't want or need to export the individual methods. So, put your functions into .R files in the R folder of the package sources, and then, in the top level of the package sources, create a file named NAMESPACE and add the following statements:

export(populate) ## export generic

S3method(populate, crossvalidate) ## register methods
S3method(populate, prod)

Then, once you have installed your package, you will find that you can call populate(), but R will complain if you try to call populate.prod() etc. directly by name, from the prompt or from another function. This is because the individual method functions have not been exported from the namespace and hence are not visible outside it. Any function in your package that calls populate() will be able to access the methods you have defined, but functions or code outside your package cannot refer to the methods by name at all (they are only reached via populate()). If you want, you can call non-exported functions using the ::: operator, i.e.

mypkg:::populate.crossvalidate(foo, bar)

will work, where mypkg is the name of your package.
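The same mechanism can be seen in a package that ships with R. As far as I know, stats registers print.htest() as an S3 method without exporting it, so a sketch like the following shows the ways of reaching such a method (the t.test() data are arbitrary):

```r
# print.htest is registered as an S3 method in 'stats' but not exported.
"print.htest" %in% getNamespaceExports("stats")  # FALSE

# ::: reaches into the namespace for a non-exported object.
m1 <- stats:::print.htest

# getS3method() looks up a registered method without needing :::.
m2 <- utils::getS3method("print", "htest")
identical(m1, m2)  # TRUE: the same function

# Ordinary dispatch still finds it: printing a t.test() result uses print.htest().
t.test(c(1.2, 0.8, 1.1, 0.9))
```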

To be honest, you don't even need a NAMESPACE file: R will auto-generate one when you install the package, and that file exports all functions. That way your two methods will be visible as populate.xxx() (where xxx is the particular method) and will still operate as S3 methods.

Read Section 1, "Creating R packages", of the Writing R Extensions manual for details of what is involved, but you won't need to do half of this if you don't want to, especially if the package is for your own use. Just create the appropriate package folders (i.e. R and man) and put your .R files in R. Write a single .Rd file in man where you add

\name{Misc Functions}
\alias{populate}
\alias{populate.crossvalidate}
\alias{populate.prod}

at the top of the file. Add \alias{} for any other functions you have. Then you'll need to build and install the package.

Alternative using sys.source()

Although I don't (can't!) really recommend what I mention below as a viable long-term option, there is an alternative that will allow you to isolate the functions from individual .r files, as you initially requested. This is achieved through the use of environments, not namespaces, and doesn't involve creating a package.

The sys.source() function can be used to source R code/functions from a .R file and evaluate it in a specified environment. As your .R file creates/defines functions, if you source it into another environment, those functions will be defined there, in that environment. They won't be visible on the standard search path by default, and hence a populate() function defined in crossvalidate.R will not clash with a populate() defined in prod.R as long as you use two separate environments. When you need to use one set of functions, you can attach the environment to the search path, at which point its functions become visible everywhere, and when you are done you can detach it. Then attach the other environment, use it, detach it, etc. Or you can arrange for R code to be evaluated in a specific environment using things like eval().

Like I said, this isn't a recommended solution but it will work, after a fashion, in the manner you describe. For example

## two source files that both define the same function
writeLines("populate <- function(x) 1:10", con = "crossvalidate.R")
writeLines("populate <- function(x) letters[1:10]", con = "prod.R")

## create two environments
crossvalidate <- new.env()
prod <- new.env()

## source the .R files into their respective environments
sys.source("crossvalidate.R", envir = crossvalidate)
sys.source("prod.R", envir = prod)

## show that no populate() can be found on the search path

> ls()
[1] "crossvalidate" "prod" 
> find("populate")
character(0)

Now, attach one of the environments and call populate():

> attach(crossvalidate)
> populate()
 [1]  1  2  3  4  5  6  7  8  9 10
> detach(crossvalidate)

Now call the function in the other environment:

> attach(prod)
> populate()
 [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"
> detach(prod)

Clearly, each time you want to use a particular function, you need to attach() its environment and then call it, followed by a detach() call, which is a pain.

I did say you can arrange for R code (expressions, really) to be evaluated in a stated environment. You can use eval() or with() for this, for example:

> with(crossvalidate, populate())
[1]  1  2  3  4  5  6  7  8  9 10

At least now you only need a single call to run the version of populate() of your choice. However, if calling the functions by their full name, e.g. populate.crossvalidate(), is too much effort (as per your comments), then I dare say that even the with() idea will be too much hassle. And anyway, why would you use this when you can quite easily have your own R package?
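For completeness, a sketch of alternatives to attach()/detach() for the sys.source() setup above. It reuses the answer's file contents but writes them to temporary files; eval() on a quoted call, $ extraction, and get() each call populate() inside a chosen environment:

```r
# Recreate the two files from the example in a temporary directory.
cv_file <- tempfile(fileext = ".R")
prod_file <- tempfile(fileext = ".R")
writeLines("populate <- function(x) 1:10", cv_file)
writeLines("populate <- function(x) letters[1:10]", prod_file)

crossvalidate <- new.env()
prod <- new.env()
sys.source(cv_file, envir = crossvalidate)
sys.source(prod_file, envir = prod)

# Three ways to call a function living in a given environment,
# none of which needs attach()/detach():
eval(quote(populate()), envir = crossvalidate)  # evaluate an expression there
crossvalidate$populate()                         # extract the function, then call it
get("populate", envir = prod)()                  # look it up by name, then call it
```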

#2



Don't worry about the complexity of 'making a package'. Stop thinking of it like that. What you are going to do is this:

  1. In the folder where you are working on your project, make a folder called 'R'.
  2. Put your R code in there, one function per file.
  3. Make a DESCRIPTION file in your project directory. Check out existing examples for the exact format, but you only need a few fields.
  4. Get devtools: install.packages("devtools")
  5. Use devtools: library(devtools)
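For concreteness, a sketch of creating the R folder and a minimal DESCRIPTION file from R itself. The folder location, package name and field values are placeholders, and R CMD check would additionally expect author/maintainer fields:

```r
# Hypothetical project folder under tempdir(); in practice use your project directory.
pkg_dir <- file.path(tempdir(), "myproject")
dir.create(file.path(pkg_dir, "R"), recursive = TRUE, showWarnings = FALSE)

# Minimal DESCRIPTION in DCF ("Debian control file") format; values are placeholders.
writeLines(c(
  "Package: myproject",
  "Title: Helper Functions for My Analysis",
  "Version: 0.0.1",
  "Description: Functions shared by the scripts in this project.",
  "License: GPL (>= 2)"
), file.path(pkg_dir, "DESCRIPTION"))

# One function per file in R/
writeLines('populate <- function(x, ...) UseMethod("populate")',
           file.path(pkg_dir, "R", "populate.R"))

# Then, in an interactive session:
# devtools::load_all(pkg_dir)
```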

Now, write your functions in your R files in your R folder. To load them into R, DON'T source them. Do load_all(). Your functions will be loaded, but NOT into the global environment.

Edit one of your R files, then do load_all() again. This will load any modified files in the R folder, thus updating your function.

That's it. Edit, load_all(), rinse and repeat. You have created a package, but it's pretty lightweight and you don't have to deal with the bondage and discipline of R's package-building tools.

I've seen, used, and even written code that tries to implement a lightweight packagey mechanism for loading objects, and none are as good as what devtools does.

All Hail Hadley!

#3



You might want to consider making a package. As an alternative, you could look at environments. Finally, RStudio's projects may be closer to what would suit you.
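As one illustration of the environments route (a possible reading of this suggestion, not necessarily what was meant), local() evaluates a block in a fresh environment, so a file can keep its helpers private and expose only the value it returns. helper() and the arithmetic are hypothetical:

```r
# local() evaluates its block in a new environment. Only the value of the
# last expression (here, a function) escapes; helper() stays private.
populate <- local({
  helper <- function(x) x * 2  # hypothetical private helper
  function(x) helper(x) + 1    # public function; its enclosure keeps helper
})

populate(1:3)  # 3 5 7
exists("helper", envir = globalenv(), inherits = FALSE)  # FALSE
```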
