Why does gc() not free memory?


I run simulations on a 64-bit Windows computer with 64 GB of RAM. Memory use reaches 55%, and after a simulation run finishes I remove all objects in the workspace with rm(list=ls()), followed by two calls to gc().

I supposed that this would free enough memory for the next simulation run, but actually memory usage drops by just 1%. Consulting a lot of different fora I could not find a satisfactory explanation, only vague comments such as:

"Depending on your operating system, the freed up memory might not be returned to the operating system, but kept in the process space."

I'd like to find information on:

  • 1) on which operating systems and under which conditions freed memory is not returned to the OS, and
  • 2) whether there is any remedy other than closing R and starting it again for the next simulation run.

2 answers

#1


19  

How do you check memory usage? Normally a virtual machine allocates a chunk of memory that it uses to store its data. Some of that allocated memory may be unused and marked as free. What the GC does is find data that is no longer referenced from anywhere and mark the corresponding chunks of memory as unused; this does not mean that the memory is released to the OS. Still, from the VM's perspective there is now more free memory that can be used for further computation.

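A minimal sketch of checking memory from R's own point of view rather than from the OS task manager. The helper r_used_mb() and the environment sim_env are hypothetical names introduced here for illustration; the sketch only assumes that the matrix returned by gc() keeps its "(Mb)" figure for "used" in column 2. The number reported here can fall sharply after rm() even when the process size shown by the OS barely moves.

```r
# Hypothetical helper: total Mb that R itself reports as in use (Ncells + Vcells).
# Column 2 of the matrix returned by gc() is the "(Mb)" figure next to "used".
r_used_mb <- function() sum(gc()[, 2])

sim_env <- new.env()                      # stand-in for the workspace of one run
baseline <- r_used_mb()

sim_env$result <- numeric(5e6)            # about 40 MB of doubles
while_held <- r_used_mb()

rm(list = ls(sim_env), envir = sim_env)   # same idea as rm(list = ls()) at top level
after_rm <- r_used_mb()

cat(sprintf("baseline %.1f Mb, while held %.1f Mb, after rm + gc %.1f Mb\n",
            baseline, while_held, after_rm))
```

If the last figure is back near the baseline, R considers the memory free for reuse, whatever the OS reports for the process.
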

As others asked, did you experience out-of-memory errors? If not, there is nothing to worry about.

EDIT: This and this should be enough to understand how memory allocation and garbage collection work in R.

From the first document:

Occasionally an attempt is made to release unused pages back to the operating system. When pages are released, a number of free nodes equal to R_MaxKeepFrac times the number of allocated nodes for each class is retained. Pages not needed to meet this requirement are released. An attempt to release pages is made every R_PageReleaseFreq level 1 or level 2 collections.

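The retention rule in the quoted passage can be illustrated with a little arithmetic. This is only a sketch of the sentence above, not of R's actual source: every number is made up, keep_frac is merely a stand-in for R_MaxKeepFrac, and the calculation ignores page granularity.

```r
# All numbers below are made up for illustration; they are not R's actual settings.
keep_frac       <- 0.5      # stand-in for R_MaxKeepFrac
allocated_nodes <- 1000000  # nodes allocated in one node class
free_nodes      <- 800000   # free nodes in that class after a collection

retained   <- keep_frac * allocated_nodes     # free nodes kept around for reuse
releasable <- max(0, free_nodes - retained)   # free nodes whose pages may be released

cat("retained:", retained, " releasable:", releasable, "\n")
```

Under these assumed numbers only the free nodes beyond the retained quota are candidates for release, which is one reason the process size need not shrink by the full amount that was freed.
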

EDIT2:

To see used memory try running gc() with verbose set to TRUE:

gc(verbose = TRUE)

Here's a result with an array of 10'000'000 integers in memory:

Garbage collection 9 = 1+0+8 (level 2) ... 
10.7 Mbytes of cons cells used (49%)
40.6 Mbytes of vectors used (72%)
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  198838 10.7     407500 21.8   350000 18.7
Vcells 5311050 40.6    7421749 56.7  5311504 40.6

And here it is after discarding the reference to it:

Garbage collection 10 = 1+0+9 (level 2) ... 
10.7 Mbytes of cons cells used (49%)
2.4 Mbytes of vectors used (5%)
         used (Mb) gc trigger (Mb) max used (Mb)
Ncells 198821 10.7     407500 21.8   350000 18.7
Vcells 310987  2.4    5937399 45.3  5311504 40.6

As you can see, the memory used by Vcells fell from 40.6 Mb to 2.4 Mb.

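The experiment above can be reproduced roughly as follows. This is a sketch, assuming the array was an ordinary integer vector; the variable names are illustrative, and the exact figures will differ by R version and session.

```r
x <- integer(1e7)                 # ten million integers, roughly 40 MB

with_x <- gc(verbose = TRUE)      # Vcells "used" includes x
print(with_x)

rm(x)                             # discard the only reference to the vector

without_x <- gc(verbose = TRUE)   # Vcells "used" drops once x is collected
print(without_x)

# Difference in Mb reported for Vcells (column 2 is the "(Mb)" next to "used")
freed_mb <- with_x["Vcells", 2] - without_x["Vcells", 2]
cat("Vcells freed:", freed_mb, "Mb\n")
```

integer(1e7) is used instead of 1:1e7 because recent versions of R can store such a sequence compactly, which would hide the allocation.
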

#2


25  

The R garbage collector is imperfect in the following (not so) subtle way: it does not move objects (i.e., it does not compact memory) because of the way it interacts with C libraries. (Some other languages/implementations suffer from this too, but others, despite also having to interact with C, manage to have a compacting generational GC which does not suffer from this problem).

This means that if you take turns allocating small chunks of memory which are then discarded and larger chunks for more permanent objects (this is a common situation when doing string/regexp processing), then your memory becomes fragmented and the garbage collector can do nothing about it: the memory is released, but cannot be re-used because the free chunks are too short.

The only way to fix the problem is to save the objects you want, restart R, and reload the objects.

Since you are doing rm(list=ls()), i.e., you do not need any objects, you do not need to save and reload anything, so, in your case, the solution is precisely what you want to avoid - restarting R.

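If restarting is the remedy, one way to avoid doing it by hand is to run each simulation in its own short-lived R process, so that all of its memory goes back to the OS when that process exits. A minimal sketch, assuming Rscript is available in the running installation's bin directory; run_in_fresh_r() is a hypothetical helper and the one-line "simulation" is a placeholder.

```r
# Hypothetical helper: evaluate a snippet of R code in a fresh Rscript process
# and return whatever it prints to standard output.
run_in_fresh_r <- function(code) {
  rscript <- file.path(R.home("bin"), "Rscript")
  system2(rscript, c("-e", shQuote(code)), stdout = TRUE)
}

# Placeholder "simulation": allocate a large vector and print one summary value.
out <- run_in_fresh_r("x <- rnorm(1e6); cat(length(x) > 0)")
print(out)
```

In practice each run would write its results to a file (for example with saveRDS()) and the parent session would read back only what it needs.
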

PS. Garbage collection is a highly non-trivial topic. E.g., Ruby used 5 (!) different GC algorithms over 20 years. Java GC does not suck because Sun/Oracle and IBM spent many man-years on their respective implementations of the GC. On the other hand, R and Python have lousy GC, because no one bothered to invest the necessary man-years, and they are quite popular. That's worse-is-better for you.
