Time waste of execv() and fork()


I am currently learning about fork() and execv() and I had a question regarding the efficiency of the combination.

I was shown the following standard code:

pid = fork();
if (pid < 0) {
    // handle fork error
} else if (pid == 0) {
    execv("son_prog", argv_son);
    // execv() returns only on failure
    _exit(127);
} else {
    // do father code
}
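
A self-contained version of the same pattern might look like the sketch below. It assumes a POSIX system; the helper name run_and_wait() is hypothetical, and where the original says "do father code", this sketch simply has the parent wait for the child with waitpid().

```c
#define _DEFAULT_SOURCE
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork() + execv() + waitpid().
   Returns the child's exit status, 127 if execv() failed in the child,
   or -1 on other errors. */
int run_and_wait(const char *path, char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;                 /* fork error */
    if (pid == 0) {
        execv(path, argv);         /* child: replace this process image */
        _exit(127);                /* only reached if execv() failed */
    }
    /* parent ("father") code: wait for the child to finish */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, run_and_wait("/bin/true", argv) should return 0, and a path that does not exist should return 127.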

I know that fork() clones the entire process (copying the entire heap, etc.) and that execv() replaces the current address space with that of the new program. With this in mind, isn't this combination very inefficient? We copy the entire address space of a process and then immediately overwrite it.

So my question:
What is the advantage that is achieved by using this combo (instead of some other solution) that makes people still use this, even though we have waste?

5 answers

#1 (43 votes)

What is the advantage that is achieved by using this combo (instead of some other solution) that makes people still use this even though we have waste?

You have to create a new process somehow. There are very few ways for a userspace program to accomplish that. POSIX used to have vfork() alongside fork(), and some systems may have their own mechanisms, such as the Linux-specific clone(), but since 2008 POSIX specifies only fork() and the posix_spawn() family. The fork + exec route is more traditional, is well understood, and has few drawbacks (see below). The posix_spawn family is designed as a special-purpose substitute for use in contexts that present difficulties for fork(); you can find details in the "Rationale" section of its specification.

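For comparison, here is a minimal posix_spawn() sketch of the same task, assuming a POSIX system with <spawn.h>. The name spawn_and_wait() is hypothetical; passing NULL for the file actions and attributes means the child simply inherits the caller's descriptors and signal settings, much like a plain fork() + exec.

```c
#define _DEFAULT_SOURCE
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;   /* the caller's environment, passed on unchanged */

/* Hypothetical helper: start path/argv with posix_spawn() and wait for it.
   Returns the child's exit status, or -1 on error. */
int spawn_and_wait(const char *path, char *const argv[])
{
    pid_t pid;
    /* NULL file actions and attributes: inherit fds, signal mask, etc. */
    int err = posix_spawn(&pid, path, NULL, NULL, argv, environ);
    if (err != 0)
        return -1;       /* posix_spawn() returns an error number, not -1/errno */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The single call hides the fork/exec split, which is convenient, but any per-child setup has to be expressed through the file-actions and attributes objects instead of ordinary code.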

This excerpt from the Linux man page for vfork() may be illuminating:

Under Linux, fork(2) is implemented using copy-on-write pages, so the only penalty incurred by fork(2) is the time and memory required to duplicate the parent’s page tables, and to create a unique task structure for the child. However, in the bad old days a fork(2) would require making a complete copy of the caller’s data space, often needlessly, since usually immediately afterwards an exec(3) is done. Thus, for greater efficiency, BSD introduced the vfork() system call, which did not fully copy the address space of the parent process, but borrowed the parent’s memory and thread of control until a call to execve(2) or an exit occurred. The parent process was suspended while the child was using its resources. The use of vfork() was tricky: for example, not modifying data in the parent process depended on knowing which variables are held in a register.

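A minimal sketch of the vfork() + exec idiom the man page describes, assuming Linux/glibc (vfork() is no longer in POSIX, so _DEFAULT_SOURCE is defined to get its declaration). The helper vfork_run() is hypothetical; the child does nothing except call execv() or _exit(), which is exactly the restriction that made vfork() tricky to use.

```c
#define _DEFAULT_SOURCE
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper. The vfork() child borrows the parent's memory, so it
   must not modify variables, allocate memory, or return from this function:
   it may only call an exec function or _exit(). */
int vfork_run(const char *path, char *const argv[])
{
    pid_t pid = vfork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execv(path, argv);
        _exit(127);   /* exec failed; _exit(), not exit(), so the parent's stdio is untouched */
    }
    /* The parent resumes only after the child has called execv() or _exit(). */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```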

(Emphasis added)

Thus, your concern about waste is not well-founded for modern systems (not limited to Linux), but it was indeed an issue historically, and there were indeed mechanisms designed to avoid it. These days, most of those mechanisms are obsolete.

#2 (23 votes)

Another answer states:

However, in the bad old days a fork(2) would require making a complete copy of the caller’s data space, often needlessly, since usually immediately afterwards an exec(3) is done.

Obviously, one person's bad old days are a lot younger than others remember.

The original UNIX systems did not have the memory to run multiple processes, nor did they have an MMU that could keep several processes in physical memory, ready to run, at the same logical address space: processes that were not currently running were swapped out to disk.

The fork system call was almost entirely the same as swapping the current process out to disk, except for the return value and for not replacing the remaining in-memory copy by swapping in another process. Since the parent had to be swapped out anyway for the child to run, fork+exec did not incur any extra overhead.

It's true that there was a period when fork+exec was awkward: MMUs provided a mapping between logical and physical address space, but page faults did not retain enough information to make copy-on-write and a number of other virtual-memory/demand-paging schemes feasible.

This situation was painful enough, and not just for UNIX, that hardware page-fault handling was adapted to become "replayable" fairly quickly.

#3 (21 votes)

Not any longer. There is a technique called COW (copy-on-write): shared data is copied only when one of the two processes (parent or child) tries to write to it.

In the past:
The fork() system call copied the address space of the calling process (the parent) to create a new process (the child). The copying of the parent's address space into the child was the most expensive part of the fork() operation.

Now:
A call to fork() is frequently followed almost immediately by a call to exec() in the child process, which replaces the child's memory with a new program. This is what the shell typically does, for example. In this case, the time spent copying the parent's address space is largely wasted, because the child process will use very little of its memory before calling exec().

For this reason, later versions of Unix took advantage of virtual memory hardware to allow the parent and child to share the memory mapped into their respective address spaces until one of the processes actually modifies it. This technique is known as copy-on-write. To do this, on fork() the kernel would copy the address space mappings from the parent to the child instead of the contents of the mapped pages, and at the same time mark the now-shared pages read-only. When one of the two processes tries to write to one of these shared pages, the process takes a page fault. At this point, the Unix kernel realizes that the page was really a "virtual" or "copy-on-write" copy, and so it makes a new, private, writable copy of the page for the faulting process. In this way, the contents of individual pages aren't actually copied until they are actually written to. This optimization makes a fork() followed by an exec() in the child much cheaper: the child will probably only need to copy one page (the current page of its stack) before it calls exec().

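One way to make copy-on-write visible is to count page faults. The sketch below is Linux-specific and assumes that getrusage()'s ru_minflt counts the minor faults described above; child_faults() is a hypothetical helper that populates a buffer, forks, and reports how many faults the child takes while only reading one byte per page versus writing one byte per page. Reads should use the shared pages as-is, while each first write should fault so the kernel can make a private copy.

```c
#define _DEFAULT_SOURCE
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Minor (no disk I/O) page faults taken by this process so far. */
static long minor_faults(void)
{
    struct rusage ru;
    getrusage(RUSAGE_SELF, &ru);
    return ru.ru_minflt;
}

/* Hypothetical helper: populates npages pages, forks, and returns how many
   minor faults the child takes while reading (do_write == 0) or writing
   (do_write != 0) one byte per page. Returns -1 on error. */
long child_faults(size_t npages, int do_write)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = npages * page;
    volatile char *buf = malloc(len);   /* volatile: keep every access */
    int fds[2];
    if (buf == NULL)
        return -1;
    for (size_t off = 0; off < len; off += page)
        buf[off] = 1;                   /* parent touches every page first */
    if (pipe(fds) < 0) {
        free((void *)buf);
        return -1;
    }

    pid_t pid = fork();
    if (pid == 0) {
        long before = minor_faults();
        long sum = 0;
        for (size_t off = 0; off < len; off += page) {
            if (do_write)
                buf[off] = 2;           /* write to a shared page: COW copy */
            else
                sum += buf[off];        /* read: the shared page is used as-is */
        }
        long faults = minor_faults() - before;
        (void)sum;
        /* report the count to the parent through the pipe */
        _exit(write(fds[1], &faults, sizeof faults) == (ssize_t)sizeof faults ? 0 : 1);
    }

    long faults = -1;
    close(fds[1]);
    if (pid > 0) {
        if (read(fds[0], &faults, sizeof faults) != (ssize_t)sizeof faults)
            faults = -1;
        waitpid(pid, NULL, 0);
    }
    close(fds[0]);
    free((void *)buf);
    return faults;
}
```

On a typical Linux system child_faults(256, 1) should report roughly one fault per page written, and child_faults(256, 0) far fewer; exact numbers depend on the kernel and page size.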

#4 (2 votes)

It turns out all those COW page faults are not at all cheap when the process has a few gigabytes of writable RAM. They will all fault once, even if the child has long since called exec(). Because the child of fork() is no longer allowed to allocate memory, even in the single-threaded case (you can thank Apple for that one), arranging to call vfork()/exec() instead is hardly more difficult now.

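The cost described here can be observed the same way. In this Linux-specific sketch (hypothetical helper rewrite_faults(), again assuming ru_minflt counts these faults), the parent rewrites a buffer after a fork() whose child exits immediately. Because fork() write-protected the parent's pages as well, the rewrite still takes faults, whereas the same rewrite without a fork should take essentially none.

```c
#define _DEFAULT_SOURCE
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Write one byte to every page (volatile so the stores are not optimized away). */
static void touch_pages(volatile char *buf, size_t len, size_t page, char value)
{
    for (size_t off = 0; off < len; off += page)
        buf[off] = value;
}

static long minor_faults(void)
{
    struct rusage ru;
    getrusage(RUSAGE_SELF, &ru);
    return ru.ru_minflt;
}

/* Hypothetical helper: minor faults the parent takes while rewriting npages
   pages, optionally after a fork() whose child exits at once. -1 on error. */
long rewrite_faults(size_t npages, int do_fork)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = npages * page;
    char *buf = malloc(len);
    if (buf == NULL)
        return -1;
    touch_pages(buf, len, page, 1);   /* every page is now private and writable */

    if (do_fork) {
        pid_t pid = fork();
        if (pid < 0) {
            free(buf);
            return -1;
        }
        if (pid == 0)
            _exit(0);                 /* child leaves immediately */
        waitpid(pid, NULL, 0);        /* child is gone before we measure */
    }

    long before = minor_faults();
    touch_pages(buf, len, page, 2);   /* after fork(), each page faults once */
    long after = minor_faults();
    free(buf);
    return after - before;
}
```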

The real advantage of the vfork()/exec() model is that you can set the child up with an arbitrary current directory, arbitrary environment variables, arbitrary file handles (not just stdin/stdout/stderr), an arbitrary signal mask, and some arbitrary shared memory (using the shared memory syscalls) without needing a twenty-argument CreateProcess() API that gains a few more arguments every few years.

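A sketch of that kind of per-child setup, assuming a POSIX system: run_configured() (a hypothetical helper) changes directory, clears the signal mask and supplies a fresh environment, all between fork() and execve(). It uses fork() rather than vfork() for simplicity. In keeping with the point above about allocation, the child only makes async-signal-safe calls; the environment array is built by the caller beforehand.

```c
#define _DEFAULT_SOURCE
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs path/argv in directory dir with environment envp
   and an empty signal mask. Returns the exit status (127 if setup or exec
   failed in the child), or -1 on other errors. */
int run_configured(const char *dir, const char *path,
                   char *const argv[], char *const envp[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: arbitrary setup goes here, before the exec. */
        sigset_t none;
        sigemptyset(&none);
        if (chdir(dir) < 0 || sigprocmask(SIG_SETMASK, &none, NULL) < 0)
            _exit(127);
        execve(path, argv, envp);   /* envp replaces the inherited environment */
        _exit(127);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Each extra setting is just another ordinary call in the child branch, which is the flexibility the answer contrasts with a fixed CreateProcess()-style parameter list.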

It turned out that the "oops, I leaked handles being opened by another thread" gaffe from the early days of threading was fixable in userspace, without process-wide locking, thanks to /proc. The same fix would not have been possible in the giant CreateProcess() model without a new OS version and without convincing everybody to call the new API.

So there you have it. An accident of design ended up far better than the directly designed solution.

#5 (1 vote)

A program started with exec() et al. inherits the open file descriptors of the calling process (including stdin, stdout and stderr), apart from any marked close-on-exec. If the code running in the child changes these after fork() but before calling exec(), the parent program can control the new program's standard streams.

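A minimal sketch of that technique, assuming a POSIX system with /bin/echo: capture_stdout() (a hypothetical helper) creates a pipe, and the child code between fork() and execv() uses dup2() to make the pipe its standard output, so the parent can read whatever the new program prints.

```c
#define _DEFAULT_SOURCE
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs path/argv with its stdout connected to a pipe,
   stores up to cap - 1 bytes of that output in out (NUL-terminated), and
   returns the number of bytes stored, or -1 on error. */
ssize_t capture_stdout(const char *path, char *const argv[], char *out, size_t cap)
{
    int fds[2];
    if (cap == 0 || pipe(fds) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child, between fork() and exec(): make the pipe's write end fd 1. */
        close(fds[0]);
        if (dup2(fds[1], STDOUT_FILENO) < 0)
            _exit(127);
        if (fds[1] != STDOUT_FILENO)
            close(fds[1]);
        execv(path, argv);            /* the new program inherits fd 1 */
        _exit(127);
    }

    /* Parent: close the write end so read() returns 0 once the child exits. */
    close(fds[1]);
    size_t used = 0;
    char scratch[256];
    for (;;) {
        int full = used == cap - 1;
        ssize_t n = full ? read(fds[0], scratch, sizeof scratch)   /* drain and discard */
                         : read(fds[0], out + used, cap - 1 - used);
        if (n <= 0)
            break;
        if (!full)
            used += (size_t)n;
    }
    close(fds[0]);
    out[used] = '\0';
    waitpid(pid, NULL, 0);
    return (ssize_t)used;
}
```

Draining the pipe even after the buffer is full keeps a chatty child from blocking on a full pipe while the parent waits for it.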

