union 'punning' structs w/ "common initial sequence": Why does C (99+), but not C++, stipulate a 'visible declaration of the union type'?


Background

Discussions on the mostly undefined- or implementation-defined nature of type-punning via a union typically quote the following bits, here via @ecatmur ( https://stackoverflow.com/a/31557852/2757035 ), on an exemption for standard-layout structs having a "common initial sequence" of member types:

C11 (6.5.2.3 Structure and union members; Semantics):

[...] if a union contains several structures that share a common initial sequence (see below), and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them anywhere that a declaration of the completed type of the union is visible. Two structures share a common initial sequence if corresponding members have compatible types (and, for bit-fields, the same widths) for a sequence of one or more initial members.

C++03 ([class.mem]/16):

If a POD-union contains two or more POD-structs that share a common initial sequence, and if the POD-union object currently contains one of these POD-structs, it is permitted to inspect the common initial part of any of them. Two POD-structs share a common initial sequence if corresponding members have layout-compatible types (and, for bit-fields, the same widths) for a sequence of one or more initial members.

Other versions of the two standards have similar language; since C++11 the terminology used is standard-layout rather than POD.

Since no reinterpretation is required, this isn't really type-punning, just name substitution applied to union member accesses. A proposal for C++17 (the infamous P0137R1) makes this explicit using language like 'the access is as if the other struct member was nominated'.
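To make the "name substitution" reading concrete, here is a minimal C sketch of the pattern the quoted rules bless: a tagged union whose member structs share a leading `int kind`. The type names, enum values, and the `msg_value` helper are hypothetical, chosen for illustration only. The point is that `m->i.kind` is read even when `r` is the active member, and every access goes through the union object itself.

```c
/* Hypothetical message kinds and types, for illustration only. Both structs
   begin with `int kind`, so that member is their common initial sequence. */
enum { MSG_INT = 1, MSG_REAL = 2 };

struct msg_int  { int kind; int value; };
struct msg_real { int kind; double value; };

union msg {
    struct msg_int  i;
    struct msg_real r;
};

/* Reads the shared `kind` through member `i` regardless of which member is
   active, then dispatches. When `r` is active, reading `m->i.kind` is the
   "inspect the common initial part" case: it names the same int object
   that `m->r.kind` would, so no reinterpretation of bits is involved. */
double msg_value(const union msg *m)
{
    switch (m->i.kind) {
    case MSG_INT:  return (double)m->i.value;
    case MSG_REAL: return m->r.value;
    default:       return 0.0;
    }
}
```

Usage: set `kind` and `value` through the member you intend to be active (for example `b.r.kind = MSG_REAL; b.r.value = 2.5;`), then call `msg_value(&b)`; the dispatch reads `b.i.kind` and returns `2.5`.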

But please note the phrase "anywhere that a declaration of the completed type of the union is visible" - a clause that exists in C11 but nowhere in the C++ drafts for 2003, 2011, or 2014 (all nearly identical, except that later versions replace "POD" with the new term standard-layout). In any case, the 'visible declaration of the union type' bit is totally absent from the corresponding section of every C++ standard.

@loop and @Mints97, here - https://stackoverflow.com/a/28528989/2757035 - show that this line was also absent in C89, first appearing in C99 and remaining in C since then (though, again, never filtering through to C++).

Standards discussions around this

[snipped - see my answer]

Questions

From this, then, my questions were:

  • What does this mean? What is classed as a 'visible declaration'? Was this clause intended to narrow down - or expand up - the range of contexts in which such 'punning' has defined behaviour?

  • Are we to assume that this omission in C++ is very deliberate?

  • What is the reason for C++ differing from C? Did C++ just 'inherit' this from C89 and then either decide - or worse, forget - to update alongside C99?

  • If the difference is intentional, then what benefits or drawbacks are there to the 2 different treatments in C vs C++?

  • What, if any, interesting ramifications does it have at compile- or runtime? For example, @ecatmur, in a comment replying to my pointing this out on his original answer (link as above), speculated as follows.

I'd imagine it permits more aggressive optimization; C can assume that function arguments S* s and T* t do not alias even if they share a common initial sequence as long as no union { S; T; } is in view, while C++ can make that assumption only at link time. Might be worth asking a separate question about that difference.
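A minimal C sketch of the kind of function this speculation is about, assuming two hypothetical struct types `S` and `T` whose only shared member is a leading `int x`. Under strict aliasing a compiler may treat `s` and `t` as pointing to different objects and return `1` without reloading `s->x`; the open question is whether a visible `union { struct S s; struct T t; }` in the same translation unit would oblige it to reload. The sketch only calls the function with two distinct objects, which is well-defined either way.

```c
/* Hypothetical types sharing a common initial sequence (`int x`). */
struct S { int x; float f; };
struct T { int x; long l; };

/* With no union of S and T in view, an optimizing C compiler may assume
   the store through t cannot modify s->x, and so fold the return value
   to the constant 1. Whether a visible union declaration should block
   that assumption is exactly what the C99 clause leaves unclear. */
int store_and_reload(struct S *s, struct T *t)
{
    s->x = 1;
    t->x = 2;      /* assumed not to alias s->x */
    return s->x;
}
```

Calling it as `store_and_reload(&s, &t)` on separate objects returns `1` and leaves `t.x == 2` on any conforming compiler; only the aliasing call, with both pointers into the same union object, is where C and C++ implementations may disagree.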

Well, here I am, asking! I'm very interested in any thoughts about this, especially: other relevant parts of (either) Standard, quotes from committee members or other esteemed commentators, insights from developers who might have noticed a practical difference due to this - assuming any compiler even bothers to enforce C's added clause - and so on. The aim is to generate a useful catalogue of relevant facts about this C clause and its (intentional or not) omission from C++. So, let's go!

2 Answers

#1

I've found my way through the labyrinth to some great sources on this, and I think I've got a pretty comprehensive summary of it. I'm posting this as an answer because it seems to explain both the (IMO very misguided) intention of the C clause and the fact that C++ does not inherit it. This will evolve over time if I discover further supporting material or the situation changes.

This is my first time trying to sum up a very complex situation, which seems ill-defined even to many language architects, so I'll welcome clarifications/suggestions on how to improve this answer - or simply a better answer if anyone has one.

Finally, some concrete commentary

Through vaguely related threads, I found the following answer by @tab - and much appreciated the contained links to (illuminating, if not conclusive) GCC and Working Group defect reports: answer by tab on StackOverflow

The GCC link contains some interesting discussion and reveals a sizeable amount of confusion and conflicting interpretations on the part of the Committee and compiler vendors, surrounding the subject of union member structs, punning, and aliasing in both C and C++.

At the end of that, we're linked to the main event - another BugZilla thread, Bug 65892, containing an extremely useful discussion. In particular, we find our way to the first of two pivotal documents:

Origin of the added line in C99

C proposal N685 is the origin of the added clause regarding visibility of a union type declaration. Based on what some (see GCC thread #2) consider a total misinterpretation of the "common initial sequence" allowance, N685 was indeed intended to relax the aliasing rules for "common initial sequence" structs within any TU that can see some union containing instances of those struct types, as we can see from this quote:

The proposed solution is to require that a union declaration be visible if aliases through a common initial sequence (like the above) are possible. Therefore the following TU provides this kind of aliasing if desired:

union utag {
    struct tag1 { int m1; double d2; } st1;
    struct tag2 { int m1; char c2; } st2;
};

int similar_func(struct tag1 *pst2, struct tag2 *pst3) {
     pst2->m1 = 2;
     pst3->m1 = 0;   /* might be an alias for pst2->m1 */
     return pst2->m1;
}

Judging by the GCC discussion and comments below such as @ecatmur's, this proposal - which seems to mandate speculatively allowing aliasing for any struct type that has some instance within some union visible to this TU - seems to have received great derision and rarely been implemented.

It's obvious how difficult it would be to satisfy this interpretation of the added clause without totally crippling many optimisations - for little benefit, as few coders would want this guarantee, and those who do can just turn on -fno-strict-aliasing (which IMO indicates larger problems). If implemented, this allowance is more likely to catch people out and interact spuriously with other union declarations than to be useful.

Omission of the line from C++

Following on from this and a comment I made elsewhere, @Potatoswatter in this answer here on SO states that:

The visibility part was purposely omitted from C++ because it's widely considered to be ludicrous and unimplementable.

In other words, it looks like C++ deliberately avoided adopting this added clause, likely due to its widely perceived absurdity. On asking for an "on the record" citation of this, Potatoswatter provided the following key info about the thread's participants:

The folks in that discussion are essentially "on the record" there. Andrew Pinski is a hardcore GCC backend guy. Martin Sebor is an active C committee member. Jonathan Wakely is an active C++ committee member and language/library implementer. That page is more authoritative, clear, and complete than anything I could write.

Potatoswatter, in the same SO thread linked above, concludes that C++ deliberately excluded this line, leaving no special treatment (or, at best, implementation-defined treatment) for pointers into the common initial sequence. Whether their treatment will in future be specifically defined, versus any other pointers, remains to be seen; compare to my final section below about C. At present, though, it is not (and again, IMO, this is good).

What does this mean for C++ and practical C implementations?

So, with the nefarious line from N685 'cast aside', we're back to assuming that pointers into the common initial sequence are not special in terms of aliasing. Still, it's worth confirming what this paragraph in C++ means without it. Well, the 2nd GCC thread above links to another gem:

C++ defect 1719. This proposal has reached DRWP status: "A DR issue whose resolution is reflected in the current Working Paper. The Working Paper is a draft for a future version of the Standard". This is either post-C++14 or at least after the final draft I have here (N3797), and it puts forward a significant and, in my opinion, illuminating rewrite of this paragraph's wording, as follows ({these comments} are mine):

In a standard-layout union with an active member {"active" indicates a union instance, not just type} (9.5 [class.union]) of struct type T1, it is permitted to read {formerly "inspect"} a non-static data member m of another union member of struct type T2 provided m is part of the common initial sequence of T1 and T2. [Note: Reading a volatile object through a non-volatile glvalue has undefined behavior (7.1.6.1 [dcl.type.cv]). —end note]

This seems to clarify the meaning of the old wording: to me, it says that any specifically allowed 'punning' among union member structs with common initial sequences must be done via an instance of the parent union - rather than being based on the type of the structs (e.g. pointers to them passed to some function). This wording seems to rule out any other interpretation, a la N685. C would do well to adopt this, I'd say. Hey, speaking of which, see below!

The upshot is that - as nicely demonstrated by @ecatmur and in the GCC tickets - this leaves pointers to such union member structs, by definition in C++ and in practice in C, subject to the same strict aliasing rules as any other two officially unrelated pointers. The explicit guarantee of being able to read the common initial sequence of inactive union member structs is now more clearly defined, without the vague and unimaginably tedious-to-enforce "visibility" requirement that N685 attempted to add to C. By this definition, the main compilers have been behaving as intended for C++. As for C?

Possible reversal of this line in C / clarification in C++

It's also very worth noting that C committee member Martin Sebor is looking to get this fixed in that fine language, too:

Martin Sebor 2015-04-27 14:57:16 UTC If one of you can explain the problem with it I'm willing to write up a paper and submit it to WG14 and request to have the standard changed.

Martin Sebor 2015-05-13 16:02:41 UTC I had a chance to discuss this issue with Clark Nelson last week. Clark has worked on improving the aliasing parts of the C specification in the past, for example in N1520 (http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1520.htm). He agreed that like the issues pointed out in N1520, this is also an outstanding problem that would be worth for WG14 to revisit and fix.

Potatoswatter inspiringly concludes:

The C and C++ committees (via Martin and Clark) will try to find a consensus and hammer out wording so the standard can finally say what it means.

We can only hope!

Again, all further thoughts are welcome.

#2

I suspect it means that the access to these common parts is permitted not only through the union type, but outside of the union. That is, suppose we have this:

union u {
  struct s1 m1;
  struct s2 m2;
};

Now suppose that in some function we have a struct s1 *p1 pointer which we know was lifted from the m1 member of such a union. We can cast this to a struct s2 * pointer and still access the members which are in common with struct s1. But somewhere in the scope, a declaration of union u has to be visible. And it has to be the complete declaration, which informs the compiler that the members are struct s1 and struct s2.

The likely intent is that if there is such a type in scope, then the compiler has knowledge that struct s1 and struct s2 are aliased, and so an access through a struct s1 * pointer is suspected of really accessing a struct s2 or vice versa.

In the absence of any visible union type which joins those types this way, there is no such knowledge; strict aliasing can be applied.

Since the wording is absent from C++, then to take advantage of the "common initial members relaxation" rule in that language, you have to route the accesses through the union type, as is commonly done anyway:

union u *ptr_any;
// ...
ptr_any->m1.common_initial_member = 42;
fun(ptr_any->m2.common_initial_member);  // pass 42 to fun
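The snippet above leaves `struct s1`, `struct s2`, and `fun` undefined. Here is a self-contained sketch that fills them in with hypothetical definitions (the extra members `d` and `c`, and `fun` simply returning its argument, are assumptions for illustration), so the "route everything through the union" pattern can be compiled and checked.

```c
/* Hypothetical definitions: the answer does not specify these. Both
   structs start with the same int member, their common initial sequence. */
struct s1 { int common_initial_member; double d; };
struct s2 { int common_initial_member; char c; };

union u {
    struct s1 m1;
    struct s2 m2;
};

/* Hypothetical stand-in for the answer's fun(): returns its argument. */
static int fun(int v) { return v; }

/* Both the write and the read go through the union object, so reading via
   m2 after writing via m1 relies only on the common-initial-sequence rule
   shared by C and C++, not on C's "visible declaration" clause. */
int write_m1_read_m2(union u *ptr_any)
{
    ptr_any->m1.common_initial_member = 42;
    return fun(ptr_any->m2.common_initial_member);  /* passes 42 to fun */
}
```

The design choice is simply to never hold separate `struct s1 *` and `struct s2 *` pointers into the same object; keeping a single `union u *` makes the shared member's access unambiguous in both languages.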
