What is the simplest way to achieve O(n) performance when creating the union of 3 IEnumerables?


Say a, b, c are all List<T> and I want to create an unsorted union of them. Although performance isn't super-critical, they might have 10,000 entries each, so I'm keen to avoid O(n^2) solutions.

AFAICT the MSDN documentation doesn't say anything about the performance characteristics of union as far as the different types are concerned.

My gut instinct says that if I just do a.Union(b).Union(c), this will take O(n^2) time, but new HashSet<T>(a).Union(b).Union(c) would be O(n).

Does anyone have any documentation or metrics to confirm or deny this assumption?

3 solutions

#1


24  

You should use Enumerable.Union because it is as efficient as the HashSet approach. Complexity is O(n+m) because:

Enumerable.Union

When the object returned by this method is enumerated, Union<TSource> enumerates first and second in that order and yields each element that has not already been yielded.
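
The documented behaviour quoted above can be sketched roughly as follows. This is a minimal illustration and not the actual framework source: UnionSketch is a hypothetical name, and the block assumes C# 9 or later so that top-level statements compile. Each element costs one hash-set insertion attempt, which is where the O(n+m) figure comes from.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var a = new List<int> { 1, 2, 3 };
var b = new List<int> { 3, 4, 1 };

// Compare the sketch with the built-in operator.
var sketch = UnionSketch(a, b).ToList();
var builtIn = a.Union(b).ToList();

Console.WriteLine(string.Join(", ", sketch));      // 1, 2, 3, 4
Console.WriteLine(sketch.SequenceEqual(builtIn));  // True

// Hypothetical re-implementation: enumerate first, then second,
// yielding only the elements the set has not seen before.
static IEnumerable<T> UnionSketch<T>(IEnumerable<T> first, IEnumerable<T> second)
{
    var seen = new HashSet<T>();
    foreach (var item in first)
        if (seen.Add(item)) yield return item;   // Add returns false for duplicates
    foreach (var item in second)
        if (seen.Add(item)) yield return item;
}
```

Because the iterator is lazy, the set is only allocated once enumeration starts.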

Source-code here.


Ivan is right: there is an overhead if you use Enumerable.Union with multiple collections, since a new set must be created for every chained call. So it might be more efficient (in terms of memory consumption) if you use one of these approaches:

  1. Concat + Distinct:

    a.Concat(b).Concat(c)...Concat(x).Distinct()
    
  2. Union + Concat

    a.Union(b.Concat(c)...Concat(x))
    
  3. HashSet<T> constructor that takes IEnumerable<T> (for example, with int):

    new HashSet<int>(a.Concat(b).Concat(c)...Concat(x))
    

The difference between the first two might be negligible. The third approach does not use deferred execution; it creates a HashSet<> in memory immediately. That is a good and efficient choice (1) if you need this collection type or (2) if this is the final operation on the query. But if you need to do further operations on this chained query, you should prefer either Concat + Distinct or Union + Concat.
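
To make the deferred versus immediate distinction concrete, here is a small sketch using three short int lists (the sample data is made up, and top-level statements assume C# 9 or later). A source list is modified after the queries are built, and only the deferred queries observe the change.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var a = new List<int> { 1, 2, 3 };
var b = new List<int> { 3, 4 };
var c = new List<int> { 4, 5 };

// Deferred: nothing is enumerated until the query is consumed.
IEnumerable<int> concatDistinct = a.Concat(b).Concat(c).Distinct();
IEnumerable<int> unionConcat = a.Union(b.Concat(c));

// Immediate: the constructor enumerates all three lists right away.
var set = new HashSet<int>(a.Concat(b).Concat(c));

// Changing a source afterwards shows the difference.
c.Add(6);

Console.WriteLine(concatDistinct.Count()); // 6, the deferred query sees the new element
Console.WriteLine(unionConcat.Count());    // 6
Console.WriteLine(set.Count);              // 5, the set was filled before the change
```

If a snapshot is what you want, the HashSet<T> constructor (or a trailing ToList() on either query) gives you one.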

#2


6  

While @Tim Schmelter is right about the linear time complexity of the Enumerable.Union method, chaining multiple Union operators has a hidden overhead: every Union operator internally creates a hash set that basically duplicates the one from the previous operator (plus additional items), thus using much more memory than the single HashSet approach.

If we take into account the fact that Union is simply a shortcut for Concat + Distinct, the scalable LINQ solution with the same time/space complexity as the HashSet approach will be:

a.Concat(b).Concat(c)...Concat(x).Distinct()
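
For an arbitrary number of sources, the same idea could be wrapped in a small helper. UnionAll below is a hypothetical name, not part of LINQ; it is a sketch that flattens the sources with SelectMany and applies a single Distinct, so only one internal set is built. Top-level statements assume C# 9 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var a = new List<string> { "x", "y" };
var b = new List<string> { "y", "z" };
var c = new List<string> { "z", "w" };

var all = UnionAll(a, b, c).ToList();
Console.WriteLine(string.Join(", ", all)); // x, y, z, w

// Hypothetical helper: flatten every source into one sequence, then
// de-duplicate once, which is equivalent to a.Concat(b)...Distinct().
static IEnumerable<T> UnionAll<T>(params IEnumerable<T>[] sources)
{
    return sources.SelectMany(source => source).Distinct();
}
```

The result is still a deferred query, so further LINQ operators can be chained onto it.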

#3


1  

Union is O(n).

a.Union(b).Union(c) is less efficient in most implementations than a.Union(b.Concat(c)) because it creates a hash-set for the first union operation and then another for the second, as other answers have said. Both of these also end up with a chain of IEnumerator<T> objects in use which increases cost as further sources are added.

a.Union(b).Union(c) is more efficient in .NET Core because the second .Union() operation produces a single object with knowledge of a, b and c and it will create a single hash-set for the entire operation, as well as avoiding the chain of IEnumerator<T> objects.

