Single-threaded programme apparently using multiple cores


Question summary: all four cores used when running a single threaded programme. Why?

Details: I have written a non-parallelised programme in Xcode (C++). I was in the process of parallelising it, and wanted to see whether what I was doing was actually resulting in more cores being used. To that end I used Instruments to look at the core usage. To my surprise, while my application is single threaded, all four cores were being utilised.

To test whether it changed the performance, I dialled down the number of cores available to 1 (you can do it in Instruments, preferences) and the speed wasn't reduced at all. So (as I knew) the programme isn't parallelised in any way.

I can't find any information on what it means to use multiple cores to perform single threaded tasks. Am I reading the Instruments output wrong? Or is the single-threaded process being shunted between different cores for some reason (like changing lanes on a road instead of driving in two lanes at once - i.e. actual parallelisation)?

Thanks for any insight anyone can give on this.

EDIT with MWE (apologies for not doing this initially). The following is C++ code that finds primes under 500,000, compiled in Xcode.

#include <ctime>     // clock(), clock_t, CLOCKS_PER_SEC
#include <iostream>

int main(int argc, const char * argv[]) {
    clock_t start, end;
    double runTime;
    start = clock();
    int i, num = 1, primes = 0;
    int num_max = 500000;

    while (num <= num_max) {
        i = 2;
        while (i <= num) {
            if (num % i == 0)
                break;
            i++;
        }
        if (i == num){
            primes++;
            std::cout << "Prime: " << num << std::endl;
        }

        num++;
    }

    end = clock();
    runTime = (end - start) / (double) CLOCKS_PER_SEC;
    std::cout << "This machine calculated all " << primes << " under " << num_max << " in " << runTime << " seconds." << std::endl;

    return 0;
}

This runs in 36s or thereabouts on my machine, as shown by the final output and my phone's stopwatch. When I profile it (using Instruments launched from within Xcode) it gives a run-time of around 28s. The following image shows the core usage.

[Image: Instruments showing core usage with all 4 cores (with hyper-threading)]

Now I reduce the number of available cores to 1. Re-running from within the profiler (pressing the record button), it reports a run-time of 29s; a picture is shown below.

[Image: Instruments output with only 1 core available]

That would accord with my theory that more cores don't improve performance for a single-threaded programme! Unfortunately, when I actually time the programme with my phone, the single-core run above took about 1 minute 30s, so there is a meaningful performance gain from having all cores switched on.

One thing that really puzzles me is that if you leave the number of cores at 1, go back to Xcode and run the program, it again reports about 33s, but my phone says it takes 1 minute 50s. So changing the number of cores is (perhaps) doing something to the internal clock.
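
A possible explanation for the mismatch, not stated in the original post: on macOS and other POSIX systems, clock() reports processor time charged to the process rather than elapsed real time, so time spent waiting for a core does not show up in it. The sketch below measures the same piece of work with both clocks. The names Timing, time_both and time_sleep are made up for illustration; the sleep simply stands in for "time during which the process is not running".

```cpp
#include <cassert>
#include <chrono>
#include <ctime>
#include <thread>

// Hypothetical helper type: both measurements, in seconds.
struct Timing {
    double cpu_seconds;   // processor time charged to this process (std::clock)
    double wall_seconds;  // elapsed real time (std::chrono::steady_clock)
};

// Runs `work` once and measures it with both clocks.
template <typename Work>
Timing time_both(Work work) {
    const std::clock_t cpu_start = std::clock();
    const auto wall_start = std::chrono::steady_clock::now();
    work();
    const auto wall_end = std::chrono::steady_clock::now();
    const std::clock_t cpu_end = std::clock();

    Timing t;
    t.cpu_seconds = static_cast<double>(cpu_end - cpu_start) / CLOCKS_PER_SEC;
    t.wall_seconds = std::chrono::duration<double>(wall_end - wall_start).count();
    return t;
}

// Example workload: sleeping consumes wall time but almost no CPU time,
// much like a process that is waiting for its turn on a single core.
Timing time_sleep(int milliseconds) {
    assert(milliseconds >= 0);
    return time_both([milliseconds] {
        std::this_thread::sleep_for(std::chrono::milliseconds(milliseconds));
    });
}
```

Calling time_sleep(200) should give a wall_seconds of roughly 0.2 and a much smaller cpu_seconds. If that is what happens in the programme above, the stopwatch and the printed run-time would be measuring different things, and std::chrono::steady_clock would be the one to compare with the phone.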

Hopefully that describes the problem fully. I'm running on a 2015 15-inch MBP with a 2.2GHz quad-core i7 processor, using Xcode 7.3.1.

2 Answers

#1


5  

I should preface this by saying that your question lacks much of the information needed for an accurate diagnosis. Anyway, I'll try to explain the most common reason, IMHO, assuming your application doesn't use third-party components that run in a multi-threaded way.

I think this could be an effect of the scheduler. Let me explain what I mean.

Each core of the processor picks a process in the system and executes it for a "short" amount of time. This is the most common approach in desktop operating systems.

Your process is executed on a single core for this amount of time and then stopped to allow other processes to continue. When your process is resumed, it may be executed on another core (always one core at a time, but possibly a different one). So an imprecise task manager with a low time resolution could register utilisation on all cores, even though only one core is running your process at any given moment.
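
To make the argument concrete, here is a toy model of it (not a real scheduler, and not how macOS actually places threads): one thread runs a sequence of time slices and is moved to the next core after each slice. The function names and the strict round-robin placement are assumptions made purely for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model: a single thread runs `slices` time slices of `slice_ms` each
// and is moved to the next core after every slice (round-robin).
// Returns the busy time in ms that a coarse monitor would attribute to each core.
std::vector<int> busy_ms_per_core(std::size_t cores, std::size_t slices, int slice_ms) {
    assert(cores > 0 && slice_ms >= 0);
    std::vector<int> busy(cores, 0);
    for (std::size_t s = 0; s < slices; ++s) {
        busy[s % cores] += slice_ms;  // exactly one core is busy during any slice
    }
    return busy;
}

// Utilisation of one core over the monitor's sampling window, in percent.
double utilisation_percent(int busy_ms, int window_ms) {
    assert(window_ms > 0);
    return 100.0 * busy_ms / window_ms;
}
```

For example, busy_ms_per_core(4, 100, 10) describes one second of work in 10 ms slices on 4 cores: every core ends up with 250 ms of busy time, so a monitor that samples once per second would show all four cores at 25%, even though the thread never ran on two cores at once.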

To check whether this is the cause, I suggest you look at the CPU % used while your application is running. For a single-threaded application, total CPU usage should be about 1/(number of cores), in your case 25%.
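
The arithmetic behind that figure can be sketched as follows. The function names are hypothetical, and note one assumption: std::thread::hardware_concurrency() is only a hint and usually counts logical cores, so on a quad-core machine with hyper-threading it may well report 8, which would give 12.5% rather than 25%.

```cpp
#include <cassert>
#include <thread>

// Share of the machine's total CPU capacity that one fully busy thread
// can use, in percent.
double single_thread_share_percent(unsigned logical_cores) {
    assert(logical_cores > 0);
    return 100.0 / logical_cores;
}

// Same figure for the current machine. hardware_concurrency() may return 0
// when the count is unknown; falling back to 1 is a choice made for this sketch.
double single_thread_share_on_this_machine() {
    unsigned n = std::thread::hardware_concurrency();
    if (n == 0) {
        n = 1;
    }
    return single_thread_share_percent(n);
}
```

A total that stays near this share while the programme runs is consistent with a single thread being moved between cores; a total well above it would suggest that something really is running in parallel.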

#2


0  

If it's a release build, your compiler may be vectorising or parallelising your code. Also, libraries you link against, the standard library for example, may be threaded or vectorised.
