Running the Test

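  1. Set the number of test threads to the maximum number of threads of one NUMA node (38 in the commands below), so that the test measures the aggregate bandwidth of the whole NUMA node.
  2. Test the DDR bandwidth of each NUMA node. Reference command: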
    for ((i = 0; i < 608; i += 38)); do
        OMP_NUM_THREADS=38 OMP_PROC_BIND=close taskset -c ${i}-$((${i}+37)) numactl -m $((${i}/38)) ./stream_c.exe
    done

    Use the Triad value in the output as the reference. Each NUMA node should report between 120000 and 150000 MB/s:

    -------------------------------------------------------------
    STREAM version $Revision: 5.10 $
    -------------------------------------------------------------
    This system uses 8 bytes per array element.
    -------------------------------------------------------------
    Array size = 141648512 (elements), Offset = 0 (elements)
    Memory per array = 1080.7 MiB (= 1.1 GiB).
    Total memory required = 3242.1 MiB (= 3.2 GiB).
    Each kernel will be executed 20 times.
     The *best* time for each kernel (excluding the first iteration)
     will be used to compute the reported bandwidth.
    -------------------------------------------------------------
    Number of Threads requested = 38
    Number of Threads counted = 38
    -------------------------------------------------------------
    Your clock granularity/precision appears to be 1 microseconds.
    Each test below will take on the order of 17826 microseconds.
       (= 17826 clock ticks)
    Increase the size of the arrays if this shows that
    you are not getting at least 20 clock ticks per test.
    -------------------------------------------------------------
    WARNING -- The above is only a rough guideline.
    For best results, please be sure you know the
    precision of your system timer.
    -------------------------------------------------------------
    Function    Best Rate MB/s  Avg time     Min time     Max time
    Copy:          125994.0     0.018811     0.017988     0.020024
    Scale:         135105.3     0.017003     0.016775     0.017418
    Add:           144631.7     0.023637     0.023505     0.023769
    Triad:         144521.8     0.023850     0.023523     0.024183
    -------------------------------------------------------------
    Solution Validates: avg error less than 1.000000e-13 on all three arrays
    -------------------------------------------------------------
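  3. Test the on-chip memory bandwidth of each NUMA node. The command is the same as in the DDR test, except that the `numactl -m` node index is offset by 16 to bind memory to the on-chip memory nodes. Reference command: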
    for ((i = 0; i < 608; i += 38)); do
        OMP_NUM_THREADS=38 OMP_PROC_BIND=close taskset -c ${i}-$((${i}+37)) numactl -m $((${i}/38+16)) ./stream_c.exe
    done

    Use the Triad value in the output as the reference. Each NUMA node should report a bandwidth of roughly 380000 to 400000 MB/s:

    -------------------------------------------------------------
    STREAM version $Revision: 5.10 $
    -------------------------------------------------------------
    This system uses 8 bytes per array element.
    -------------------------------------------------------------
    Array size = 141648512 (elements), Offset = 0 (elements)
    Memory per array = 1080.7 MiB (= 1.1 GiB).
    Total memory required = 3242.1 MiB (= 3.2 GiB).
    Each kernel will be executed 20 times.
     The *best* time for each kernel (excluding the first iteration)
     will be used to compute the reported bandwidth.
    -------------------------------------------------------------
    Number of Threads requested = 38
    Number of Threads counted = 38
    -------------------------------------------------------------
    Your clock granularity/precision appears to be 1 microseconds.
    Each test below will take on the order of 5151 microseconds.
       (= 5151 clock ticks)
    Increase the size of the arrays if this shows that
    you are not getting at least 20 clock ticks per test.
    -------------------------------------------------------------
    WARNING -- The above is only a rough guideline.
    For best results, please be sure you know the
    precision of your system timer.
    -------------------------------------------------------------
    Function    Best Rate MB/s  Avg time     Min time     Max time
    Copy:          403423.6     0.005716     0.005618     0.005873
    Scale:         399288.9     0.005857     0.005676     0.006221
    Add:           432229.1     0.007951     0.007865     0.008594
    Triad:         406941.0     0.008642     0.008354     0.009097
    -------------------------------------------------------------
    Solution Validates: avg error less than 1.000000e-13 on all three arrays
    -------------------------------------------------------------
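
The checks described in the steps above can be partly automated. The following is a minimal sketch, not part of STREAM or of the original procedure. The helper names `numa_plan`, `triad_rate`, and `triad_in_range` are hypothetical. The layout of 38 cores per NUMA node and the memory-node offset of 16 for on-chip memory are taken from the reference commands and may differ on other systems.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the per-NUMA STREAM runs; names are illustrative only.

# Print "<first_cpu>-<last_cpu> <memory_node>" for NUMA index $1.
# $2 is the memory-node offset: 0 for DDR, 16 for on-chip memory
# (assumption: same layout as in the reference commands, 38 cores per node).
numa_plan() (
    node=$1
    offset=${2:-0}
    cores=38
    first=$(( node * cores ))
    echo "${first}-$(( first + cores - 1 )) $(( node + offset ))"
)

# Read STREAM output on stdin and print the Triad "Best Rate MB/s" value.
triad_rate() {
    awk '$1 == "Triad:" { print $2; exit }'
}

# Return 0 if the rate $1 lies within [$2, $3] MB/s, non-zero otherwise.
triad_in_range() {
    awk -v r="$1" -v lo="$2" -v hi="$3" \
        'BEGIN { exit !(r + 0 >= lo + 0 && r + 0 <= hi + 0) }'
}
```

For example, `numa_plan 3 16` prints the core range and memory node to pass to `taskset -c` and `numactl -m` for the on-chip memory test of NUMA node 3. Piping the output of `./stream_c.exe` into `triad_rate` yields the value to pass to `triad_in_range` together with the expected bounds (120000 and 150000 for DDR). Because the documented on-chip range is approximate, a sample such as 406941.0 MB/s falls slightly outside 380000 to 400000, so the bounds may need some slack.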