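On-Chip Memory Bandwidth Test
This document provides two methods for testing on-chip memory bandwidth: the single-NUMA on-chip memory bandwidth test and the single-node on-chip memory bandwidth test.
The single-NUMA on-chip memory bandwidth test is performed as follows:
- Set the number of test threads to the maximum number of threads supported by a single NUMA node, and bind memory allocation to the on-chip memory so that the data is placed in on-chip memory.
- Test the on-chip memory bandwidth of a single NUMA node. A reference test command is as follows: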
OMP_NUM_THREADS=38 OMP_PROC_BIND=close taskset -c 0-37 numactl -m 16 ./stream_c.exe
Refer to the Triad value in the test result. The bandwidth of a single NUMA node is approximately 380000 to 410000 MB/s:
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 141648512 (elements), Offset = 0 (elements)
Memory per array = 1080.7 MiB (= 1.1 GiB).
Total memory required = 3242.1 MiB (= 3.2 GiB).
Each kernel will be executed 500 times.
 The *best* time for each kernel (excluding the first iteration)
 will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 38
Number of Threads counted = 38
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 6192 microseconds.
   (= 6192 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:          402552.3     0.005893     0.005630     0.007208
Scale:         380128.4     0.006136     0.005962     0.009997
Add:           420949.0     0.008377     0.008076     0.009797
Triad:         403806.4     0.008782     0.008419     0.010945
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
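The check against the reference range can be automated. The following is a minimal sketch, assuming the STREAM output is available on standard input; `triad_rate` and `check_triad` are hypothetical helper names introduced here for illustration and are not part of STREAM.

```shell
#!/bin/bash
# Sketch: extract the Triad "Best Rate MB/s" value from STREAM output and
# compare it with the reference range quoted above (380000 to 410000 MB/s).
# triad_rate and check_triad are hypothetical helper names.

# Print the Triad best rate (second field of the "Triad:" row) read from stdin.
triad_rate() {
    awk '$1 == "Triad:" { print $2 }'
}

# Return exit status 0 if the rate ($1) lies within [low, high] ($2, $3).
check_triad() {
    awk -v r="$1" -v lo="${2:-380000}" -v hi="${3:-410000}" \
        'BEGIN { exit !(r + 0 >= lo + 0 && r + 0 <= hi + 0) }'
}

# Example using the Triad row from the reference output above.
rate=$(printf 'Triad:  403806.4  0.008782  0.008419  0.010945\n' | triad_rate)
if check_triad "$rate"; then
    echo "Triad ${rate} MB/s is within the reference range"
else
    echo "Triad ${rate} MB/s is outside the reference range"
fi
```

In practice, the output of the reference test command could be piped into `triad_rate` instead of the sample row used here.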
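The single-node on-chip memory bandwidth test is performed as follows:
- Use MPI to start 16 processes. Bind each process to one NUMA node, set the number of threads of each process to the maximum supported by a single NUMA node, and bind each process to the corresponding on-chip memory so that its data is placed in on-chip memory.
- Test the on-chip memory bandwidth of a single node. A reference test command is as follows: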
mpirun -np 16 --map-by numa ./bw_node_mem_on_chip.sh
A reference version of the bw_node_mem_on_chip.sh script is as follows:
#!/bin/bash
# Local rank of this process on the node (set by Open MPI).
rank=${OMPI_COMM_WORLD_LOCAL_RANK}
# First and last CPU of the 38-core range assigned to this rank.
start=$(($rank * 38))
end=$(($rank * 38 + 37))
# Pin 38 OpenMP threads to that CPU range and bind memory to node (rank + 16).
OMP_NUM_THREADS=38 OMP_PROC_BIND=close taskset -c ${start}-${end} numactl -m $(($rank + 16)) ./stream_c.exe
Refer to the Triad value in the test result. Each process reports approximately 380000 to 410000 MB/s, and the single-node on-chip memory bandwidth is the sum of the bandwidth results of the 16 processes:
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:          395880.0     0.006105     0.005725     0.012895
Scale:         383998.0     0.006379     0.005902     0.013061
Add:           414187.1     0.008566     0.008208     0.016977
Triad:         399149.2     0.008950     0.008517     0.014249
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:          401481.2     0.006033     0.005645     0.011622
Scale:         382253.1     0.006311     0.005929     0.018508
Add:           421359.5     0.008711     0.008068     0.021030
Triad:         408012.3     0.008970     0.008332     0.019766
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:          394696.5     0.006160     0.005742     0.022151
Scale:         386417.5     0.006283     0.005865     0.010724
Add:           414235.3     0.008654     0.008207     0.023914
Triad:         398357.4     0.009105     0.008534     0.029001
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:          396987.7     0.006275     0.005709     0.054602
Scale:         380966.3     0.006411     0.005949     0.012462
Add:           413071.2     0.008757     0.008230     0.058634
Triad:         397979.4     0.009042     0.008542     0.016576
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:          395123.1     0.006204     0.005736     0.013740
Scale:         380645.9     0.006716     0.005954     0.056543
Add:           414127.0     0.008792     0.008209     0.042559
Triad:         398680.4     0.009151     0.008527     0.018464
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:          397751.8     0.006195     0.005698     0.014172
Scale:         383224.0     0.006464     0.005914     0.025427
Add:           414825.8     0.008741     0.008195     0.023310
Triad:         400382.1     0.009250     0.008491     0.031880
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:          395599.9     0.006272     0.005729     0.055929
Scale:         387283.4     0.006598     0.005852     0.019510
Add:           414427.9     0.008822     0.008203     0.057628
Triad:         398168.3     0.009365     0.008538     0.024487
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:          395106.6     0.006227     0.005736     0.012475
Scale:         383424.9     0.006604     0.005911     0.135234
Add:           415236.5     0.008757     0.008187     0.015858
Triad:         400236.0     0.009139     0.008494     0.016357
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:          394565.4     0.006253     0.005744     0.021472
Scale:         381486.1     0.006486     0.005941     0.012711
Add:           413922.6     0.008824     0.008213     0.026116
Triad:         398446.5     0.009428     0.008532     0.054950
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:          396011.9     0.006345     0.005723     0.055082
Scale:         382053.4     0.006477     0.005932     0.037756
Add:           415951.2     0.008905     0.008173     0.059440
Triad:         399574.2     0.009203     0.008508     0.018427
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:          395665.8     0.006477     0.005728     0.029760
Scale:         383471.3     0.006535     0.005910     0.016319
Add:           413778.5     0.008959     0.008216     0.023425
Triad:         399899.2     0.009258     0.008501     0.018282
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:          396491.0     0.006429     0.005716     0.020900
Scale:         383224.0     0.006531     0.005914     0.029009
Add:           415284.9     0.008942     0.008186     0.023462
Triad:         397846.2     0.009269     0.008545     0.023168
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:          403714.9     0.006431     0.005614     0.053611
Scale:         378237.7     0.006667     0.005992     0.055854
Add:           421995.5     0.009127     0.008056     0.058290
Triad:         403315.2     0.009863     0.008429     0.134397
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:          404058.1     0.006639     0.005609     0.024329
Scale:         384463.9     0.006897     0.005895     0.056308
Add:           420477.3     0.009536     0.008085     0.042015
Triad:         406546.5     0.009711     0.008362     0.043979
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:          397186.8     0.006756     0.005706     0.055778
Scale:         376350.9     0.006862     0.006022     0.024000
Add:           413682.4     0.009218     0.008218     0.056509
Triad:         399160.4     0.009873     0.008517     0.035960
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:          394434.5     0.006698     0.005746     0.057386
Scale:         384775.2     0.007042     0.005890     0.055602
Add:           414319.5     0.009339     0.008205     0.059440
Triad:         398491.0     0.009642     0.008531     0.024842
-------------------------------------------------------------
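The single-node figure is the sum of the per-process Triad values, and that sum can be computed directly from the combined output. The following is a minimal sketch, assuming the output of all 16 processes arrives on standard input; `sum_triad` is a hypothetical helper name introduced here for illustration.

```shell
#!/bin/bash
# Sketch: sum the Triad best rates of all STREAM outputs read from stdin.
# sum_triad is a hypothetical helper name.
sum_triad() {
    awk '$1 == "Triad:" { total += $2; n++ }
         END { printf "%d processes, total Triad %.1f MB/s\n", n, total }'
}

# Example with the 16 Triad values from the reference output above.
# printf repeats the format once per argument, giving one "Triad:" row each.
printf 'Triad: %s\n' \
    399149.2 408012.3 398357.4 397979.4 398680.4 400382.1 \
    398168.3 400236.0 398446.5 399574.2 399899.2 397846.2 \
    403315.2 406546.5 399160.4 398491.0 | sum_triad
```

For the 16 reference results listed above, the Triad values add up to 6404244.3 MB/s. In practice, the output of the mpirun command could be piped into `sum_triad` instead of the sample values.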