Overall Analysis in Region Mode
Figure 1 Case 9
Region mode supports instrumenting multiple functions or loops, so you can add a new case that runs several methods in sequence. In this example, method 9 (multi_method_matmult) is added to invoke the following methods in sequence:
- 1(parallel_matmult)
- 2(transpose_B_matmult)
- 4(block_transpose_B_matmult)
- 5(intrinsics_transpose_B_matmult)
- 6(kml_matmult_8192)
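The idea of a composite case that runs several methods back to back and reports a time for each can be sketched as follows. This is only an illustration in Python, not the source of the matmul program: the two kernels, the METHODS and COMPOSITE tables, and run_case are hypothetical stand-ins, and the composite case here chains only two methods to keep the sketch short.

```python
import time

# Hypothetical stand-in kernels (assumption: not the real matmul sources).
def naive_matmult(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose_b_matmult(a, b):
    # Transpose B first so the inner product walks rows of both operands.
    bt = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

# Method number -> (name, kernel). Numbers and names are illustrative only.
METHODS = {
    1: ("naive_matmult", naive_matmult),
    2: ("transpose_b_matmult", transpose_b_matmult),
}

# A composite case is an ordered list of existing method numbers.
COMPOSITE = {9: [1, 2]}

def run_case(method_id, a, b):
    """Run one method, or every method of a composite case, in sequence.

    Returns a list of (name, elapsed_seconds, result) tuples, one per method.
    """
    timings = []
    for mid in COMPOSITE.get(method_id, [method_id]):
        name, kernel = METHODS[mid]
        start = time.perf_counter()
        result = kernel(a, b)
        timings.append((name, time.perf_counter() - start, result))
    return timings

if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[5, 6], [7, 8]]
    for name, seconds, _ in run_case(9, a, b):
        print(f"Matrix multiplication time({name}) = {seconds:.6f}s")
```

Because every method runs inside one process, a single region-mode collection can cover all of them.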
- Run the multi_method_matmult case with a matrix size of 8192.
./matmul 8192 9
Command output:
Size is 8192, Matrix multiplication method is: 9, Check correctness is: 0
Initialization time = 2.663596s
Matrix multiplication time(parallel_matmult) = 524.732915s
Matrix multiplication time(transpose_B_matmult) = 12.199910s
Matrix multiplication time(block_transpose_B_matmult) = 3.940094s
Matrix multiplication time(intrinsics_transpose_B_matmult) = 2.543300s
Matrix multiplication time(kml_matmult_8192) = 0.320360s
Matrix multiplication time = 543.736720s
You can see the time spent on each method.
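To compare the methods at a glance, the per-method lines of this output can be parsed and expressed as speedups over parallel_matmult. The following is a minimal sketch: the regular expression and the helper names method_times and speedups are assumptions made for illustration, not part of the tool.

```python
import re

# Output of "./matmul 8192 9", copied from the run above.
OUTPUT = """\
Size is 8192, Matrix multiplication method is: 9, Check correctness is: 0
Initialization time = 2.663596s
Matrix multiplication time(parallel_matmult) = 524.732915s
Matrix multiplication time(transpose_B_matmult) = 12.199910s
Matrix multiplication time(block_transpose_B_matmult) = 3.940094s
Matrix multiplication time(intrinsics_transpose_B_matmult) = 2.543300s
Matrix multiplication time(kml_matmult_8192) = 0.320360s
Matrix multiplication time = 543.736720s
"""

# Matches only the per-method lines; the final total has no "(name)" part.
PATTERN = re.compile(r"Matrix multiplication time\((\w+)\) = ([0-9.]+)s")

def method_times(text):
    """Return {method_name: seconds} in the order the methods were run."""
    return {name: float(seconds) for name, seconds in PATTERN.findall(text)}

def speedups(times, baseline="parallel_matmult"):
    """Return {method_name: baseline_time / method_time}."""
    base = times[baseline]
    return {name: base / seconds for name, seconds in times.items()}

times = method_times(OUTPUT)
for name, factor in speedups(times).items():
    print(f"{name:32s} {times[name]:12.6f}s {factor:10.1f}x")
```

With the figures above, transpose_B_matmult comes out at roughly 43 times faster than parallel_matmult, and the KML case at well over 1000 times faster.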
- Create a roofline analysis task for the multi_method_matmult 8192 case.
devkit tuner roofline -o multi_method_matmult_8192 -m region ./matmul 8192 9
Command output:
Note:
1. Roofline task is currently only supported on the 920 platform.
2. The application must be a binary file in ELF format, and read permissions are required to detect the format of the application.
3. Roofline task collection needs to ensure the application has finished running.
4. The estimated time of roofline collection is about 3 * application estimated time.
5. Roofline analysis is available only on physical machines.
6. You can learn about the roofline profiling method by looking at document /usr/local/devkit/tuner/docs/ROOFLINE_KNOW_HOW.MD
RFCOLLECT: Start collection for ./matmul
RFCOLLECT: Launch application to collect performance metrics of ./matmul
Size is 8192, Matrix multiplication method is: 9, Check correctness is: 0
Initialization time = 2.712200s
ROOFLINE_EVENTS are initialized.
Matrix multiplication time(parallel_matmult) = 522.059154s
Matrix multiplication time(transpose_B_matmult) = 10.515641s
Matrix multiplication time(block_transpose_B_matmult) = 3.325110s
Matrix multiplication time(intrinsics_transpose_B_matmult) = 2.258315s
Matrix multiplication time(kml_matmult) = 0.287929s
Matrix multiplication time = 538.468327s
RFCOLLECT: Launch application to do binary instrumentation of ./matmul
Size is 8192, Matrix multiplication method is: 9, Check correctness is: 0
Initialization time = 8.095281s
Matrix multiplication time(parallel_matmult) = 348.475675s
Matrix multiplication time(transpose_B_matmult) = 17.144564s
Matrix multiplication time(block_transpose_B_matmult) = 3.646071s
Matrix multiplication time(intrinsics_transpose_B_matmult) = 2.427023s
Matrix multiplication time(kml_matmult) = 0.297098s
Matrix multiplication time = 371.991296s
RFCOLLECT: Launch benchmarks for measuring roofs
RFCOLLECT: Processing all collected data
RFCOLLECT: Result is captured at /matrix_multiplication/rfcollect-20240507-115538.json
RFCOLLECT: Run "rfreport /matrix_multiplication/rfcollect-20240507-115538.json" to get report.
Get roofline report ...
The roofline json report: /matrix_multiplication/multi_method_matmult_8192.json
The roofline html report: /matrix_multiplication/multi_method_matmult_8192.html
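Note 4 in the output gives a rule of thumb for planning: collection takes about 3 times the application run time. The sketch below applies that rule to the standalone run and compares it with the two application launches recorded in the collection log. Treating "application estimated time" as initialization time plus total multiplication time is an assumption, and the function name is hypothetical; the benchmark and processing phases are not timed in the log, so they are not included in the logged figure.

```python
def estimated_collection_seconds(app_seconds, factor=3.0):
    """Rule of thumb from note 4: collection takes about factor * app time."""
    return factor * app_seconds

# Standalone run: initialization time + total multiplication time
# (assumption: this sum approximates the application's run time).
standalone = 2.663596 + 543.736720

# The two application launches recorded in the collection log.
metrics_run = 2.712200 + 538.468327          # performance-metrics launch
instrumentation_run = 8.095281 + 371.991296  # binary-instrumentation launch
logged = metrics_run + instrumentation_run

estimate = estimated_collection_seconds(standalone)
print(f"estimated collection time: about {estimate / 60:.0f} minutes")
print(f"application launches in the log: about {logged / 60:.0f} minutes")
```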
- View the multi_method_matmult_8192.html report.
The report plots all the cases, from Parallel Case to KML Case, in the same diagram for convenient analysis.
Figure 2 multi_method_matmult_8192.html