Viewing Analysis Results
Prerequisites
An HPC application analysis task has been created and the analysis is complete.
Procedure
- In the System Profiler area on the left, click the name of the target analysis task.
The node list is displayed.
- Click the name of the target node to view the analysis results.
- Click the node name. The Summary tab page is displayed by default, as shown in Figure 1. Table 1 describes the parameters.
You can click Code Samples in the Optimization Suggestion area, or click the icon in the lower right corner, to view the code sample. For details, see the tuning example.
Table 1 Parameters on the Summary tab page
Parameter
Description
Elapsed Time
Execution time of an application.
Serial Time
Serial running time of an application.
Parallel Time
Parallel running time of an application.
Imbalance
Time during which the application's workload is unbalanced.
CPU Utilization
CPU usage, that is, the ratio of the CPU usage to the running OpenMP.
OpenMP Team Information
OpenMP team information.
OpenMP Team Usage
OpenMP team usage.
Function
Invoked functions.
Module
Invoked module.
CPU Time (s)
CPU usage time.
Parallel region
Parallel region.
Potential Gain (s)
Difference between the actual duration and the theoretical duration.
Imbalance Ratio (%)
Ratio of the imbalance time to the total execution time.
Average Time (ms)
Average running time.
CPI
Ratio of CPU cycles to retired instructions, which indicates the number of clock cycles consumed by each instruction.
Effective Utilization
CPU usage while threads are doing effective work.
Spinning
CPU usage consumed by threads waiting on spinlocks.
Overhead
CPU usage of other overheads.
Instructions Retired
Total number of retired instructions.
MPI Wait Rate
Percentage of time spent in MPI blocking functions.
Communication
Percentage of cluster communications to total communications.
Point to point
Percentage of time spent on the point-to-point communication function.
Collective
Percentage of time spent in MPI collective communication functions.
Synchronization
Percentage of time spent on the synchronization function.
Table 2 Parameters in the Hotspots area
Parameter
Description
Grouping Mode
By default, Function is displayed. You can also select Module, parallel-region, or barrier-to-barrier-segment.
function
Invoked functions.
module
Invoked module.
parallel-region
Parallel region.
barrier-to-barrier-segment
Special stand-alone section.
in Loop
Loop data. This parameter is displayed only when Function is selected for Grouping Mode.
CPU(%)
CPU usage.
CPU(s)
CPU time.
Spin(s)
CPU time for waiting for spinlock.
Overhead(s)
CPU time occupied by other overheads.
CPI
Ratio of CPU cycles to retired instructions, which indicates the number of clock cycles consumed by each instruction.
Ret(%)
CPU microarchitecture execution efficiency. The calculation formula is INST_RETIRED / (4 x CPU_CYCLES).
Back(%)
Percentage of CPU pipeline execution pauses caused by insufficient resources such as core and memory.
Mem(%)
Percentage of CPU pipeline execution pauses caused by memory access latency.
L1(%)
Percentage of CPU pipeline execution pauses caused by L1 cache hits.
L2(%)
Percentage of CPU pipeline execution pauses caused by L2 cache hits.
L3/M(%)
Percentage of CPU pipeline execution pauses caused by L2 cache misses.
Core(%)
Percentage of CPU pipeline execution pauses due to instructions being executed.
SIMD(%)
Percentage of SIMD instructions.
Front(%)
Percentage of CPU pipeline execution pauses caused by front-end components.
Spec(%)
Percentage of CPU pipeline execution pauses caused by branch misprediction.
Instr
Number of instructions.
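The CPI and Ret(%) definitions in Table 2 can be worked through with a small sketch. The function names and counter values below are hypothetical and chosen only for illustration. The divisor 4 comes from the formula in the table; treating it as a configurable issue width is an assumption.

```python
def cpi(cpu_cycles: int, inst_retired: int) -> float:
    """Cycles per instruction: CPU cycles divided by retired instructions."""
    if inst_retired <= 0:
        raise ValueError("inst_retired must be positive")
    return cpu_cycles / inst_retired


def retiring_percent(cpu_cycles: int, inst_retired: int, issue_width: int = 4) -> float:
    """Ret(%) = INST_RETIRED / (issue_width x CPU_CYCLES), as a percentage.

    issue_width defaults to 4, the constant used in the Table 2 formula.
    """
    if cpu_cycles <= 0:
        raise ValueError("cpu_cycles must be positive")
    return 100.0 * inst_retired / (issue_width * cpu_cycles)


# Hypothetical counter values, for illustration only.
cycles = 2_000_000
instructions = 4_000_000
print(cpi(cycles, instructions))               # 0.5
print(retiring_percent(cycles, instructions))  # 50.0
```

A lower CPI together with a higher Ret(%) suggests that more of the pipeline capacity is spent on instructions that actually retire.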
Table 3 Parameters in the Memory Bandwidth area
Parameter
Description
Memory Bandwidth
Average DRAM Bandwidth
Average DRAM bandwidth.
Read Bandwidth
Average read bandwidth.
Write Bandwidth
Average write bandwidth.
Intra-Socket Bandwidth
Bandwidth within a socket.
Cross-Socket Bandwidth
Cross-socket bandwidth.
L3 By-Pass Rate
L3 bypass rate.
L3 Miss Rate
L3 miss rate.
L3 Usage
L3 cluster usage.
Command distribution (hover the mouse pointer over the question mark next to a parameter to view details)
Table 4 Parameters in the HPC Top-Down and PMU Events areas
Parameter
Description
HPC Top-Down
Event Name
Name of the top-down event.
Event Percentage
Proportion of the top-down event.
Number of original PMU events
Miss Events
Name of the PMU event.
Count
Number of PMU events.
Table 5 MPI runtime metrics
Parameter
Description
Grouping mode
Filter type. By default, function is selected. You can also select send-type, recv-type, mpi-comm, caller, send-size or recv-size.
function
Invoked functions.
MPI Rank
Logical working unit.
Wait Rate(%)
Percentage of time spent in MPI blocking functions.
P2P Comm(%)
Percentage of time spent in MPI point-to-point communication functions.
Coll Comm(%)
Percentage of time spent in MPI collective communication functions.
Sync(%)
Percentage of time spent on the MPI synchronization function.
Single I/O(%)
Percentage of time spent on the MPI_File_read and MPI_File_write functions.
Coll I/O(%)
Percentage of time spent on the MPI_File_read_all and MPI_File_write_all functions.
Avg Time
Average latency.
Call Count
Number of calls.
Data Size(bytes)
Size of transmitted data.
Send data type
Type of sent data.
Recv data type
Type of received data.
Sent
Working unit that sends data.
Received
Working unit that receives data.
Table 6 OpenMP runtime metrics
Parameter
Description
Parallel region
Parallel region.
Barrier-to-barrier segment
Special stand-alone section.
Potential Gain (s)
Difference between the actual and ideal wall time of the parallel region.
Elapsed Time (s)
Wall time of the parallel region.
Imbalance (s)
Wall time lost because threads wait for each other at the end of the parallel region.
Imb (%)
Ratio of the execution time of unbalanced applications to the total execution time.
CPU Util (%)
CPU usage in the parallel region.
Avg (ms)
Average latency.
Count
Number of calls.
Lock Cont (s)
CPU time that a worker thread spends waiting on a lock while consuming CPU resources.
Creation (s)
Overhead of a parallel work assignment.
Scheduling (s)
OpenMP runtime scheduler overhead on a parallel work assignment for working threads.
Tasking (s)
Time spent on task assignment.
Reduction (s)
Runtime overhead on performing reduction operations.
Atomics (s)
Runtime overhead on performing atomic operations.
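To make the Imbalance (s) and Imb (%) rows in Table 6 more concrete, the following is a minimal sketch under stated assumptions. It assumes that the wall time of a region equals the busy time of its slowest thread, and that imbalance is the average time threads spend waiting at the end of the region. The tool's exact formula is not given in this section, so the function name and the calculation are illustrative only.

```python
def imbalance_metrics(thread_busy_s):
    """Estimate imbalance for one parallel region.

    thread_busy_s: per-thread busy time in seconds (hypothetical input).
    Returns (elapsed_s, imbalance_s, imbalance_percent).
    """
    if not thread_busy_s:
        raise ValueError("at least one thread is required")
    # Assumption: the region ends when the slowest thread finishes.
    elapsed = max(thread_busy_s)
    if elapsed <= 0:
        raise ValueError("busy times must include a positive value")
    # Each thread waits at the end of the region until the slowest one is done.
    waits = [elapsed - busy for busy in thread_busy_s]
    imbalance = sum(waits) / len(waits)
    return elapsed, imbalance, 100.0 * imbalance / elapsed


# Four threads with uneven work, values chosen for illustration.
print(imbalance_metrics([4.0, 3.0, 2.0, 3.0]))  # (4.0, 1.0, 25.0)
```

A perfectly balanced region, where every thread has the same busy time, yields an imbalance of 0 under this model.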
- View the OpenMP timeline tab page, as shown in Figure 2. Table 7 describes the parameters.
- You can use "←" and "→" to switch between threads. Key threads are marked with an icon. Drag the time axis to view the data in the corresponding time range, or select the key threads you want to view from the drop-down list.
- A maximum of 10 hot call stacks can be displayed.
Table 7 Parameters on the OpenMP timeline tab page
Parameter
Description
TID
Thread ID.
Region Type
Region type of a thread.
Start Time
Start time of a phase for a thread.
Duration
Duration of a phase for a thread.
CPI
Ratio of CPU cycles to retired instructions, which indicates the number of clock cycles consumed by each instruction.
Instructions Retired
Total number of retired instructions.
Callstack
Name of the call stack.
Call Times
Number of times that the stack is called.
Invoking Ratio (%)
Percentage of the called stack in all stacks.
Event Name
Name of the top-down event.
Event Ratio (%)
Proportion of the top-down event.
- View the MPI timeline tab page, as shown in Figure 3. Table 8 describes the parameters.
If you select RDMA and Shared storage when creating an analysis task, you can click the corresponding icon to view related data. You can then click a time point in the line chart to view the details.
Table 8 MPI timeline parameters
Parameter
Description
Basic Rank Information
Rank ID
ID of the selected rank.
Start Time
Start time of a phase for a thread.
Duration
Duration of a phase for a thread.
CPI
Ratio of CPU cycles to retired instructions, which indicates the number of clock cycles consumed by each instruction.
Instructions Retired
Total number of retired instructions.
Cluster Communication Type
Cluster communication type.
Communicator Root
Communicator root.
Communicator Name
Communicator name.
Communication Data Volume
Amount of data sent and received during communication.
Communicator Members
Number of communicator members.
Communicator Member
Specific communicator member.
Rank Invoking Information
Callstack
Name of the call stack.
Call Times
Number of times that the stack is called.
Invoking Ratio (%)
Percentage of the called stack in all stacks.
Event Name
Name of the top-down event.
Event Ratio (%)
Proportion of the top-down event.
RDMA Information
Node IP Address
IP address of the RDMA.
Collection Time
Collection time of the RDMA data.
Receive
Amount of data received at the current time point.
Send
Amount of data sent at the current time point.
Shared Storage Information
Node IP Address
IP address of the shared storage.
Collection Time
Collection time of the current shared storage data.
Receive
Amount of data received at the current time point.
Send
Amount of data sent at the current time point.
- If you select Refined analysis when creating an HPC application analysis task, you can view the Communication Heatmap tab page. See Figure 4.
- By default, Rank to Rank is selected for Statistical Object, Data_Size for Statistical Indicator, Point to Point for Communication Type, and the first item in the drop-down list for Communicator.
- You can select any other statistical object (Node to Node), statistical indicator (Latency), communication type (Cluster Communication), and communicator from the drop-down lists. If you select Latency for Statistical Indicator, the communication type can only be Point to Point.
- The data volume of (rank i, rank j) is the data sent by rank i to rank j plus the data received by rank i from rank j.
- Move the mouse pointer to select an area in the left part of the figure to view its details in the right part. You can click the zoom icons or scroll the mouse wheel to zoom in or out of the area.
- In the displayed dialog box for selecting a communicator, click the search icon to search for a communicator name or communicator member, click the sort icon to sort the communicator members, and click View Details to see the communicator information.
Click the drop-down list of Communicator to switch or filter the communicator information to be viewed.
Figure 5 Selecting a communicator
Select Node To Node for Statistical Object to view the rank information. See Figure 6. Major metrics are Local Percentage, Cross-DIE Percentage, and Cross-chip Percentage.
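The Rank to Rank data volume definition used by the Communication Heatmap can be sketched as follows. This is an illustrative sketch, not the tool's implementation: the message tuples and the function name are hypothetical, and it assumes that the data received by rank i from rank j equals the data sent by rank j to rank i.

```python
def pair_volume(messages, num_ranks):
    """Build the rank-to-rank data volume matrix.

    messages: iterable of (src_rank, dst_rank, size_bytes) tuples (hypothetical input).
    volume[i][j] = bytes sent by rank i to rank j
                 + bytes received by rank i from rank j.
    """
    sent = [[0] * num_ranks for _ in range(num_ranks)]
    for src, dst, size in messages:
        sent[src][dst] += size
    # Assumption: bytes received by i from j are the bytes j sent to i.
    return [
        [sent[i][j] + sent[j][i] for j in range(num_ranks)]
        for i in range(num_ranks)
    ]


# Three ranks exchanging a few point-to-point messages.
msgs = [(0, 1, 100), (1, 0, 40), (0, 2, 10)]
volume = pair_volume(msgs, 3)
print(volume[0][1])  # 140
print(volume[2][0])  # 10
```

Under this assumption the matrix is symmetric, so the cell for (rank i, rank j) matches the cell for (rank j, rank i).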
- Click the Task Information tab to view the detailed configuration and sampling information about the task on the current node.
If the task fails to be executed, the failure cause is displayed on the Task Information tab page.
If some data fails to be collected but the overall task execution is not affected, you can view the exception message in Exception Information.
Collection End Cause displays the reason why the data collection of the current task ends, for example, "Task collection times up" or "File size reaches the collection limit."




