
Viewing Analysis Results

Prerequisites

An HPC application analysis task has been created and the analysis is complete.

Procedure

  1. In the System Profiler area on the left, click the name of the target analysis task.

    The node list is displayed.

  2. Click the name of the target node to view the analysis results.
    • Click the node name. The Summary tab page is displayed by default, as shown in Figure 1. Table 1 describes the parameters.

You can click Code Sample in the Optimization Suggestion area or click the icon in the lower right corner to view the code sample. For details, see the tuning example.

      Figure 1 Summary
Table 1 Parameters on the Summary tab page

      Elapsed Time: Execution time of the application.
      Serial Time: Serial running time of the application.
      Parallel Time: Parallel running time of the application.
      Imbalance: Running time lost because the workload is unevenly distributed across threads.
      CPU Utilization: CPU usage, that is, the ratio of CPU time used to the running time of the OpenMP application.
      OpenMP Team Information: OpenMP team information.
      OpenMP Team Usage: OpenMP team usage.
      Function: Invoked functions.
      Module: Invoked modules.
      CPU Time (s): CPU usage time.
      Parallel region: Parallel region.
      Potential Gain (s): Difference between the actual duration and the theoretical duration.
      Imbalance Ratio (%): Ratio of the imbalanced running time to the total running time.
      Average Time (ms): Average running time.
      CPI: Ratio of CPU cycles to retired instructions, indicating the clock cycles consumed per instruction.
      Effective Utilization: CPU usage while threads are doing effective work.
      Spinning: CPU usage spent waiting for spinlocks.
      Overhead: CPU usage of other overheads.
      Instructions Retired: Total number of retired instructions.
      MPI Wait Rate: Percentage of time spent in MPI blocking functions.
      Communication: Percentage of cluster communication in total communication.
      Point to point: Percentage of time spent in point-to-point communication functions.
      Collective: Percentage of time spent in MPI collective functions.
      Synchronization: Percentage of time spent in synchronization functions.
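Several Summary metrics are simple ratios of counter and timing values. The sketch below shows how CPI and Imbalance Ratio can be derived; the counter values are hypothetical and for illustration only, not taken from any real run:

```python
def cpi(cpu_cycles: int, instructions_retired: int) -> float:
    """CPI: clock cycles consumed per retired instruction."""
    return cpu_cycles / instructions_retired

def imbalance_ratio(imbalance_time: float, elapsed_time: float) -> float:
    """Imbalance Ratio (%): share of the elapsed time lost to uneven load."""
    return 100.0 * imbalance_time / elapsed_time

# Hypothetical values: 8e6 cycles over 4e6 retired instructions,
# 1.5 s of imbalance in a 30 s run.
print(cpi(8_000_000, 4_000_000))   # 2.0 cycles per instruction
print(imbalance_ratio(1.5, 30.0))  # 5.0 %
```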

Table 2 Parameters in the Hotspots area

      Grouping Mode: Function is selected by default. You can also select Module, parallel-region, or barrier-to-barrier-segment.
        function: Invoked functions.
        module: Invoked modules.
        parallel-region: Parallel region.
        barrier-to-barrier-segment: Special stand-alone section.
      in Loop: Loop data. This parameter is displayed only when Function is selected for Grouping Mode.
      CPU(%): CPU usage.
      CPU(s): CPU time.
      Spin(s): CPU time spent waiting for spinlocks.
      Overhead(s): CPU time occupied by other overheads.
      CPI: Ratio of CPU cycles to retired instructions, indicating the clock cycles consumed per instruction.
      Ret(%): CPU microarchitecture execution efficiency, calculated as INST_RETIRED / (4 x CPU_CYCLES).
      Back(%): Percentage of CPU pipeline stalls caused by insufficient back-end resources such as core and memory resources.
      Mem(%): Percentage of CPU pipeline stalls caused by memory access latency.
      L1(%): Percentage of CPU pipeline stalls caused by L1 cache hits.
      L2(%): Percentage of CPU pipeline stalls caused by L2 cache hits.
      L3/M(%): Percentage of CPU pipeline stalls caused by L2 cache misses.
      Core(%): Percentage of CPU pipeline stalls due to instructions being executed.
      SIMD(%): Percentage of SIMD instructions.
      Front(%): Percentage of CPU pipeline stalls caused by front-end components.
      Spec(%): Percentage of CPU pipeline stalls caused by branch prediction (speculative execution).
      Instr: Number of instructions.
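The Ret(%) formula in Table 2 can be checked in a few lines. The divisor of 4 reflects a pipeline that can retire up to four operations per cycle, so the result is the fraction of pipeline slots that retired useful work; the counter values below are hypothetical:

```python
def ret_percent(inst_retired: int, cpu_cycles: int) -> float:
    """Ret(%): retiring share of pipeline slots, INST_RETIRED / (4 x CPU_CYCLES)."""
    return 100.0 * inst_retired / (4 * cpu_cycles)

# Hypothetical: 6e9 instructions retired over 3e9 cycles
# fills half of the 4-per-cycle retirement slots.
print(ret_percent(6_000_000_000, 3_000_000_000))  # 50.0
```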

Table 3 Parameters in the Memory Bandwidth area

      Average DRAM Bandwidth: Average DRAM bandwidth.
      Read Bandwidth: Average read bandwidth.
      Write Bandwidth: Average write bandwidth.
      Intra-Socket Bandwidth: Bandwidth within a socket.
      Cross-Socket Bandwidth: Bandwidth across sockets.
      L3 By-Pass Rate: L3 cache bypass rate.
      L3 Miss Rate: L3 cache miss rate.
      L3 Usage: L3 cache usage.
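Average bandwidth figures such as those in Table 3 are bytes moved divided by the sampling interval. A minimal sketch, with made-up numbers rather than real collection data:

```python
def avg_bandwidth_gbps(bytes_moved: int, interval_s: float) -> float:
    """Average bandwidth in GB/s over one sampling interval."""
    return bytes_moved / interval_s / 1e9

# Hypothetical: 24 GB read during a 2 s interval -> 12 GB/s read bandwidth.
print(avg_bandwidth_gbps(24_000_000_000, 2.0))  # 12.0
```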

Table 4 Parameters in the Instruction Distribution area

      Memory: Percentage of memory load/store instructions.
      Scalar: Percentage of scalar data processing instructions.
      Vector: Percentage of vector data processing instructions.
      Crypto: Percentage of encryption instructions.
      Branches: Percentage of branch execution instructions.
      Barriers: Percentage of barrier execution instructions.
      Not Retired: Percentage of valid prefetch instructions.

Table 5 Parameters in the HPC Top-Down and PMU Events areas

      HPC Top-Down
        Event Name: Name of the top-down event.
        Event Percentage: Proportion of the top-down event.
      Number of original PMU events
        Miss Events: Name of the PMU event.
        Count: Number of PMU events.

Table 6 MPI runtime metrics

      Grouping mode: Filter type. function is selected by default. You can also select send-type, recv-type, mpi-comm, caller, send-size, or recv-size.
        function: Invoked functions.
      MPI Rank: Logical working unit.
      Wait Rate(%): Percentage of time spent in MPI blocking functions.
      P2P Comm(%): Percentage of time spent in MPI point-to-point communication functions.
      Coll Comm(%): Percentage of time spent in MPI collective functions.
      Sync(%): Percentage of time spent in MPI synchronization functions.
      Single I/O(%): Percentage of time spent in the MPI_File_read and MPI_File_write functions.
      Coll I/O(%): Percentage of time spent in the MPI_File_read_all and MPI_File_write_all functions.
      Avg Time: Average latency.
      Call Count: Number of calls.
      Data Size(bytes): Size of transmitted data.
      Send data type: Type of sent data.
      Recv data type: Type of received data.
      Sent: Working unit that sends data.
      Received: Working unit that receives data.
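The per-rank percentages in Table 6 are each category's time divided by the rank's total MPI time. A sketch with hypothetical timings (not from any real trace):

```python
def mpi_percentages(times: dict) -> dict:
    """Convert per-category MPI times (seconds) into percentages of the total."""
    total = sum(times.values())
    return {category: 100.0 * t / total for category, t in times.items()}

# Hypothetical MPI time split for one rank.
print(mpi_percentages({"p2p": 2.0, "coll": 1.0, "sync": 1.0}))
# {'p2p': 50.0, 'coll': 25.0, 'sync': 25.0}
```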

Table 7 OpenMP runtime metrics

      Parallel region: Parallel region.
      Barrier-to-barrier segment: Special stand-alone section.
      Potential Gain (s): Difference between the ideal and actual wall time of the parallel region.
      Elapsed Time (s): Wall time of the parallel region.
      Imbalance (s): Wall time lost because threads wait for each other at the end of the parallel region.
      Imb (%): Ratio of the imbalanced execution time to the total execution time.
      CPU Util (%): CPU usage in the parallel region.
      Avg (ms): Average latency.
      Count: Number of calls.
      Lock Cont (s): CPU time that worker threads spend spinning on locks.
      Creation (s): Overhead of creating a parallel work assignment.
      Scheduling (s): OpenMP runtime scheduler overhead of distributing parallel work to worker threads.
      Tasking (s): Runtime overhead of task assignment.
      Reduction (s): Runtime overhead of performing reduction operations.
      Atomics (s): Runtime overhead of performing atomic operations.
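Potential Gain and Imbalance in Table 7 follow directly from per-thread wall times within a parallel region. A minimal sketch, assuming imbalance is measured as the total time faster threads wait at the region's closing barrier (the per-thread times are hypothetical):

```python
def region_metrics(thread_times: list[float]) -> dict:
    """Derive elapsed time, potential gain, and imbalance for one parallel region."""
    elapsed = max(thread_times)                    # region ends when the slowest thread does
    ideal = sum(thread_times) / len(thread_times)  # wall time under perfect balance
    imbalance = sum(elapsed - t for t in thread_times)  # total wait at the closing barrier
    return {"elapsed": elapsed, "potential_gain": elapsed - ideal, "imbalance": imbalance}

# Hypothetical per-thread times (seconds) for a 4-thread region.
print(region_metrics([4.0, 3.0, 2.0, 3.0]))
# {'elapsed': 4.0, 'potential_gain': 1.0, 'imbalance': 4.0}
```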

• Click the MPI Timeline tab to view the timeline, as shown in Figure 2. Table 8 describes the parameters.

If you select RDMA and Shared storage when creating an analysis task, you can click the corresponding icon to view related data. You can further click a time point in the line chart to view the details.

      Figure 2 MPI timeline
Table 8 MPI timeline parameters

      Basic Rank Information
        rank ID: ID of the selected rank.
        Start Time: Start time of a phase for a thread.
        Duration: Duration of a phase for a thread.
        CPI: Ratio of CPU cycles to retired instructions, indicating the clock cycles consumed per instruction.
        Instructions Retired: Total number of retired instructions.
        Cluster Communication Type: Cluster communication type.
        Communicator Root: Communicator root.
        Communicator Name: Communicator name.
        Communication Data Volume: Amount of data sent and received during communication.
        Communicator Members: Number of communicator members.
        Communicator Member: Specific communicator member.
      Rank Invoking Information
        Callstack: Name of the call stack.
        Call Times: Number of times that the stack is called.
        Invoking Ratio (%): Percentage of calls to this stack among all stacks.
        Event Name: Name of the top-down event.
        Event Ratio (%): Proportion of the top-down event.
      RDMA Information
        Node IP Address: IP address of the RDMA node.
        Collection Time: Collection time of the RDMA data.
        Receive: Amount of data received at the current time point.
        Send: Amount of data sent at the current time point.
      Shared Storage Information
        Node IP Address: IP address of the shared storage node.
        Collection Time: Collection time of the current shared storage data.
        Receive: Amount of data received at the current time point.
        Send: Amount of data sent at the current time point.

    • If you select Refined analysis when creating an HPC application analysis task, you can view the Communication Heatmap tab page. See Figure 3.
• By default, Rank to Rank is selected for Statistical Object, Data_Size for Statistical Indicator, Point to Point for Communication Type, and the first item in the drop-down list for Communicator.
      • You can select any other statistical object (Node to Node), statistical indicator (Latency), communication type (Cluster Communication), or communicator from the drop-down lists. If you select Latency for Statistical Indicator, the communication type can only be Point to Point.
      • The data volume of cell (rank i, rank j) is the data that rank i sends to rank j plus the data that rank i receives from rank j.
      • Move the mouse pointer to select an area in the left part of the following figure to view its details in the right part. You can click the zoom icons or scroll the mouse wheel to zoom in or out on the area.
      • In the displayed dialog box for selecting a communicator, click the search icon to search for a communicator name or communicator member, click the sort icon to sort by the number of communicator members, and click View Details to view the communicator information.
      Figure 3 Communication heatmap
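The heatmap cell value described above can be sketched as the sum of traffic in both directions between a pair of ranks. This is an illustrative sketch with a hypothetical traffic table, not the tool's actual implementation:

```python
def heatmap_volume(sent: dict, i: int, j: int) -> int:
    """Cell (i, j): bytes rank i sent to rank j plus bytes rank i received from rank j."""
    return sent.get((i, j), 0) + sent.get((j, i), 0)

# Hypothetical point-to-point traffic: sent[(src, dst)] = bytes.
sent = {(0, 1): 1024, (1, 0): 512, (0, 2): 256}
print(heatmap_volume(sent, 0, 1))  # 1536 (1024 sent + 512 received)
print(heatmap_volume(sent, 2, 0))  # 256
```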

      Click the drop-down list of Communicator to switch or filter the communicator information to be viewed.

      Figure 4 Selecting a communicator

Select Node to Node for Statistical Object to view the node-level information. See Figure 5. Major metrics are Local Percentage, Cross-DIE Percentage, and Cross-chip Percentage.

      Figure 5 Communication heatmap (node-to-node)
    • Click the Task Information tab to view the detailed configuration and sampling information about the task on the current node.
      • If the task fails to be executed, the failure cause is displayed on the Task Information tab page.
      • If some data fails to be collected but the overall task execution is not affected, you can view the exception message in Exception Information.
      • Collection End Cause displays the reason why the data collection of the current task ends, for example, "Task collection times up" or "File size reaches the collection limit."