Viewing Analysis Results

Prerequisites

An HPC cluster check task has been created and the analysis is complete.

Procedure

  1. In the System Profiler area on the left, click the name of the target analysis task.

    The node list is displayed.

  2. Click the name of the target node to view the analysis results.

    The Cluster Configuration tab page is displayed by default, as shown in Figure 1. Table 1 describes the parameters.

    • Move the mouse pointer next to a group to view the grouping basis, or click to group data as required.
    • Click Filter Node Data to view the detailed data of a node or Clear Node Filter to clear the filter conditions.
    • Parameters collected by a privileged user are marked with asterisks (*).
    Figure 1 Cluster hardware configuration (CPU)
    Table 1 Parameter description

    CPU

    | Parameter | Description |
    | --- | --- |
    | Node IP Address | IP address of a node in the cluster. |
    | Data Group | Group to which the current node belongs. |
    | CPU Model | CPU model of the node. |
    | Logical CPUs | Number of logical CPUs of the node. |
    | Physical CPUs | Number of physical CPUs of the node. |
    | Cores Per Physical CPU | Number of cores per physical CPU on the node. |
    | Hyper-Threading | Number of hyperthreads on the node. |
    | L1i cache | L1 instruction cache (L1i) of the CPU. |
    | L1d cache | L1 data cache (L1d) of the CPU. |
    | L2 cache | L2 cache of the CPU. |
    | L3 cache | L3 cache of the CPU. |
    | NUMA Nodes | Number of NUMA nodes. |
    | Dominant Frequency | Dominant frequency of the CPU. |
    | Cache Line Size | Cache line size of the CPU. |
    | Vendor | CPU vendor. |

    Figure 2 Cluster hardware configuration (GPU)
    Table 2 Parameter description

    | Parameter | Description |
    | --- | --- |
    | Node IP Address | IP address of a node in the cluster. |
    | Data Group | Group to which the current node belongs. |
    | CUDA Version | CUDA version of the node. |
    | Driver Version | GPU driver version of the node. |
    | GPUs | Number of GPUs on the node. |
    | Model | GPU model. |
    | SM Clock Frequency | SM clock frequency of the GPU. |
    | Memory Clock Frequency | Memory clock frequency of the GPU. |

    Figure 3 Cluster hardware configuration (memory)
    Table 3 Parameter description

    | Parameter | Description |
    | --- | --- |
    | Node IP Address | IP address of a node in the cluster. |
    | Data Group | Group to which the current node belongs. |
    | Total Memory Capacity | Total memory capacity of the node. |
    | Page Size | Page size of the node. |
    | Huge Page Size | Huge page size of the node. |
    | *Vendor | Vendor of the DIMMs on the node. The data is collected by privileged users. |
    | *Model | Model of the DIMMs on the node. The data is collected by privileged users. |
    | *Size | Memory capacity of the DIMMs on the node. The data is collected by privileged users. |
    | *Width | Bit width of the DIMMs on the node. The data is collected by privileged users. |
    | *Clock Frequency | Clock frequency of the DIMMs on the node. The data is collected by privileged users. |
    | *Bank Locator | Location information of the DIMMs on the node. The data is collected by privileged users. |

    Figure 4 Cluster hardware configuration (network)
    Table 4 Parameter description

    Network Device

    | Parameter | Description |
    | --- | --- |
    | Node IP Address | IP address of a node in the cluster. |
    | Data Group | Group to which the current node belongs. |
    | RDMA Supported | Indicates whether Remote Direct Memory Access (RDMA) is supported on the node. |
    | Logical NICs/Ports | Number of logical NICs or network ports on the node. |
    | Logical Name | Logical name of the NIC on the node. |
    | Model | Model of the NIC on the node. |
    | Driver | NIC driver of the node. |
    | Rate | NIC speed of the node. |
    | MTU | Maximum transmission unit (MTU): maximum size of a data packet that can pass through the network port of the node. |
    | Txqueuelen | Length of the transmit queue of the network port on the node. |
    | *PF_LOG_BAR_SIZE | Value of NIC parameter PF_LOG_BAR_SIZE. The data is collected by privileged users. |
    | Driver Version | Driver version of the NIC on the node. |
    | *Clock Frequency | Clock frequency of the NIC on the node. The data is collected by privileged users. |
    | Device Name | Device name of the NIC on the node. |
    | DSCP Priority | Differentiated Services Code Point (DSCP) value of the NIC on the node. |
    | *TOS: service type | Service type of the NIC on the node. The data is collected by privileged users. |
    | Priority Trust Mode | Priority trust status of the NIC on the node. |
    | Flow control priority | Information about the priority-based flow control (PFC) of the NIC on the node. |
    | DSCP | DSCP details of the NIC on the node. |
    | DCQCN | DCQCN details of the NIC on the node. |
    | Port | Port number of the network port on the node. |
    | GID | GID of the network port on the node. |
    | Version | Version of the network port on the node. |
    | Index | Index of the network port on the node. |
    | IP | IP address of the network port on the node. |
    | Transfer Protocol | Transfer protocol of the network port on the node. |
    | Device name: Port | Device and port number of the network port on the node. |
    | Latency | Latency of the network port on the node. |
    | Overhead | Overhead used by the network port on the node. |
    | Bandwidth | Bandwidth of the network port on the node. |
    | RX | Size of the message receive queue of the NIC on the node. |
    | TX | Size of the message transmit queue of the NIC on the node. |
    | Other | Other configurations of the NIC on the node. |
    | Combined | Number of enabled queues of the NIC on the node. |

    Route Configuration

    | Parameter | Description |
    | --- | --- |
    | Node IP Address | IP address of a node in the cluster. |
    | Data Group | Group to which the current node belongs. |
    | Gateway | Gateway of the route on the current node. |
    | Genmask | Netmask of the route's destination network on the current node. |
    | Destination | Destination IP address of the route on the current node. |
    | Flags | Flags of the route on the current node. |
    | Metric | Route distance of the current node. |
    | Ref | Number of references to routing entries of the current node. |
    | Use: number of times that the routing software queries the router | Number of times that the routing entry of the current node is searched by routing software. |
    | Iface | Output interface corresponding to the routing entry of the current node. |

    Hosts Configuration

    | Parameter | Description |
    | --- | --- |
    | Node IP Address | IP address of a node in the cluster. |
    | Host IP | Host IP address corresponding to the node IP address. |
    | Host Name | Host name corresponding to the node IP address. |

    Figure 5 Cluster hardware configuration (drive)
    Table 5 Parameter description

    | Parameter | Description |
    | --- | --- |
    | Node IP Address | IP address of a node in the cluster. |
    | Data Group | Group to which the node belongs. |
    | Total Drive Capacity | Total drive capacity of the node. |
    | Drives | Number of drives on the node. |
    | Name | Drive name. |
    | Vendor | Drive vendor. |
    | Capacity | Drive capacity. |
    | Type | Drive type. |
    | Model | Drive model. |
    | System disk partition directory | System drive partition directory of the node. |
    | Partition type | Partition type of the system drive on the node. |
    | File system type | File system type in the system drive partition directory on the node. |
    | Capacity | Target capacity of the system drive partition on the node. |

    Figure 6 Cluster hardware configuration (interconnection)
    Table 6 Parameter description

    | Parameter | Description |
    | --- | --- |
    | Node IP Address | IP address of a node in the cluster. |
    | Data Group | Group to which the node belongs. |
    | *Model | Model of the extended device on the current node. The data is collected by privileged users. |
    | *Width | Bit width of the extended device on the current node. The data is collected by privileged users. |
    | *Clock Frequency | Clock frequency of the extended device on the current node. The data is collected by privileged users. |
    | *Capabilities | Capabilities of the extended device on the current node. The data is collected by privileged users. |
    | *RAID Card Model | RAID card model of the extended device on the current node. The data is collected by privileged users. |
    | *RAID Width | RAID card bit width of the extended device on the current node. The data is collected by privileged users. |
    | *RAID Clock | RAID card clock frequency of the extended device on the current node. The data is collected by privileged users. |
    | *Drive Name | Drive name of the extended device on the current node. The data is collected by privileged users. |

    Figure 7 Cluster software consistency distribution (OS)
    Table 7 Parameter description

    NUMA

    | Parameter | Description |
    | --- | --- |
    | Node IP Address | IP address of a node in the cluster. |
    | Data Group | Group to which the current node belongs. |
    | Policy | NUMA policy of the current node. |
    | Preferred Node | Preferred NUMA node of the current node. |

    Resource Restriction

    | Parameter | Description |
    | --- | --- |
    | Node IP Address | IP address of a node in the cluster. |
    | Data Group | Group to which the current node belongs. |
    | Core File Size | Maximum size of a core file on the current node. |
    | Data Seg Size | Maximum size of a data segment on the current node. |
    | Scheduling Priority | Maximum scheduling priority on the current node. |
    | File Size | Maximum size of a file on the current node. |
    | Pending Signals | Maximum number of pending signals on the current node. |
    | Max. Locked Memory | Maximum size of memory that can be locked on the current node. |
    | Max. Memory Size | Maximum memory size on the current node. |
    | Open Files | Maximum number of files that can be opened on the current node. |
    | Pipe Size | Maximum pipe size of the current node. |
    | POSIX Message Queues: includes the extra overhead of message queues | Maximum size of POSIX message queues on the current node, including the extra overhead of the message queues. |
    | Real-time Priority | Maximum real-time scheduling priority on the current node. |
    | Stack Size | Maximum stack size of the current node. |
    | CPU Time | CPU time limit of the current node. |
    | Max. User Processes | Maximum number of user processes on the current node. |
    | Virtual Memory | Maximum size of the virtual memory on the current node. |
    | File Locks | Maximum number of file locks on the current node. |

    Kernel Config

    | Parameter | Description |
    | --- | --- |
    | Node IP Address | IP address of a node in the cluster. |
    | Data Group | Group to which the current node belongs. |
    | Parameter | Kernel configuration parameters of the current node. |
    | Value | Values of kernel configuration parameters of the current node. |

    Figure 8 Cluster software consistency distribution (software package)
    Table 8 Parameter description

    | Parameter | Description |
    | --- | --- |
    | Node IP Address | IP address of a node in the cluster. |
    | Data Group | Group to which the node belongs. |
    | Kernel | Kernel version of the node. |
    | OS | OS version of the node. |
    | DPC Client | Number of DPC connections on the node. |
    | Donau Scheduler Agent | Number of Donau Scheduler agent connections on the node. |
    | Mellanox Driver | Mellanox driver version of the node. |
    | PostgreSQL | PostgreSQL database version of the node. |
    | *BIOS | BIOS version of the node. The data is collected by privileged users. |
    | Haveged | Haveged version of the node. |

    Figure 9 Cluster software consistency distribution (environment variables)
    Figure 10 Cluster software consistency distribution (dependency libraries)
    Figure 11 Cluster software consistency distribution (module)
    Table 9 Parameter description

    Environment Variable

    | Parameter | Description |
    | --- | --- |
    | Node IP Address | IP address of a node in the cluster. |
    | Data Group | Group to which the current node belongs. |
    | LD_LIBRARY_PATH | Value of the LD_LIBRARY_PATH environment variable of the node. |
    | INCLUDE | Value of the INCLUDE environment variable of the node. |
    | LOADED_MODULES | Value of the LOADED_MODULES environment variable of the node. |

    Dependency Library

    | Parameter | Description |
    | --- | --- |
    | Node IP Address | IP address of a node in the cluster. |
    | Data Group | Group to which the current node belongs. |
    | MPI Library | MPI library used by the node. |
    | CUDA Library | CUDA library used by the node. |

    Module

    | Parameter | Description |
    | --- | --- |
    | Node IP Address | IP address of a node in the cluster. |
    | Data Group | Group to which the current node belongs. |
    | Version | Version of the module used on the node. |
    | List | Details about the module used on the node. |

    Figure 12 Cluster software consistency distribution (MPI/OpenMP)
  3. (Optional) Compare data by group.

    You can click to customize comparison groups. A maximum of 10 groups can be selected.

    Figure 13 Filtering groups for comparison
  4. (Optional) Filter node data.

    You can click Filter Node Data to view specific data or Clear Node Filter to view all data. Figure 14 shows an example.

    There are four filter options:

    • Include all: shows the nodes and data that match all filter items.
    • Include any: shows the nodes and data that match one or more filter items.
    • Exclude all: hides the nodes and data that match all filter items.
    • Exclude any: hides the nodes and data that match any filter item.

    Figure 14 Data filtering
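
The four filter options can be read as boolean combinations of the individual filter items. The following Python sketch illustrates that reading only; it is not the tool's implementation. The function name `filter_nodes`, the mode strings, the node fields (`ip`, `cpu_model`, `os`), and the sample values are all hypothetical and chosen for illustration.

```python
from typing import Callable, Dict, List

Node = Dict[str, str]                  # one row of node data (hypothetical fields)
FilterItem = Callable[[Node], bool]    # one filter item: True if the node matches


def filter_nodes(nodes: List[Node], items: List[FilterItem], mode: str) -> List[Node]:
    """Return the nodes that remain visible under the given filter mode."""
    visible = []
    for node in nodes:
        hits = [item(node) for item in items]
        if mode == "include_all":      # keep nodes matching every filter item
            keep = all(hits)
        elif mode == "include_any":    # keep nodes matching at least one item
            keep = any(hits)
        elif mode == "exclude_all":    # drop nodes matching every filter item
            keep = not all(hits)
        elif mode == "exclude_any":    # drop nodes matching at least one item
            keep = not any(hits)
        else:
            raise ValueError(f"unknown filter mode: {mode}")
        if keep:
            visible.append(node)
    return visible


# Illustrative data only: three nodes and two filter items.
nodes = [
    {"ip": "192.0.2.1", "cpu_model": "A", "os": "X"},
    {"ip": "192.0.2.2", "cpu_model": "A", "os": "Y"},
    {"ip": "192.0.2.3", "cpu_model": "B", "os": "Y"},
]
items = [
    lambda n: n["cpu_model"] == "A",
    lambda n: n["os"] == "X",
]

for mode in ("include_all", "include_any", "exclude_all", "exclude_any"):
    print(mode, [n["ip"] for n in filter_nodes(nodes, items, mode)])
```

Under this reading, Include all and Exclude all are complements of each other, as are Include any and Exclude any, so each pair splits the node list into two disjoint sets.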