Creating an HPC Cluster Check Task

Function

The tool checks whether software and hardware configurations are consistent across the nodes of an HPC cluster of physical machines or VMs, and provides tuning suggestions.

Prerequisites

  • No nodes are in the Offline state.
  • This task is not supported in container environments.

Procedure

  1. Click the icon next to System Profiler.

    Choose General Analysis from the drop-down list. The page for creating a task is displayed.

  2. Set task parameters by referring to "Task Management" and Table 1.

    See Figure 1.

    Figure 1 Creating an HPC cluster check task
    Table 1 Parameters for creating an HPC cluster check task

    | Parameter | Description |
    |---|---|
    | mpirun Running Node | MPI node to be checked. |
    | Shared Directory | Shared directory for analysis. |
    | Collect Privileged Metrics | Whether to collect data that only privileged users can collect. This function is disabled by default. If the mpirun running user is a non-privileged user, configure sudo permissions on all nodes for the commands that only privileged users can run. |
    | mpirun Running User | Name of the user who runs the mpirun application. NOTE: Performing all operations as the root user during collection is risky. Run the collection as a common user instead. |
    | Password | Password of the mpirun running user. |
    | mpirun Path | Path of the mpirun application. |
    | (Optional) mpirun Parameter | Parameters for running the mpirun application. Specify the task nodes in the parameters (for example, with --hostfile). If only the -np parameter is entered, the task is sent to a random node. The cluster and the MPI operating environment then become inconsistent, and the task fails. |
    | (Optional) hostfile | Hosts file, which must be in text format and no larger than 10 MB. A template is available for download. |
    | (Optional) Environment Variable File | Environment variable file, which must be in text format and no larger than 10 MB. A template is available for download. |

  3. Click OK.

    You can click the icons next to the task name to perform the following operations:

    • Cancel: cancels the analysis task. The information collected so far is deleted.
    • Modify and restart: restarts the analysis task, optionally with modified task parameters. This button is available only when a task has been canceled or has failed.
    • Delete: deletes the analysis task together with all of its data. Exercise caution when performing this operation.
    • Reanalyze: performs the analysis again. The analysis task is renamed and restarted.
    • Rename: changes the task or report name. Report names follow the same naming rule as task names.
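Some of the constraints in Table 1 can be checked locally before the task is created. The following is a minimal sketch, not part of the tool: the helper names `check_upload_file` and `names_target_nodes` are hypothetical, "text format" is approximated here as "decodes as UTF-8", and the host-selection options listed are the Open MPI spellings (other MPI implementations may use different ones).

```python
import shlex
from pathlib import Path

# 10 MB size limit stated in Table 1 for the hostfile and the
# environment variable file.
MAX_UPLOAD_BYTES = 10 * 1024 * 1024

# Assumption: Open MPI-style options that name the nodes to run on.
HOST_OPTIONS = {
    "--hostfile", "-hostfile",
    "--machinefile", "-machinefile",
    "--host", "-host", "-H",
}


def check_upload_file(path):
    """Return a list of problems found with a hostfile or env variable file."""
    p = Path(path)
    if not p.is_file():
        return [f"{path}: not a regular file"]
    if p.stat().st_size > MAX_UPLOAD_BYTES:
        return [f"{path}: larger than 10 MB"]
    try:
        # Assumption: "text format" is approximated as valid UTF-8.
        p.read_text(encoding="utf-8")
    except UnicodeDecodeError:
        return [f"{path}: not plain text (UTF-8 decoding failed)"]
    return []


def names_target_nodes(mpirun_params):
    """Return True if the parameter string contains a host-selection option."""
    tokens = shlex.split(mpirun_params)
    return any(token in HOST_OPTIONS for token in tokens)
```

For example, `names_target_nodes("-np 4")` returns `False`, which flags the case the table warns about: only -np is given, so the task nodes are not specified.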