Creating an HPC Cluster Check Task
Function
The tool checks whether software and hardware configurations are consistent across the nodes of an HPC cluster of physical machines or VMs, and provides tuning suggestions.
Prerequisites
- No nodes are in the Offline state.
- This task is not supported in the container environment.
Procedure
- Click the icon next to System Profiler and choose General Analysis from the drop-down list. The page for creating a task is displayed.
- Set task parameters by referring to "Task Management" and Table 1.
See Figure 1.
Table 1 Parameters for creating an HPC cluster check task

| Parameter | Description |
| --- | --- |
| mpirun Running Node | MPI node to be checked. |
| Shared Directory | Shared directory for analysis. |
| Collect Privileged Metrics | Indicates whether to collect data that can only be collected by privileged users. This function is disabled by default. If the mpirun running user is a non-privileged user, configure sudo permissions for all nodes to run the commands that only privileged users can run. |
| mpirun Running User | Name of the user who runs the mpirun application. NOTE: Performing all operations as the root user during the collection may cause risks. You are advised to perform operations as a common user. |
| Password | Password of the mpirun running user. |
| mpirun Path | Path of the mpirun application. |
| (Optional) mpirun Parameter | Parameters for running the mpirun application. Specify the task node (for example, --hostfile) in the parameters. If only the -np parameter is entered, the task will be randomly sent to a node. As a result, the cluster and MPI operating environment are inconsistent, and the task fails. |
| (Optional) hostfile | Hosts file, which must be in text format and no larger than 10 MB. You can download the template for reference. |
| (Optional) Environment Variable File | Environment variable file, which must be in text format and no larger than 10 MB. You can download the template for reference. |
- Click OK.
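The mpirun Parameter description advises naming the task nodes (for example, with --hostfile) rather than passing only -np. That advice can be checked before you fill in the form. The following Python sketch is an illustration only and is not part of the tool. The function name names_task_nodes and the option set are assumptions: only --hostfile comes from the description above, and the other spellings are Open MPI options included for illustration. Your MPI implementation may use different ones.

```python
import shlex

# Assumption: options that tell mpirun which nodes to use. Only --hostfile is
# taken from the parameter description; the rest are Open MPI spellings added
# for illustration and may not apply to your MPI implementation.
NODE_SELECTION_OPTIONS = {
    "--hostfile", "-hostfile",
    "--machinefile", "-machinefile",
    "--host", "-host", "-H",
}


def names_task_nodes(mpirun_params: str) -> bool:
    """Return True if the parameter string contains a node-selection option."""
    # shlex.split honors shell-style quoting, so quoted paths stay one token.
    tokens = shlex.split(mpirun_params)
    return any(token in NODE_SELECTION_OPTIONS for token in tokens)


if __name__ == "__main__":
    # Hypothetical parameter strings for demonstration.
    for params in ("-np 4", "-np 4 --hostfile /path/to/hostfile"):
        verdict = "names task nodes" if names_task_nodes(params) else "no task nodes given"
        print(f"{params!r}: {verdict}")
```

A string for which the function returns False matches the case the description warns about: only options such as -np are given, so no task node is specified.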
You can click the icons next to the task name to perform the following operations:
- Cancel the analysis task. After an analysis task is canceled, the collected information is deleted.
- Restart the analysis task. You can modify the task parameter settings before restarting. This icon is available only when a task has been canceled or has failed.
- Delete the analysis task. All data of the task is deleted along with it, so exercise caution when performing this operation.
- Perform the analysis again. The analysis task is renamed and restarted.
- Change the task or report name. Reports follow the same naming rule as tasks.
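
Table 1 requires the hostfile and the environment variable file to be in text format and no larger than 10 MB. A local check along the following lines can catch an unsuitable file before upload. This is a minimal sketch under stated assumptions, not the tool's own validation: the helper check_upload_file is hypothetical, "10 MB" is read as 10 * 1024 * 1024 bytes, and "text format" is approximated as UTF-8 content without NUL bytes.

```python
from pathlib import Path

# Assumption: 10 MB is interpreted as 10 * 1024 * 1024 bytes.
MAX_UPLOAD_BYTES = 10 * 1024 * 1024


def check_upload_file(path: str) -> list:
    """Return a list of problems; an empty list means no problem was found."""
    file = Path(path)
    if not file.is_file():
        return [f"{path}: not a regular file"]

    size = file.stat().st_size
    if size > MAX_UPLOAD_BYTES:
        # Skip reading an oversized file; the size alone disqualifies it.
        return [f"{path}: {size} bytes exceeds the 10 MB limit"]

    data = file.read_bytes()
    # Assumption: a text file contains no NUL bytes and decodes as UTF-8.
    if b"\x00" in data:
        return [f"{path}: contains NUL bytes, probably not a text file"]
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return [f"{path}: not valid UTF-8 text"]
    return []


if __name__ == "__main__":
    import sys

    # Usage: python check_upload.py <hostfile> <environment variable file>
    for name in sys.argv[1:]:
        for problem in check_upload_file(name) or [f"{name}: OK"]:
            print(problem)
```

The actual format of each file should still follow the template that can be downloaded from the task creation page; this check covers only the size and text-format constraints.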
