Network Device Memory Cannot Be Allocated During MPI Job Execution

Symptom

While an MPI job is running, the following error message indicates that memory cannot be allocated for the network device:

ERROR ibv_open_device(mlx5_0) failed: Cannot allocate memory

Possible Causes

The node on which the MPI job runs does not have enough available memory, so memory cannot be allocated for the network device.

Procedure

  1. Use PuTTY to log in to the job execution node that reported the error as a Hyper MPI common user, for example, hmpi_user.
  2. Run the following command to check the memory usage of the processes running on the node (in top, press Shift+M to sort by memory usage):

    top

  3. Stop the processes that are not related to the MPI job to free memory, and then run the MPI job again.
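
As a non-interactive alternative to top, the check in step 2 can be sketched as a small script. This is only a sketch and is not part of Hyper MPI: it assumes a Linux node with /proc mounted, and the helper names show_node_memory and list_top_rss are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# Sketch only: non-interactive memory check for a job execution node.
# Assumes a Linux /proc filesystem; helper names are hypothetical.

# Print the total and currently available memory of the node, in kB.
show_node_memory() {
    grep -E '^(MemTotal|MemAvailable):' /proc/meminfo
}

# Print "RSS_kB PID NAME" for the processes with the largest resident memory.
# The optional argument is the number of rows to print (default 10).
list_top_rss() {
    count="${1:-10}"
    for status in /proc/[0-9]*/status; do
        # Kernel threads have no VmRSS line and are skipped.
        awk '/^Name:/ {name=$2} /^Pid:/ {pid=$2} /^VmRSS:/ {rss=$2}
             END {if (rss) print rss, pid, name}' "$status" 2>/dev/null
    done | sort -rn | head -n "$count"
}

show_node_memory
list_top_rss 10
```

After confirming that a listed process is not related to the MPI job, it can be stopped with the kill command and its PID. A common user can normally stop only processes that the user owns.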