Network Device Memory Cannot Be Allocated During MPI Job Execution
Symptom
While an MPI job is running, the following error is reported, indicating that memory for the network device cannot be allocated:
ERROR ibv_open_device(mlx5_0) failed: Cannot allocate memory
Possible Causes
The node where the MPI job is executed does not have enough free memory, so the network device cannot be opened (ibv_open_device fails).
Procedure
- Use PuTTY to log in to a job execution node as a Hyper MPI common user, for example, hmpi_user.
- Run the following command to check the running processes on the job execution node:
top
- Stop the processes that are not related to MPI job execution to free up memory, and then run the MPI job again.
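As a non-interactive alternative to reading the top display, the check in the procedure above can be sketched as two small shell helpers. This is a minimal sketch, assuming a Linux node with /proc/meminfo and the procps version of ps. The function names mem_available_mib and top_mem_procs are hypothetical and are not part of Hyper MPI.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Hyper MPI) for checking node memory.

# Print the MemAvailable value in MiB.
# Reads a meminfo-style file; defaults to /proc/meminfo (assumes Linux).
mem_available_mib() {
    local file="${1:-/proc/meminfo}"
    awk '/^MemAvailable:/ { printf "%d\n", $2 / 1024 }' "$file"
}

# List the N processes with the largest resident memory (default: 10).
# Assumes procps ps, which supports --sort. RSS is reported in KiB.
top_mem_procs() {
    local count="${1:-10}"
    # count + 1 keeps the header line plus N process lines.
    ps -eo pid,user,rss,comm --sort=-rss | head -n "$((count + 1))"
}
```

After sourcing the file on the job execution node, running `mem_available_mib` prints the memory currently available and `top_mem_procs 5` lists the five largest memory consumers. Check that a process is unrelated to the MPI job before stopping it.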