[1632387881.405868] [arm-node88:57923:0] mm_posix.c:194 UCX ERROR shm_open(file_name=/ucx_shm_posix_23f3f65f flags=0xc2) failed: Permission denied
[1632387881.405910] [arm-node88:57923:0] uct_mem.c:132 UCX ERROR failed to allocate 8447 bytes using md posix for mm_recv_fifo: Shared memory error
[1632387881.405917] [arm-node88:57923:0] mm_iface.c:605 UCX ERROR mm_iface failed to allocate receive FIFO
[arm-node88:57923] coll_ucx_component.c:360 Warning: Failed to create UCG worker, automatically select other available and highest priority collective component.
[1632387881.411347] [arm-node88:57923:0] mm_posix.c:194 UCX ERROR shm_open(file_name=/ucx_shm_posix_6ae5143e flags=0xc2) failed: Permission denied
[1632387881.411359] [arm-node88:57923:0] uct_mem.c:132 UCX ERROR failed to allocate 8447 bytes using md posix for mm_recv_fifo: Shared memory error
[1632387881.411366] [arm-node88:57923:0] mm_iface.c:605 UCX ERROR mm_iface failed to allocate receive FIFO
[arm-node88:57923] pml_ucx.c:274 Error: Failed to create UCP worker
[arm-node88:57923] *** An error occurred in MPI_Allreduce
[arm-node88:57923] *** reported by process [878510081,70368744177671]
[arm-node88:57923] *** on communicator MPI_COMM_WORLD
[arm-node88:57923] *** MPI_ERR_INTERN: internal error
[arm-node88:57923] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[arm-node88:57923] *** and potentially your MPI job)
When the mpirun command is run across multiple nodes, some of the nodes cannot communicate with each other.
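One way to confirm this kind of inter-node communication problem is to test basic TCP reachability from each node to the others before launching the job. The following is only a minimal sketch under stated assumptions: it assumes mpirun starts remote processes over SSH on port 22, which may not match your launcher, and the helper names `can_connect` and `unreachable_nodes` are hypothetical, not part of Open MPI or UCX. It checks that a TCP connection can be opened and nothing more.

```python
import socket


def can_connect(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host name and opens the socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def unreachable_nodes(hosts, port=22, timeout=3.0):
    """Return the hosts from `hosts` that do not accept a TCP connection.

    Port 22 (SSH) is an assumption; pass a different port if your
    cluster uses another launch mechanism.
    """
    return [h for h in hosts if not can_connect(h, port, timeout)]
```

Calling `unreachable_nodes([...])` on every node, with the host names from your hostfile, should return an empty list on each of them. Any host that is returned cannot be reached from the node where the check ran.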