Command Examples
Hyper MPI achieves better performance when it uses algorithm 6 for MPI_Allreduce, algorithm 5 for MPI_Barrier, and algorithm 3 for MPI_Bcast.
- Command example for an IB network environment
mpirun -np 384 -N 48 --hostfile hf --bind-to core --map-by socket --rank-by core --mca btl ^vader,tcp,openib -x UCX_TLS=sm,ud_x -x UCX_NET_DEVICES=mlx5_0:1 -x UCG_PLANC_UCX_ALLREDUCE_ATTR=I:6S:200R:0- -x UCG_PLANC_UCX_BARRIER_ATTR=I:5S:200R:0 -x UCG_PLANC_UCX_BCAST_ATTR=I:3S:200R:0- -x UCG_PLANC_UCX_ALLREDUCE_FANOUT_INTRA_DEGREE=3 -x UCG_PLANC_UCX_ALLREDUCE_FANIN_INTRA_DEGREE=2 -x UCG_PLANC_UCX_ALLREDUCE_FANOUT_INTER_DEGREE=7 -x UCG_PLANC_UCX_ALLREDUCE_FANIN_INTER_DEGREE=7 -x UCG_PLANC_UCX_BARRIER_FANOUT_INTRA_DEGREE=3 -x UCG_PLANC_UCX_BARRIER_FANIN_INTRA_DEGREE=2 -x UCG_PLANC_UCX_BARRIER_FANOUT_INTER_DEGREE=7 -x UCG_PLANC_UCX_BARRIER_FANIN_INTER_DEGREE=7 test_case
- Command example for a RoCE network environment
mpirun -np 384 -N 48 --hostfile hf --bind-to core --map-by socket --rank-by core --mca btl ^vader,tcp,openib -x UCX_TLS=sm,ud -x UCX_NET_DEVICES=mlx5_1:1 -x UCG_PLANC_UCX_ALLREDUCE_ATTR=I:6S:200R:0- -x UCG_PLANC_UCX_BARRIER_ATTR=I:5S:200R:0 -x UCG_PLANC_UCX_BCAST_ATTR=I:3S:200R:0- -x UCG_PLANC_UCX_ALLREDUCE_FANOUT_INTRA_DEGREE=3 -x UCG_PLANC_UCX_ALLREDUCE_FANIN_INTRA_DEGREE=2 -x UCG_PLANC_UCX_ALLREDUCE_FANOUT_INTER_DEGREE=7 -x UCG_PLANC_UCX_ALLREDUCE_FANIN_INTER_DEGREE=7 -x UCG_PLANC_UCX_BARRIER_FANOUT_INTRA_DEGREE=3 -x UCG_PLANC_UCX_BARRIER_FANIN_INTRA_DEGREE=2 -x UCG_PLANC_UCX_BARRIER_FANOUT_INTER_DEGREE=7 -x UCG_PLANC_UCX_BARRIER_FANIN_INTER_DEGREE=7 test_case
Both the IB and RoCE network environments use the Kunpeng server architecture and Mellanox NICs.
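The two commands differ only in the UCX_TLS and UCX_NET_DEVICES values; every other option is identical. With Open MPI semantics, -np 384 together with -N 48 places 48 processes on each of 8 nodes. The following sketch shows one way to generate either command from a single place so that the shared options cannot drift apart. The function name build_mpirun_cmd is hypothetical and not part of Hyper MPI, and the device names mlx5_0:1 and mlx5_1:1 are simply the ones used in the examples; on another system they would need to match the local adapters.

```shell
#!/bin/sh
# Sketch only: assemble the mpirun command line for one of the two fabrics.
# build_mpirun_cmd is a hypothetical helper, not a Hyper MPI tool.
# It prints the command; it does not launch anything.
build_mpirun_cmd() {
    fabric=$1   # "ib" or "roce"
    app=$2      # program to launch, for example test_case

    # Only these two settings differ between the IB and RoCE examples.
    case "$fabric" in
        ib)   tls="sm,ud_x"; dev="mlx5_0:1" ;;
        roce) tls="sm,ud";   dev="mlx5_1:1" ;;
        *)    echo "unknown fabric: $fabric" >&2; return 1 ;;
    esac

    # Process layout and binding, copied from the examples.
    cmd="mpirun -np 384 -N 48 --hostfile hf"
    cmd="$cmd --bind-to core --map-by socket --rank-by core"
    cmd="$cmd --mca btl ^vader,tcp,openib"

    # Fabric-specific UCX settings.
    cmd="$cmd -x UCX_TLS=$tls -x UCX_NET_DEVICES=$dev"

    # Algorithm selection: 6 for Allreduce, 5 for Barrier, 3 for Bcast.
    cmd="$cmd -x UCG_PLANC_UCX_ALLREDUCE_ATTR=I:6S:200R:0-"
    cmd="$cmd -x UCG_PLANC_UCX_BARRIER_ATTR=I:5S:200R:0"
    cmd="$cmd -x UCG_PLANC_UCX_BCAST_ATTR=I:3S:200R:0-"

    # Allreduce and Barrier use the same fan-in and fan-out degrees.
    for coll in ALLREDUCE BARRIER; do
        cmd="$cmd -x UCG_PLANC_UCX_${coll}_FANOUT_INTRA_DEGREE=3"
        cmd="$cmd -x UCG_PLANC_UCX_${coll}_FANIN_INTRA_DEGREE=2"
        cmd="$cmd -x UCG_PLANC_UCX_${coll}_FANOUT_INTER_DEGREE=7"
        cmd="$cmd -x UCG_PLANC_UCX_${coll}_FANIN_INTER_DEGREE=7"
    done

    printf '%s\n' "$cmd $app"
}

# Usage: print the command for each fabric so it can be reviewed before running.
build_mpirun_cmd ib test_case
build_mpirun_cmd roce test_case
```

Printing the command instead of executing it keeps the sketch safe to try on a machine without a cluster; the printed line can then be pasted into a job script once the host file and device names have been checked.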