Command Examples
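In Hyper MPI, algorithm 6 for MPI_Allreduce, algorithm 5 for MPI_Barrier, and algorithm 3 for MPI_Bcast can deliver better performance. The commands below select these algorithms through the I: field of UCG_PLANC_UCX_ALLREDUCE_ATTR, UCG_PLANC_UCX_BARRIER_ATTR, and UCG_PLANC_UCX_BCAST_ATTR.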
- Example command for an IB network environment:
mpirun -np 384 -N 48 --hostfile hf \
    --bind-to core --map-by socket --rank-by core \
    --mca btl ^vader,tcp,openib \
    -x UCX_TLS=sm,ud_x -x UCX_NET_DEVICES=mlx5_0:1 \
    -x UCG_PLANC_UCX_ALLREDUCE_ATTR=I:6S:200R:0- \
    -x UCG_PLANC_UCX_BARRIER_ATTR=I:5S:200R:0 \
    -x UCG_PLANC_UCX_BCAST_ATTR=I:3S:200R:0- \
    -x UCG_PLANC_UCX_ALLREDUCE_FANOUT_INTRA_DEGREE=3 \
    -x UCG_PLANC_UCX_ALLREDUCE_FANIN_INTRA_DEGREE=2 \
    -x UCG_PLANC_UCX_ALLREDUCE_FANOUT_INTER_DEGREE=7 \
    -x UCG_PLANC_UCX_ALLREDUCE_FANIN_INTER_DEGREE=7 \
    -x UCG_PLANC_UCX_BARRIER_FANOUT_INTRA_DEGREE=3 \
    -x UCG_PLANC_UCX_BARRIER_FANIN_INTRA_DEGREE=2 \
    -x UCG_PLANC_UCX_BARRIER_FANOUT_INTER_DEGREE=7 \
    -x UCG_PLANC_UCX_BARRIER_FANIN_INTER_DEGREE=7 \
    test_case
- Example command for a RoCE network environment:
mpirun -np 384 -N 48 --hostfile hf \
    --bind-to core --map-by socket --rank-by core \
    --mca btl ^vader,tcp,openib \
    -x UCX_TLS=sm,ud -x UCX_NET_DEVICES=mlx5_1:1 \
    -x UCG_PLANC_UCX_ALLREDUCE_ATTR=I:6S:200R:0- \
    -x UCG_PLANC_UCX_BARRIER_ATTR=I:5S:200R:0 \
    -x UCG_PLANC_UCX_BCAST_ATTR=I:3S:200R:0- \
    -x UCG_PLANC_UCX_ALLREDUCE_FANOUT_INTRA_DEGREE=3 \
    -x UCG_PLANC_UCX_ALLREDUCE_FANIN_INTRA_DEGREE=2 \
    -x UCG_PLANC_UCX_ALLREDUCE_FANOUT_INTER_DEGREE=7 \
    -x UCG_PLANC_UCX_ALLREDUCE_FANIN_INTER_DEGREE=7 \
    -x UCG_PLANC_UCX_BARRIER_FANOUT_INTRA_DEGREE=3 \
    -x UCG_PLANC_UCX_BARRIER_FANIN_INTRA_DEGREE=2 \
    -x UCG_PLANC_UCX_BARRIER_FANOUT_INTER_DEGREE=7 \
    -x UCG_PLANC_UCX_BARRIER_FANIN_INTER_DEGREE=7 \
    test_case
Both the IB and RoCE examples assume Kunpeng servers equipped with Mellanox NICs.
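The two commands differ only in the UCX_TLS and UCX_NET_DEVICES values, so they can be generated from one template. The following is a minimal sketch, not part of Hyper MPI: the function name build_cmd and its ib/roce argument are hypothetical, and the process counts, hostfile name (hf), device names, and test_case are copied from the examples and may need adjusting for your cluster. The function only prints the command so that it can be reviewed before it is run.

```shell
# Sketch: print the mpirun command for the IB or RoCE example.
# build_cmd is a hypothetical helper; it does not launch anything.
build_cmd() {
    # Only the transport list and the NIC port differ between the two examples.
    case "$1" in
        ib)   tls="sm,ud_x"; dev="mlx5_0:1" ;;
        roce) tls="sm,ud";   dev="mlx5_1:1" ;;
        *)    echo "unknown network type: $1 (expected ib or roce)" >&2
              return 1 ;;
    esac

    cmd="mpirun -np 384 -N 48 --hostfile hf"
    cmd="$cmd --bind-to core --map-by socket --rank-by core"
    cmd="$cmd --mca btl ^vader,tcp,openib"
    cmd="$cmd -x UCX_TLS=$tls -x UCX_NET_DEVICES=$dev"

    # Algorithm selection, copied verbatim from the examples.
    cmd="$cmd -x UCG_PLANC_UCX_ALLREDUCE_ATTR=I:6S:200R:0-"
    cmd="$cmd -x UCG_PLANC_UCX_BARRIER_ATTR=I:5S:200R:0"
    cmd="$cmd -x UCG_PLANC_UCX_BCAST_ATTR=I:3S:200R:0-"

    # The examples use the same fan-in/fan-out degrees for Allreduce and Barrier.
    for coll in ALLREDUCE BARRIER; do
        cmd="$cmd -x UCG_PLANC_UCX_${coll}_FANOUT_INTRA_DEGREE=3"
        cmd="$cmd -x UCG_PLANC_UCX_${coll}_FANIN_INTRA_DEGREE=2"
        cmd="$cmd -x UCG_PLANC_UCX_${coll}_FANOUT_INTER_DEGREE=7"
        cmd="$cmd -x UCG_PLANC_UCX_${coll}_FANIN_INTER_DEGREE=7"
    done

    printf '%s\n' "$cmd test_case"
}

build_cmd ib
build_cmd roce
```

Once the printed command looks right, it can be pasted into the shell or a job script. Keeping the shared options in one place makes it less likely that the IB and RoCE variants drift apart when a degree or algorithm ID is tuned.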