Specifying Algorithms
Allreduce Algorithms
- When algorithm 13 or 14 is used, ensure that processes are evenly distributed across sockets. Add the following options:
- When algorithm 5 or 6 is selected, set the K value parameters that adjust the K-nomial tree structure, because the K-nomial algorithm is used within a node. The following commands are examples:
-x UCG_PLANC_UCX_ALLREDUCE_FANOUT_INTRA_DEGREE=3
-x UCG_PLANC_UCX_ALLREDUCE_FANIN_INTRA_DEGREE=8
- When algorithm 7 or 8 is selected, set the K value parameters that adjust the K-nomial tree structure at both levels, because the K-nomial algorithm is used within a node and between nodes. The following commands are examples:
-x UCG_PLANC_UCX_ALLREDUCE_FANOUT_INTER_DEGREE=7 -x UCG_PLANC_UCX_ALLREDUCE_FANIN_INTER_DEGREE=7
-x UCG_PLANC_UCX_ALLREDUCE_FANOUT_INTRA_DEGREE=3 -x UCG_PLANC_UCX_ALLREDUCE_FANIN_INTRA_DEGREE=8
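The options for algorithm 7 or 8 can be collected into one command line. The following is a minimal dry-run sketch that only prints the command rather than launching it. The helper name build_allreduce_cmd is hypothetical, the process count, host file, and test_case are borrowed from the Allreduce example in this section, and it assumes that the n in I:nS:200R:0- stands for the algorithm number.

```shell
#!/bin/sh
# Dry run: prints the mpirun command instead of executing it.
# build_allreduce_cmd is a hypothetical helper, not part of the MPI library.
# Arguments: algorithm number, inter-node K, intra-node fan-out K, intra-node fan-in K.
build_allreduce_cmd() {
    alg="$1"
    inter_k="$2"
    fanout_intra_k="$3"
    fanin_intra_k="$4"
    # Assumption: the algorithm number replaces n in I:nS:200R:0-.
    echo "mpirun -np 16 -N 2 --hostfile hf8 --mca btl ^vader,tcp,openib" \
        "-x UCX_TLS=sm,rc_x" \
        "-x UCG_PLANC_UCX_ALLREDUCE_ATTR=I:${alg}S:200R:0-" \
        "-x UCG_PLANC_UCX_ALLREDUCE_FANOUT_INTER_DEGREE=${inter_k}" \
        "-x UCG_PLANC_UCX_ALLREDUCE_FANIN_INTER_DEGREE=${inter_k}" \
        "-x UCG_PLANC_UCX_ALLREDUCE_FANOUT_INTRA_DEGREE=${fanout_intra_k}" \
        "-x UCG_PLANC_UCX_ALLREDUCE_FANIN_INTRA_DEGREE=${fanin_intra_k}" \
        "test_case"
}

# Algorithm 7 with the K values used in the examples above.
build_allreduce_cmd 7 7 3 8
```

Printing the command first makes it easy to check the option spelling before running it on a cluster.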
To improve performance, you can also add the following option to the command:
-x UCX_TLS=sm,rc_x
The following is an example of the command used when MPI_Allreduce and MPI_Iallreduce are called (Kunpeng processor):
mpirun -np 16 -N 2 --hostfile hf8 --mca btl ^vader,tcp,openib -x UCX_TLS=sm,rc_x -x UCG_PLANC_UCX_ALLREDUCE_ATTR=I:nS:200R:0- test_case
Bcast Algorithms
- When algorithm 3 is selected, set the parameter that adjusts the K value, because the K-nomial algorithm is used between nodes. An example command is as follows:
-x UCG_PLANC_UCX_BCAST_NA_KNTREE_INTER_DEGREE=7
- When algorithm 4 is selected, set both parameters that adjust the K value, because the K-nomial algorithm is used within a node and between nodes. The following commands are examples:
-x UCG_PLANC_UCX_BCAST_NA_KNTREE_INTER_DEGREE=7
-x UCG_PLANC_UCX_BCAST_NA_KNTREE_INTRA_DEGREE=3
- When algorithm 9 is used, you can tune the number of blocks used by the ESBT algorithm to find the best-performing value. The default value, 0, selects the number of blocks automatically. An example command is as follows:
-x UCG_PLANC_UCX_BCAST_ESBT_BLOCKS=3
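Finding a good block count for algorithm 9 usually means trying several values. The following is a minimal dry-run sketch that prints one Bcast command per candidate value. The helper name esbt_sweep_cmds and the candidate values are hypothetical, the other options are borrowed from the Bcast example in this section, and it assumes that the n in I:nS:200R:0- stands for the algorithm number.

```shell
#!/bin/sh
# Dry run: prints one mpirun command per candidate ESBT block count.
# esbt_sweep_cmds is a hypothetical helper, not part of the MPI library.
esbt_sweep_cmds() {
    for blocks in "$@"; do
        # Assumption: I:9S:200R:0- selects algorithm 9.
        echo "mpirun -np 16 -N 2 --hostfile hf8 --mca btl ^vader,tcp,openib" \
            "-x UCX_TLS=sm,rc_x" \
            "-x UCG_PLANC_UCX_BCAST_ATTR=I:9S:200R:0-" \
            "-x UCG_PLANC_UCX_BCAST_ESBT_BLOCKS=${blocks}" \
            "test_case"
    done
}

# 0 is the automatic default; the other candidates are arbitrary.
esbt_sweep_cmds 0 2 3 4
```

Each printed line can then be run and timed separately to compare the candidates.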
To improve performance, you can also add the following option to the command:
-x UCX_TLS=sm,rc_x
The following is an example of the command used when MPI_Bcast and MPI_Ibcast are called (Kunpeng processor):
mpirun -np 16 -N 2 --hostfile hf8 --mca btl ^vader,tcp,openib -x UCX_TLS=sm,rc_x -x UCG_PLANC_UCX_BCAST_ATTR=I:nS:200R:0- test_case
Barrier Algorithms
As listed in Table 3, Barrier algorithms are a subset of Allreduce algorithms. For details about how to specify Barrier algorithms, see the description of specifying Allreduce algorithms.
Scatterv Non-Blocking API Algorithms
- When algorithm 1 is used, you can tune the following parameters to find the best-performing values. The following commands are examples:
-x UCG_PLANC_UCX_SCATTERV_MIN_SEND_BATCH=7
-x UCG_PLANC_UCX_SCATTERV_MAX_SEND_BATCH=8
- When algorithm 2 (the K-nomial algorithm) is used, you can tune the K value to find the best-performing value. An example command is as follows:
To improve performance, you can also add the following option to the command:
-x UCX_TLS=sm,rc_x
The following is an example of the command used when MPI_Iscatterv is called (Kunpeng processor):
mpirun -np 16 -N 2 --hostfile hf8 --mca btl ^vader,tcp,openib -x UCX_TLS=sm,rc_x -x UCG_PLANC_UCX_SCATTERV_ATTR=I:nS:200R:0- test_case
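Tuning the two send-batch parameters of algorithm 1 means trying combinations of values. The following is a minimal dry-run sketch that prints one MPI_Iscatterv command per candidate pair. The helper name scatterv_batch_cmds and the candidate values are hypothetical. Skipping pairs whose minimum exceeds the maximum is an assumption made here, not a documented rule, and so is reading the n in I:nS:200R:0- as the algorithm number.

```shell
#!/bin/sh
# Dry run: prints one mpirun command per (min, max) send-batch pair.
# scatterv_batch_cmds is a hypothetical helper; the candidate values are arbitrary.
scatterv_batch_cmds() {
    for min_batch in 1 4 7; do
        for max_batch in 4 8; do
            # Assumption: a pair is only meaningful when min does not exceed max.
            [ "$min_batch" -le "$max_batch" ] || continue
            # Assumption: I:1S:200R:0- selects algorithm 1.
            echo "mpirun -np 16 -N 2 --hostfile hf8 --mca btl ^vader,tcp,openib" \
                "-x UCX_TLS=sm,rc_x" \
                "-x UCG_PLANC_UCX_SCATTERV_ATTR=I:1S:200R:0-" \
                "-x UCG_PLANC_UCX_SCATTERV_MIN_SEND_BATCH=${min_batch}" \
                "-x UCG_PLANC_UCX_SCATTERV_MAX_SEND_BATCH=${max_batch}" \
                "test_case"
        done
    done
}

scatterv_batch_cmds
```

Each printed command can be run and timed to compare the candidate pairs.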
Allgatherv Algorithms
To improve performance, you can add the following option to the command:
-x UCX_TLS=sm,rc_x
The following is an example of the command used when MPI_Allgatherv and MPI_Iallgatherv are called (Kunpeng processor):
mpirun -np 16 -N 2 --hostfile hf8 --mca btl ^vader,tcp,openib -x UCX_TLS=sm,rc_x -x UCG_PLANC_UCX_ALLGATHERV_ATTR=I:nS:200R:0- test_case