
Running and Verification

Procedure

  1. Use PuTTY to log in to the server as the root user.
  2. Create a working directory.
    mkdir -p /path/to/CASE
  3. Go to the working directory and copy the test case and binary file into it.
    cd /path/to/CASE
    cp /path/to/LAMMPS/lammps-5Jun19/bench/in.lj  ./
    cp /path/to/LAMMPS/lammps-5Jun19/src/lmp_mpi  ./
  4. Start the run.
    • Run single-node commands on CentOS.
      mpirun --allow-run-as-root -np 96 --mca btl ^openib  ./lmp_mpi -in in.lj >> ./test_OneNode.log
    • Run single-node commands on openEuler.
      mpirun --allow-run-as-root -np 96  -mca pml ucx -mca btl ^vader,tcp,openib,uct -x UCX_TLS=self,sm --bind-to core --map-by socket --rank-by core -x UCX_BUILTIN_BCAST_ALGORITHM=3 -x UCX_BUILTIN_BARRIER_ALGORITHM=5 -x UCX_BUILTIN_ALLREDUCE_ALGORITHM=10  ./lmp_mpi -in in.lj >> ./test_OneNode.log

      The output is appended to the test_OneNode.log file in the current directory. Check the value of Performance (unit: timesteps/s); a larger value indicates higher performance.

      The following is an example of the test result.

      Performance: 1134386.210 tau/day, 2625.894 timesteps/s
      99.7% CPU use with 96 MPI tasks x no OpenMP threads
      MPI task timing breakdown:
      Section |  min time  |  avg time  |  max time  |%varavg| %total
      ---------------------------------------------------------------
      Pair    | 0.021651   | 0.023624   | 0.025878   |   0.6 | 62.03
      Neigh   | 0.002734   | 0.0029155  | 0.0031491  |   0.2 |  7.66
      Comm    | 0.007533   | 0.010058   | 0.012192   |   1.1 | 26.41
      Output  | 5.6281e-05 | 0.00050681 | 0.00062442 |   0.0 |  1.33
      Modify  | 0.00056003 | 0.00061947 | 0.00070919 |   0.0 |  1.63
      Other   |            | 0.0003584  |            |       |  0.94
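      Rather than scanning the log manually, the Performance value can be pulled out with a one-line filter. In the sketch below, sample_perf.log is a stand-in file created only for illustration; in practice, point the filter at the test_OneNode.log (or test_TwoNodes.log) produced by an actual run.

      ```shell
      # sample_perf.log is a stand-in created here for illustration; replace it
      # with the log file written by a real run.
      printf 'Performance: 1134386.210 tau/day, 2625.894 timesteps/s\n' > sample_perf.log

      # The timesteps/s figure is the 4th whitespace-separated field
      # of the Performance line.
      perf=$(grep '^Performance:' sample_perf.log | awk '{print $4}')
      echo "$perf"
      ```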
    • Run dual-node commands on CentOS.
      mpirun --allow-run-as-root -np 192 -N 96 -x PATH=$PATH -x LD_LIBRARY_PATH=$LD_LIBRARY_PATH  -machinefile machinefile --mca btl ^openib ./lmp_mpi -in in.lj >> ./test_TwoNodes.log

      Add the host names (for example, n1 and n2) of the two specified compute nodes to the machinefile file, one host name per line.
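      A minimal machinefile for the two example hosts can be created as follows; n1 and n2 are placeholders and must be replaced with the actual compute-node host names.

      ```shell
      # One host name per line; replace n1/n2 with the real node names.
      cat > machinefile <<'EOF'
      n1
      n2
      EOF
      ```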

    • Run dual-node commands on openEuler.
      mpirun --allow-run-as-root -np 192 -N 96 -machinefile machinefile  -x PATH=$PATH -x LD_LIBRARY_PATH=$LD_LIBRARY_PATH -mca pml ucx -mca btl ^vader,tcp,openib,uct  --bind-to core  --rank-by core ./lmp_mpi -in in.lj >> ./test_TwoNodes.log

      Add the host names (for example, n1 and n2) of the two specified compute nodes to the machinefile file, one host name per line.

      The output is appended to the test_TwoNodes.log file in the current directory. Check the value of Performance (unit: timesteps/s); a larger value indicates higher performance.

      The following is an example of the test result.

      Performance: 1605508.300 tau/day, 3716.454 timesteps/s
      91.0% CPU use with 192 MPI tasks x no OpenMP threads
      MPI task timing breakdown:
      Section |  min time  |  avg time  |  max time  |%varavg| %total
      ---------------------------------------------------------------
      Pair    | 0.01035    | 0.01174    | 0.013347   |   0.6 | 43.63
      Neigh   | 0.0013662  | 0.0014931  | 0.0016205  |   0.1 |  5.55
      Comm    | 0.01112    | 0.012935   | 0.014469   |   0.6 | 48.07
      Output  | 7.3471e-05 | 0.00010074 | 0.00017455 |   0.0 |  0.37
      Modify  | 0.00023517 | 0.00032949 | 0.00040742 |   0.0 |  1.22
      Other   |            | 0.0003095  |            |       |  1.15
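      As a rough sanity check, the two sample Performance figures above (2625.894 timesteps/s on one node, 3716.454 timesteps/s on two) imply a two-node scaling efficiency of about 71%. The arithmetic can be reproduced with awk:

      ```shell
      # Parallel efficiency on 2 nodes = (dual-node rate / single-node rate) / 2.
      # The rates are the sample timesteps/s values shown above.
      awk 'BEGIN { printf "%.2f\n", (3716.454 / 2625.894) / 2 }'
      ```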
      Table 1 Parameter description

      Parameter    | Description
      ------------ | ---------------------------------------------------
      -np          | Total number of MPI processes to run.
      -N           | Number of processes to run on each server (node).
      -machinefile | File that lists the host names of the nodes to use.

      • The dual-node test cases must be run from a shared directory. If the PATH and LD_LIBRARY_PATH environment variables were already configured in Configuring the Compilation Environment, you do not need to configure them again.
      • If hyper-threading is not enabled, the -np value must be less than or equal to the number of nodes multiplied by the number of CPU cores on each node.
      • n1 and n2 are the host names of the compute nodes.
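      The hyper-threading constraint above can be verified with simple shell arithmetic. The values below match the dual-node commands in this procedure (2 nodes with 96 physical cores each):

      ```shell
      # Without hyper-threading, -np must not exceed nodes * cores per node.
      nodes=2
      cores_per_node=96
      max_np=$(( nodes * cores_per_node ))
      echo "$max_np"
      ```

      This yields the -np 192 used in the dual-node commands; a larger -np would oversubscribe the cores.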