
An Error Is Reported After a GID Index Is Specified Using -x UCX_IB_GID_INDEX

Symptom

If an invalid GID index is specified using -x UCX_IB_GID_INDEX, the following error is reported:

ib_device.c:848 UCX ERROR ibv_query_gid(dev=mlx5_0 port=1 index=10) failed: No such file or directory

In other cases, the following error is reported after a GID index is specified using -x UCX_IB_GID_INDEX:

pml_ucx.c:384  Error: ucp_ep_create(proc=1) failed: Destination is unreachable

Possible Causes

  • The GID index specified on the command line does not match the GID index on the server.
  • The GID indexes of the job execution nodes are inconsistent.
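Before rerunning a job, you can check on a node whether the index you intend to pass with -x refers to a populated GID entry. The following is a minimal bash sketch, assuming the GID table is exposed in sysfs under /sys/class/infiniband/<dev>/ports/<port>/ (the same location the procedure below reads). The function name check_gid_index and the IB_SYSFS override are hypothetical, introduced only for illustration.

```shell
#!/bin/bash
# Sketch: check whether a GID index is populated on a local RDMA port
# before passing it to mpirun with -x UCX_IB_GID_INDEX=<index>.
# Assumption: the GID table is exposed as ports/<port>/gids/<index> and
# ports/<port>/gid_attrs/types/<index> under /sys/class/infiniband.
# IB_SYSFS is a hypothetical override so the check can run on a copy.
IB_SYSFS=${IB_SYSFS:-/sys/class/infiniband}

# check_gid_index DEV PORT INDEX
# Prints the GID type ("v1", "v2", or "unknown") and returns 0 if INDEX
# holds a non-zero GID; prints a reason to stderr and returns 1 otherwise.
check_gid_index() {
    local dev=$1 port=$2 idx=$3
    local gid_file="$IB_SYSFS/$dev/ports/$port/gids/$idx"
    local type_file="$IB_SYSFS/$dev/ports/$port/gid_attrs/types/$idx"
    local gid gid_type

    if [ ! -e "$gid_file" ]; then
        echo "index $idx does not exist on $dev port $port" >&2
        return 1
    fi
    # Unused entries read as all zeros (and may fail to read on some kernels).
    gid=$(cat "$gid_file" 2>/dev/null)
    if [ -z "$gid" ] || [ "$gid" = "0000:0000:0000:0000:0000:0000:0000:0000" ]; then
        echo "index $idx on $dev port $port is not populated" >&2
        return 1
    fi
    # The types file holds text such as "RoCE v2"; keep only the version.
    gid_type=$(grep -o "[Vv].*" "$type_file" 2>/dev/null)
    echo "${gid_type:-unknown}"
    return 0
}
```

For example, check_gid_index mlx5_0 1 10 returns a non-zero status when index 10 is absent or unused on mlx5_0 port 1, which is consistent with the ibv_query_gid error shown under Symptom. Running the same check with the same index on every node can also reveal nodes whose GID tables differ.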

Procedure

  • The GID index specified on the command line does not match the GID index on the server.
    1. Use PuTTY to log in to a job execution node as a common Hyper MPI user, for example, hmpi_user.
    2. Run the following commands to query the GID index. In the path, mlx5_0 is the network device name and 1 is the port number. Change them as required.

      cd /sys/class/infiniband/mlx5_0/ports/1

      grep -r 0000:0000:0000:0000:0000 gids/ 2>/dev/null | grep -v 0000:0000:0000:0000:0000:0000:0000:0000 | awk -F: '{print $1}' | awk -F/ '{print $2}' | xargs -I {} grep --with-filename -o "[Vv].*" "gid_attrs/types/"{}

      gid_attrs/types/5:v2

      In the output, 5 is the index of the RoCE v2 GID. If entries of both types are listed, use an index whose type is v2.

    3. Specify the following option in the mpirun command to set the GID index to 5:

      -x UCX_IB_GID_INDEX=5

  • The GID indexes of the job execution nodes are inconsistent.
    1. Use PuTTY to log in to a job execution node as a common Hyper MPI user, for example, hmpi_user.
    2. Upload the show-gids script to the home directory of the Hyper MPI user on the job execution node and make it executable, for example, by running chmod +x ~/show-gids.
      The content of the show-gids script is as follows:
      #!/bin/bash
      # show-gids: list the non-empty GID table entries of the RDMA devices.
      # Usage: ./show-gids [device]   (all devices under /sys/class/infiniband by default)
      black='\E[30;50m'
      red='\E[31;50m'
      green='\E[32;50m'
      yellow='\E[33;50m'
      blue='\E[34;50m'
      magenta='\E[35;50m'
      cyan='\E[36;50m'
      white='\E[37;50m'
      bold='\033[1m'
      gid_count=0
      # cecho (color echo) prints text in color.
      # The first parameter is the desired color; the remaining parameters are the text.
      function cecho ()
      {
          echo -en "$1"
          shift
          echo -n "$*"
          tput sgr0
      }
      # becho (bold echo) prints text in bold.
      function becho ()
      {
          echo -en "$bold"
          echo -n "$*"
          tput sgr0
      }
      # print_gids DEV PORT prints the index and value of each non-zero GID of DEV/PORT.
      function print_gids()
      {
          dev=$1
          port=$2
          for gf in /sys/class/infiniband/$dev/ports/$port/gids/* ; do
              gid=$(cat "$gf" 2>/dev/null)
              if [ -z "$gid" ] || [ "$gid" = 0000:0000:0000:0000:0000:0000:0000:0000 ] ; then
                  continue
              fi
              echo -e "$(basename "$gf")" "\t" "$gid"
          done
      }
      # Table header
      echo -e "DEV\tPORT\tINDEX\tGID\t\t\t\t\tIPv4 \t\tVER\tDEV"
      echo -e "---\t----\t-----\t---\t\t\t\t\t------------ \t---\t---"
      # Use the device given as the first argument, or every RDMA device.
      DEVS=$1
      if [ -z "$DEVS" ] ; then
          DEVS=$(ls /sys/class/infiniband/)
      fi
      for d in $DEVS ; do
          for p in $(ls /sys/class/infiniband/$d/ports/) ; do
              for g in $(ls /sys/class/infiniband/$d/ports/$p/gids/) ; do
                  gid=$(cat /sys/class/infiniband/$d/ports/$p/gids/$g 2>/dev/null)
                  # Skip unused (all-zero or unreadable) entries.
                  if [ -z "$gid" ] || [ "$gid" = 0000:0000:0000:0000:0000:0000:0000:0000 ] ; then
                      continue
                  fi
                  # Skip the link-local entry with a zero interface ID.
                  if [ "$gid" = fe80:0000:0000:0000:0000:0000:0000:0000 ] ; then
                      continue
                  fi
                  _ndev=$(cat /sys/class/infiniband/$d/ports/$p/gid_attrs/ndevs/$g 2>/dev/null)
                  __type=$(cat /sys/class/infiniband/$d/ports/$p/gid_attrs/types/$g 2>/dev/null)
                  # Keep only the RoCE version ("v1" or "v2") from the type string.
                  _type=$(echo $__type | grep -o "[Vv].*")
                  # GIDs whose first group is 0000 are treated as IPv4-mapped; decode the last 32 bits.
                  if [ "$(echo "$gid" | cut -d ":" -f -1)" = "0000" ] ; then
                      ipv4=$(printf "%d.%d.%d.%d" 0x${gid:30:2} 0x${gid:32:2} 0x${gid:35:2} 0x${gid:37:2})
                      echo -e "$d\t$p\t$g\t$gid\t$ipv4 \t$_type\t$_ndev"
                  else
                      echo -e "$d\t$p\t$g\t$gid\t\t\t$_type\t$_ndev"
                  fi
                  gid_count=$((gid_count + 1))
              done # g (gid)
          done # p (port)
      done # d (dev)
      echo n_gids_found=$gid_count
    3. Modify the ~/.bashrc file as follows:
      1. Open the ~/.bashrc file.

        vi ~/.bashrc

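      2. Press i to enter insert mode and add the following lines. They run show-gids, keep the first entry that has an IPv4 address and type v2, and export its index as UCX_IB_GID_INDEX.
        v2_gid=$($HOME/show-gids | grep -E "((25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.){3}(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])" | grep v2 | awk '{print $3}' | head -n 1)
        export UCX_IB_GID_INDEX=$v2_gid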
      3. Press Esc, type :wq!, and press Enter to save the settings and exit.
    4. Run the following command to make the settings take effect:

      source ~/.bashrc
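As an alternative to parsing the show-gids output, the index can be read directly from sysfs. The following bash sketch assumes that a RoCE v2 GID carrying an IPv4 address is stored as an IPv4-mapped entry (0000:0000:0000:0000:0000:ffff:xxxx:xxxx) and that its types file contains "v2". The function name find_v2_ipv4_gid_index and the IB_SYSFS override are hypothetical. It visits indexes in numeric order and returns the lowest match, so the result stays deterministic when several addresses are configured.

```shell
#!/bin/bash
# Sketch: print the index of the first RoCE v2 GID that carries an IPv4
# address on DEV/PORT, i.e. a candidate value for UCX_IB_GID_INDEX.
# Assumptions: IPv4 addresses appear as IPv4-mapped GIDs (::ffff:a.b.c.d),
# and IB_SYSFS is a hypothetical override of the sysfs root for testing.
IB_SYSFS=${IB_SYSFS:-/sys/class/infiniband}

# find_v2_ipv4_gid_index DEV PORT
# Prints the matching index and returns 0, or returns 1 if none is found.
find_v2_ipv4_gid_index() {
    local dev=$1 port=$2
    local base="$IB_SYSFS/$dev/ports/$port"
    local idx gid gid_type

    # Sort numerically so that, for example, index 3 is visited before 10.
    for idx in $(ls "$base/gids" 2>/dev/null | sort -n); do
        gid=$(cat "$base/gids/$idx" 2>/dev/null) || continue
        # Keep only IPv4-mapped GIDs.
        case $gid in
            0000:0000:0000:0000:0000:ffff:*) ;;
            *) continue ;;
        esac
        # Keep only entries whose type string mentions v2 (e.g. "RoCE v2").
        gid_type=$(cat "$base/gid_attrs/types/$idx" 2>/dev/null)
        case $gid_type in
            *[Vv]2*) echo "$idx"; return 0 ;;
        esac
    done
    return 1
}
```

For example, once the function is defined in ~/.bashrc, export UCX_IB_GID_INDEX=$(find_v2_ipv4_gid_index mlx5_0 1) could take the place of the v2_gid lines, with mlx5_0 and 1 adjusted to the site's device and port. Comparing the index it prints on each node shows whether the GID indexes of the job execution nodes are consistent.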