
Compilation and Installation

Procedure

  1. Use PuTTY to log in to the server as the root user.
  2. Install the NVIDIA Container Toolkit (nvidia-docker2).
    distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
    yum-config-manager --add-repo https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo
    dnf clean expire-cache --refresh
    dnf install -y nvidia-docker2
    systemctl restart docker
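The repository URL in the previous step is built from the ID and VERSION_ID fields of /etc/os-release. The following sketch shows that expansion using a sample os-release file (so it runs on any system, not just the target server):

```shell
# Sample file standing in for the real /etc/os-release
cat > /tmp/os-release.sample <<'EOF'
ID="centos"
VERSION_ID="8"
EOF
# Same expansion as in the install step: source the file in a subshell,
# then concatenate the two fields
distribution=$(. /tmp/os-release.sample; echo $ID$VERSION_ID)
echo "https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo"
```

On a CentOS 8 host, `distribution` expands to `centos8`, selecting the matching repository path.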
  3. Obtain the base image.
    docker pull nvidia/cuda-arm64:11.1-devel-centos8
  4. Run Docker.
    docker run -it nvidia/cuda-arm64:11.1-devel-centos8 /bin/bash
  5. Modify the /etc/yum.repos.d/nvidia-ml.repo file.
    1. Open the /etc/yum.repos.d/nvidia-ml.repo file.
      vi /etc/yum.repos.d/nvidia-ml.repo
    2. Press i to enter the edit mode and add the following content:
      [nvidia-ml]
      name=nvidia-ml
      baseurl=https://developer.download.nvidia.com/compute/machine-learning/repos/rhel8/sbsa
      enabled=1
      gpgcheck=1
      gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-NVIDIA
      sslverify=false
    3. Press Esc, type :wq!, and press Enter to save the settings and exit.
  6. Modify the /etc/yum.repos.d/cuda.repo file.
    1. Open the /etc/yum.repos.d/cuda.repo file.
      vi /etc/yum.repos.d/cuda.repo
    2. Press i to enter the edit mode and add the following content:
      [cuda]
      name=cuda
      baseurl=https://developer.download.nvidia.com/compute/cuda/repos/rhel8/sbsa
      enabled=1
      gpgcheck=1
      gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-NVIDIA
      sslverify=false
    3. Press Esc, type :wq!, and press Enter to save the settings and exit.
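The repo files in steps 5 and 6 can also be written non-interactively with a heredoc instead of vi. A sketch for the cuda.repo case (it writes to a temporary directory for illustration; on the real system the target directory is /etc/yum.repos.d):

```shell
# Illustration only: use a temp dir; on the target system this is /etc/yum.repos.d
REPO_DIR=$(mktemp -d)
# Quoted 'EOF' prevents the shell from expanding anything inside the heredoc
cat > "$REPO_DIR/cuda.repo" <<'EOF'
[cuda]
name=cuda
baseurl=https://developer.download.nvidia.com/compute/cuda/repos/rhel8/sbsa
enabled=1
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-NVIDIA
sslverify=false
EOF
cat "$REPO_DIR/cuda.repo"
```

The same pattern applies to nvidia-ml.repo with the other baseurl.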
  7. Refresh the Yum source cache.
    yum makecache
  8. Install the system dependencies.
    yum install -y zlib-devel bzip2-devel openssl-devel ncurses-devel sqlite-devel readline-devel tk-devel python3 python3-devel gcc wget cmake3 libarchive git which
  9. Download and install cudnn8.
    wget https://developer.download.nvidia.cn/compute/machine-learning/repos/rhel8/sbsa/libcudnn8-8.0.4.30-1.cuda11.1.aarch64.rpm --no-check-certificate
    rpm -ivh libcudnn8-8.0.4.30-1.cuda11.1.aarch64.rpm
    wget https://developer.download.nvidia.cn/compute/machine-learning/repos/rhel8/sbsa/libcudnn8-devel-8.0.4.30-1.cuda11.1.aarch64.rpm --no-check-certificate
    rpm -ivh libcudnn8-devel-8.0.4.30-1.cuda11.1.aarch64.rpm
  10. Deploy Anaconda.

    When executing the ./Anaconda3-2021.05-Linux-aarch64.sh script, press Enter and type yes when prompted. By default, the script installs Anaconda in the /root/anaconda3 directory.

    wget https://repo.anaconda.com/archive/Anaconda3-2021.05-Linux-aarch64.sh
    chmod +x Anaconda3-2021.05-Linux-aarch64.sh
    ./Anaconda3-2021.05-Linux-aarch64.sh
    export PATH=/root/anaconda3/bin/:$PATH
  11. Install the PyTorch dependencies.
    conda install astunparse numpy ninja pyyaml setuptools cmake cffi typing_extensions future six requests dataclasses

    (Optional) Configure the conda proxy as follows:

    cat > /root/.condarc <<EOF
    channels:
      - conda-forge
      - defaults
    proxy_servers:
      http: xxxxxx
      https: xxxxxx
    ssl_verify: false
    EOF
  12. Set the environment variables.
    export PATH=/root/anaconda3/bin:$PATH
    export LD_LIBRARY_PATH=/usr/local/cuda/lib64
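Prepending /root/anaconda3/bin to PATH makes the conda-installed tools shadow same-named system binaries, which is what lets the build pick up the conda python. A self-contained sketch of that precedence rule, using throwaway directories rather than the real anaconda path:

```shell
# Two throwaway dirs, each containing a script named "tool"
first=$(mktemp -d); second=$(mktemp -d)
printf '#!/bin/sh\necho first\n'  > "$first/tool"
printf '#!/bin/sh\necho second\n' > "$second/tool"
chmod +x "$first/tool" "$second/tool"
# The shell resolves "tool" to the first match in PATH, left to right
PATH="$first:$second:$PATH"
tool   # prints "first"
```

For the same reason, the PATH line must come before the build in step 13; otherwise `python` resolves to the system interpreter.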
  13. Download and compile the PyTorch source code.
    export GIT_SSL_NO_VERIFY=1
    git clone --recursive --depth=1 https://github.com/pytorch/pytorch
    cd pytorch
    git submodule sync
    git submodule update --init --recursive --jobs 0
    export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
    python setup.py install
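Before exporting the image, it is worth confirming that the freshly built module actually imports. A small helper sketch, assuming the conda python is first in PATH (`check_import` is a name chosen here for illustration, not part of the build):

```shell
# Hypothetical helper: succeeds only if the named module imports cleanly
check_import() {
  python3 -c "import $1; print('$1 OK')"
}
# After the PyTorch build completes, run: check_import torch
```

A non-zero exit status (or an ImportError traceback) means the install did not complete correctly.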
  14. Open another terminal on the host to export the container.
    docker ps

    Run docker ps to obtain the container ID (4ad81495b7ef in the following examples). Copy large files such as the PyTorch source code out of the container, delete them from the container to shrink the image, and then export the image.

    1. Back up the source code.
      docker cp 4ad81495b7ef:/pytorch .
    2. In the container, delete the PyTorch source code and the downloaded installation packages.
      rm -rf pytorch/
      rm -rf Anaconda3-2021.05-Linux-aarch64.sh
      rm -f libcudnn8-8.0.4.30-1.cuda11.1.aarch64.rpm
      rm -f libcudnn8-devel-8.0.4.30-1.cuda11.1.aarch64.rpm
    3. Export the image.
      docker export -o pytorch_cuda.tar 4ad81495b7ef
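docker export writes the container filesystem as a single flat tarball (no image layers), so a quick sanity check before moving the file elsewhere is to list the archive. A sketch using a throwaway tar built on the spot; the same listing command works on the real pytorch_cuda.tar:

```shell
# Build a tiny stand-in archive to demonstrate the check
work=$(mktemp -d)
mkdir -p "$work/rootfs/etc"
echo sample > "$work/rootfs/etc/demo.conf"
tar -cf "$work/image.tar" -C "$work/rootfs" .
# A readable listing with no errors indicates the tarball is intact
tar -tf "$work/image.tar"
```

To load the exported filesystem back as an image on another host, `docker import pytorch_cuda.tar <repo>:<tag>` is the counterpart operation.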