
Compilation and Installation

Procedure

  1. Use PuTTY to log in to the server as the root user.
  2. Install the NVIDIA Container Toolkit (nvidia-docker2).
    distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
    yum-config-manager --add-repo https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo
    dnf clean expire-cache --refresh
    dnf install -y nvidia-docker2
    systemctl restart docker
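The repository URL in the previous step is built from the ID and VERSION_ID fields of /etc/os-release. The following sketch shows that expansion using a sample os-release file (so it runs on any system, not just the target server):

```shell
# Sample file standing in for the real /etc/os-release
cat > /tmp/os-release.sample <<'EOF'
ID="centos"
VERSION_ID="8"
EOF
# Same expansion as in the install step: source the file in a subshell,
# then concatenate the two fields
distribution=$(. /tmp/os-release.sample; echo $ID$VERSION_ID)
echo "https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo"
```

On a CentOS 8 host, `distribution` expands to `centos8`, selecting the matching repository path.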
  3. Obtain the base image.
    docker pull nvidia/cuda-arm64:11.1-devel-centos8
  4. Run Docker.
    docker run -it nvidia/cuda-arm64:11.1-devel-centos8 /bin/bash
  5. Modify the /etc/yum.repos.d/nvidia-ml.repo file.
    1. Open the /etc/yum.repos.d/nvidia-ml.repo file.
      vi /etc/yum.repos.d/nvidia-ml.repo
    2. Press i to enter the edit mode and add the following content:
      [nvidia-ml]
      name=nvidia-ml
      baseurl=https://developer.download.nvidia.com/compute/machine-learning/repos/rhel8/sbsa
      enabled=1
      gpgcheck=1
      gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-NVIDIA
      sslverify=false
    3. Press Esc, type :wq!, and press Enter to save the settings and exit.
  6. Modify the /etc/yum.repos.d/cuda.repo file.
    1. Open the /etc/yum.repos.d/cuda.repo file.
      vi /etc/yum.repos.d/cuda.repo
    2. Press i to enter the edit mode and add the following content:
      [cuda]
      name=cuda
      baseurl=https://developer.download.nvidia.com/compute/cuda/repos/rhel8/sbsa
      enabled=1
      gpgcheck=1
      gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-NVIDIA
      sslverify=false
    3. Press Esc, type :wq!, and press Enter to save the settings and exit.
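The repo files in steps 5 and 6 can also be written non-interactively with a heredoc instead of vi. A sketch for the cuda.repo case (it writes to a temporary directory for illustration; on the real system the target directory is /etc/yum.repos.d):

```shell
# Illustration only: use a temp dir; on the target system this is /etc/yum.repos.d
REPO_DIR=$(mktemp -d)
# Quoted 'EOF' prevents the shell from expanding anything inside the heredoc
cat > "$REPO_DIR/cuda.repo" <<'EOF'
[cuda]
name=cuda
baseurl=https://developer.download.nvidia.com/compute/cuda/repos/rhel8/sbsa
enabled=1
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-NVIDIA
sslverify=false
EOF
cat "$REPO_DIR/cuda.repo"
```

The same pattern applies to nvidia-ml.repo with the other baseurl.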
  7. Refresh the Yum source cache.
    yum makecache
  8. Install the system dependencies.
    yum install -y zlib-devel bzip2-devel openssl-devel ncurses-devel sqlite-devel readline-devel tk-devel python3 python3-devel gcc wget cmake3 libarchive git which
  9. Download and install cudnn8.
    wget https://developer.download.nvidia.cn/compute/machine-learning/repos/rhel8/sbsa/libcudnn8-8.0.4.30-1.cuda11.1.aarch64.rpm --no-check-certificate
    rpm -ivh libcudnn8-8.0.4.30-1.cuda11.1.aarch64.rpm
    wget https://developer.download.nvidia.cn/compute/machine-learning/repos/rhel8/sbsa/libcudnn8-devel-8.0.4.30-1.cuda11.1.aarch64.rpm --no-check-certificate
    rpm -ivh libcudnn8-devel-8.0.4.30-1.cuda11.1.aarch64.rpm
  10. Deploy Anaconda.

    When executing the ./Anaconda3-2021.05-Linux-aarch64.sh script, press Enter and type yes when prompted. By default, the script installs Anaconda in the /root/anaconda3 directory.

    wget https://repo.anaconda.com/archive/Anaconda3-2021.05-Linux-aarch64.sh
    chmod +x Anaconda3-2021.05-Linux-aarch64.sh
    ./Anaconda3-2021.05-Linux-aarch64.sh
    export PATH=/root/anaconda3/bin/:$PATH
  11. Install the PyTorch dependencies.
    conda install astunparse numpy ninja pyyaml setuptools cmake cffi typing_extensions future six requests dataclasses

    (Optional) Configure the conda proxy as follows:

    cat > /root/.condarc <<EOF
    channels:
      - conda-forge
      - defaults
    proxy_servers:
      http: xxxxxx
      https: xxxxxx
    ssl_verify: false
    EOF
  12. Set the environment variables.
    export PATH=/root/anaconda3/bin:$PATH
    export LD_LIBRARY_PATH=/usr/local/cuda/lib64
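Prepending /root/anaconda3/bin to PATH makes the conda-installed tools shadow same-named system binaries, which is what lets the build pick up the conda python. A self-contained sketch of that precedence rule, using throwaway directories rather than the real anaconda path:

```shell
# Two throwaway dirs, each containing a script named "tool"
first=$(mktemp -d); second=$(mktemp -d)
printf '#!/bin/sh\necho first\n'  > "$first/tool"
printf '#!/bin/sh\necho second\n' > "$second/tool"
chmod +x "$first/tool" "$second/tool"
# The shell resolves "tool" to the first match in PATH, left to right
PATH="$first:$second:$PATH"
tool   # prints "first"
```

For the same reason, the PATH line must come before the build in step 13; otherwise `python` resolves to the system interpreter.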
  13. Download and compile the PyTorch source code.
    export GIT_SSL_NO_VERIFY=1
    git clone --recursive --depth=1 https://github.com/pytorch/pytorch
    cd pytorch
    git submodule sync
    git submodule update --init --recursive --jobs 0
    export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
    python setup.py install
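Before exporting the image, it is worth confirming that the freshly built module actually imports. A small helper sketch, assuming the conda python is first in PATH (`check_import` is a name chosen here for illustration, not part of the build):

```shell
# Hypothetical helper: succeeds only if the named module imports cleanly
check_import() {
  python3 -c "import $1; print('$1 OK')"
}
# After the PyTorch build completes, run: check_import torch
```

A non-zero exit status (or an ImportError traceback) means the install did not complete correctly.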
  14. Open another terminal on the host to export the container.
    docker ps

    Run docker ps to obtain the container ID (4ad81495b7ef in the following examples). Copy large files such as the PyTorch source code out of the container, delete them from the container to shrink the image, and then export the image.

    1. Back up the source code.
      docker cp 4ad81495b7ef:/pytorch .
    2. In the container, delete the PyTorch source code and the downloaded installation packages.
      rm -rf pytorch/
      rm -rf Anaconda3-2021.05-Linux-aarch64.sh
      rm -f libcudnn8-8.0.4.30-1.cuda11.1.aarch64.rpm
      rm -f libcudnn8-devel-8.0.4.30-1.cuda11.1.aarch64.rpm
    3. Export the image.
      docker export -o pytorch_cuda.tar 4ad81495b7ef
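docker export writes the container filesystem as a single flat tarball (no image layers), so a quick sanity check before moving the file elsewhere is to list the archive. A sketch using a throwaway tar built on the spot; the same listing command works on the real pytorch_cuda.tar:

```shell
# Build a tiny stand-in archive to demonstrate the check
work=$(mktemp -d)
mkdir -p "$work/rootfs/etc"
echo sample > "$work/rootfs/etc/demo.conf"
tar -cf "$work/image.tar" -C "$work/rootfs" .
# A readable listing with no errors indicates the tarball is intact
tar -tf "$work/image.tar"
```

To load the exported filesystem back as an image on another host, `docker import pytorch_cuda.tar <repo>:<tag>` is the counterpart operation.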