Obtaining the DLRM Source Code
DLRM training uses the source code from the NodLabs open-source repository tensorflow-dlrm. After obtaining the source code, apply the DLRM training patch file to adapt it to the Kunpeng platform.
- Go to the planned DLRM source code path /path/to/dlrm.
cd /path/to/dlrm
- Configure a git network proxy.
git config --global http.sslVerify false
git config --global https.sslverify false
git config --global http.proxy "http://Username:Password@Proxy_IP_address:Proxy_port"
- Download the tensorflow-dlrm source code.
git clone https://github.com/NodLabs/tensorflow-dlrm.git
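The proxy step writes to the global Git configuration, so the disabled SSL verification and the stored proxy password apply to every repository of the current user. Once the clone has finished, those settings can be removed again. The following is a minimal sketch, assuming a POSIX shell; the function name clear_git_proxy is hypothetical, and the keys are the three set in the proxy step.

```shell
#!/bin/sh
# Hypothetical helper: remove the proxy and SSL overrides written by the proxy step.
clear_git_proxy() {
    for key in http.proxy http.sslVerify https.sslverify; do
        # --unset-all exits non-zero when the key is absent; treat that as "already clean"
        git config --global --unset-all "$key" 2>/dev/null || true
    done
}
```

Run clear_git_proxy after the clone completes if the proxy is no longer needed.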
- Go to the source code directory.
cd tensorflow-dlrm
- Create and apply the DLRM training patch file train.patch.
- Create a train.patch file.
vi train.patch
- Press i to enter the insert mode and add the following content to the train.patch file:
diff --git a/dlrm_criteo_gpu.py b/dlrm_criteo_gpu.py
index c2dfeac..0a71668 100644
--- a/dlrm_criteo_gpu.py
+++ b/dlrm_criteo_gpu.py
@@ -5,7 +5,7 @@ from tqdm import tqdm
 import tensorflow as tf
 import dataloader
 
-raw_data = dataloader.load_criteo('../dataset/')
+raw_data = dataloader.load_criteo('../../dataset/')
 dim_embed = 4
 bottom_mlp_size = [8, 4]
 top_mlp_size = [128, 64, 1]
@@ -71,4 +71,4 @@ for train_iter, batch_data in enumerate(train_dataset):
         average_loss.reset_states()
         auc.reset_states()
 
-dlrm_model.save('DLRMModel_tf2_2')
+dlrm_model.save_weights('mymodel')
- Press Esc, type :wq, and press Enter to save the file and exit.
The patch lines that begin with a single space are context lines. They must match dlrm_criteo_gpu.py exactly, including indentation and blank lines; otherwise the patch cannot be applied.
- Apply the patch.
git apply train.patch
If the command output does not contain error information, the patch is successfully applied.
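A patch can also be validated before it changes any file. The sketch below, which assumes a POSIX shell, uses the --check option of git apply to perform a dry run and applies the patch only if the dry run succeeds. The function name apply_patch_checked is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: dry-run a patch, then apply it only if the dry run passes.
apply_patch_checked() {
    patch_file="$1"
    # --check validates the patch against the working tree without modifying files
    if git apply --check "$patch_file"; then
        git apply "$patch_file"
    else
        echo "patch does not apply cleanly: $patch_file" >&2
        return 1
    fi
}
```

For this procedure, the call would be apply_patch_checked train.patch, run from the tensorflow-dlrm directory.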
- Verify the patch integrity.
git diff --stat

If the command output lists dlrm_criteo_gpu.py as the only modified file, with 2 insertions and 2 deletions (the two lines replaced by train.patch), the patch is correctly applied.
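The same check can be scripted instead of read by eye. The following is a minimal sketch, assuming a POSIX shell; it relies on git diff --numstat, which prints the added count, the deleted count, and the path separated by tabs. The function name check_diff_counts is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: compare the numstat counts of one file with expected values.
check_diff_counts() {
    file="$1"; want_added="$2"; want_deleted="$3"
    # --numstat prints "<added><TAB><deleted><TAB><path>" for each changed file
    stats="$(git diff --numstat -- "$file")"
    added="$(printf '%s\n' "$stats" | cut -f1)"
    deleted="$(printf '%s\n' "$stats" | cut -f2)"
    # Succeeds only if both counts match; an unchanged file yields empty counts and fails
    [ "$added" = "$want_added" ] && [ "$deleted" = "$want_deleted" ]
}
```

Because train.patch replaces two lines, the call for this procedure would be check_diff_counts dlrm_criteo_gpu.py 2 2, which returns a zero exit status when the counts match.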