NVCC Compilation Parameter Optimization

Table 1 lists the NVCC compilation options that have the greatest effect on GPU computing performance. Add them to the nvcc command line to optimize the build.

**Table 1** NVCC compilation parameters
Compilation Option	Description
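-gencode arch=compute_xx,code=sm_xx	Specifies the virtual architecture (arch=compute_xx) and the real GPU architecture (code=sm_xx) to generate code for. Matching them to the target GPU gives better compatibility and performance. For example, the typical configuration for the A100 is -gencode arch=compute_80,code=sm_80.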
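--ftz=false/true	Specifies whether single-precision denormal (subnormal) values are flushed to zero (true) or preserved (false). Flushing reduces computation at the cost of accuracy for very small values. The default value is false.
--prec-sqrt=true/false	Specifies whether single-precision square root uses IEEE round-to-nearest mode (true) or a faster approximation (false). The default value is true.
--prec-div=true/false	Specifies whether single-precision division and reciprocals use IEEE round-to-nearest mode (true) or a faster approximation (false). The default value is true.
--fmad=true/false	Specifies whether floating-point multiplies and adds/subtracts are contracted into fused multiply-add (FMA) operations. The default value is true.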
--use_fast_math	Enables the fast math library. Specifying this option implies --ftz=true, --prec-div=false, --prec-sqrt=false, and --fmad=true.
-O0/-O1/-O2/-O3/-O4	Specifies the optimization level, which nvcc applies to host code. -O0 disables optimization, and the degree of optimization increases from -O1 to -O4. -O4 is recommended.
-Xptxas -allow-expensive-optimizations=true/false	Specifies whether ptxas may perform expensive optimizations that use the maximum available resources (memory and compile time).
-Xptxas -dlcm=ca/cg	Sets the default cache modifier for global memory loads, which determines whether the L1 cache is used. ca caches loads in both L1 and L2 (L1 enabled), and cg caches loads in L2 only (L1 bypassed). The default value is ca.
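
The options in Table 1 can be combined on one nvcc command line. The following is a minimal sketch, not a recommended production build: the file names saxpy.cu and saxpy are hypothetical, the target is assumed to be an A100, and -O3 is used only as a widely supported level (substitute the level you prefer). It assembles two variants, one that keeps the precise defaults and one that trades accuracy for speed, prints both commands, and runs them only if nvcc and the source file are present.

```shell
#!/bin/sh
# Hypothetical file names, used for illustration only.
SRC=saxpy.cu
OUT=saxpy

# Target architecture: compute capability 8.0 (A100), as in Table 1.
ARCH_FLAGS="-gencode arch=compute_80,code=sm_80"

# Variant 1: the precise defaults from Table 1, written out explicitly.
PRECISE_FLAGS="--ftz=false --prec-sqrt=true --prec-div=true --fmad=true"

# Variant 2: fast math, which implies
# --ftz=true --prec-div=false --prec-sqrt=false --fmad=true.
FAST_FLAGS="--use_fast_math"

# Option forwarded to ptxas: cache global loads in L1 and L2.
PTXAS_FLAGS="-Xptxas -dlcm=ca"

PRECISE_CMD="nvcc -O3 $ARCH_FLAGS $PRECISE_FLAGS $PTXAS_FLAGS -o $OUT $SRC"
FAST_CMD="nvcc -O3 $ARCH_FLAGS $FAST_FLAGS $PTXAS_FLAGS -o ${OUT}_fast $SRC"

# Show the assembled command lines.
echo "$PRECISE_CMD"
echo "$FAST_CMD"

# Build only when nvcc and the source file are actually available.
if command -v nvcc >/dev/null 2>&1 && [ -f "$SRC" ]; then
    $PRECISE_CMD && $FAST_CMD
fi
```

Building both variants makes it possible to compare run time and numerical results on the same input before deciding whether the accuracy loss from --use_fast_math is acceptable for the application.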
