NVCC Compilation Parameter Optimization
Table 1 lists the nvcc compilation options that most affect GPU computing performance. Add them to the compile command to tune the generated code.
| Compilation Option | Description |
|---|---|
| -gencode arch=compute_xx,code=sm_xx | Specifies the virtual architecture (arch) and the real GPU architecture (code) to generate code for, which improves compatibility and performance. For example, a typical configuration for the A100 is -gencode arch=compute_80,code=sm_80. In a CMake flag list, the two arguments are separated by a semicolon instead of a space: -gencode;arch=compute_80,code=sm_80. |
| --ftz=true/false | Indicates whether to flush single-precision denormal (subnormal) values to zero, which reduces computation. The default value is false. |
| --prec-sqrt=true/false | Indicates whether to use the precise (IEEE round-to-nearest) single-precision square root instead of a faster approximation. The default value is true. |
| --prec-div=true/false | Indicates whether to use precise (IEEE round-to-nearest) single-precision division and reciprocals instead of a faster approximation. The default value is true. |
| --fmad=true/false | Indicates whether to contract floating-point multiplications and additions or subtractions into fused multiply-add (FMA) operations. The default value is true. |
| --use_fast_math | Enables the fast math mode, which trades numerical accuracy for speed. It is equivalent to setting --ftz=true, --prec-div=false, --prec-sqrt=false, and --fmad=true. |
| -O0/-O1/-O2/-O3/-O4 | Sets the code optimization level. -O0 disables optimization, and the optimization effort increases from -O1 to -O4. -O4 is recommended. |
| -Xptxas -allow-expensive-optimizations | Allows ptxas to perform expensive optimizations that use the maximum available resources (memory and compile time). |
| -Xptxas -dlcm=ca/cg | Sets the default cache modifier for global memory loads, which determines whether the L1 cache is used. ca caches loads in both L1 and L2 (L1 enabled), and cg caches loads in L2 only (L1 bypassed). The default value is ca. |
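
As a rough illustration of how these options combine on a command line, the following dry-run sketch assembles two nvcc invocations for an A100: one that spells out the accuracy-preserving defaults and one that uses the fast math mode. The file name kernel.cu and the output names app_precise and app_fast are hypothetical placeholders, the =true form of -allow-expensive-optimizations is an assumption about how the option is usually passed, and -O3 is only one possible choice of level. The script prints the commands rather than running them, so it does not require the CUDA toolkit.

```shell
#!/bin/sh
# Dry run: prints the nvcc command lines instead of executing them.
# kernel.cu and the output names are hypothetical placeholders.
SRC="kernel.cu"

# Target the A100 (compute capability 8.0), as in the table's example.
ARCH_FLAGS="-gencode arch=compute_80,code=sm_80"

# Accuracy-preserving build: the default values, spelled out explicitly.
PRECISE_FLAGS="--ftz=false --prec-div=true --prec-sqrt=true --fmad=true"

# Speed-oriented build: --use_fast_math implies
# --ftz=true --prec-div=false --prec-sqrt=false --fmad=true.
FAST_FLAGS="--use_fast_math"

# Options forwarded to ptxas; a comma separates multiple -Xptxas arguments.
# The "=true" form is an assumption about the usual way to pass this option.
PTXAS_FLAGS="-Xptxas -allow-expensive-optimizations=true,-dlcm=ca"

PRECISE_CMD="nvcc -O3 $ARCH_FLAGS $PRECISE_FLAGS $PTXAS_FLAGS -o app_precise $SRC"
FAST_CMD="nvcc -O3 $ARCH_FLAGS $FAST_FLAGS $PTXAS_FLAGS -o app_fast $SRC"

# Print both commands so they can be reviewed or pasted into a build script.
echo "$PRECISE_CMD"
echo "$FAST_CMD"
```

Building both variants and comparing their numerical results on a representative input is one way to judge whether the accuracy loss of the fast math mode is acceptable for a given application.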