NVCC Compilation Parameter Optimization

Table 1 lists the NVCC compilation options that have the greatest effect on GPU computing performance. Add them to the nvcc command line to optimize a build.

Table 1 NVCC compilation parameters

Compilation Option

Description

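-gencode arch=compute_xx,code=sm_xx

Specifies the target GPU architecture to obtain better compatibility and performance. arch selects the virtual architecture (PTX) and code selects the real architecture for which binary code is generated. For example, a typical configuration for the A100 is -gencode arch=compute_80,code=sm_80.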

--ftz=false/true

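Specifies whether single-precision denormal values are flushed to zero (true) or preserved (false). Flushing reduces computation at the cost of accuracy for very small values. The default value is false.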

--prec-sqrt=true/false

Specifies whether single-precision square root uses IEEE round-to-nearest mode (true) or a faster approximation (false). The default value is true.

--prec-div=true/false

Specifies whether single-precision division and reciprocals use IEEE round-to-nearest mode (true) or a faster approximation (false). The default value is true.

--fmad=true/false

Specifies whether floating-point multiplies and adds or subtracts are contracted into fused multiply-add (FMA) operations. The default value is true.

--use_fast_math

Enables the fast calculation mode, which uses the fast math library. Specifying this option is equivalent to setting --ftz=true, --prec-div=false, --prec-sqrt=false, and --fmad=true.

-O0/1/2/3/4

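Sets the code optimization level. -O0 disables optimization, and the optimizations become more aggressive from -O1 to -O4. -O4 is recommended. Note that nvcc applies this option to host code; to set the level for device code, pass it to ptxas through -Xptxas.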

-Xptxas -allow-expensive-optimizations

Allows ptxas to perform expensive optimizations using the maximum available resources (memory and compile time).

-Xptxas -dlcm=ca/cg

Sets the default cache mode for global memory loads. ca caches loads at all levels (L1 and L2), so the L1 cache is enabled. cg caches loads at the global level only (L2), so the L1 cache is bypassed. The default value is ca.
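As an illustration of how the options in Table 1 combine on one command line, the following is a minimal sketch of a fast-math build for the A100. The source file name kernel.cu and the output name kernel are hypothetical, and -O3 is an assumed level chosen for the sketch; substitute the level you settle on. The script prints the command and runs it only if nvcc and the source file are present.

```shell
#!/bin/sh
# Sketch only: kernel.cu and the output name "kernel" are hypothetical.
SRC=kernel.cu
OUT=kernel

# Target the A100 (compute capability 8.0), as in the Table 1 example.
ARCH_FLAGS="-gencode arch=compute_80,code=sm_80"

# --use_fast_math already implies --ftz=true, --prec-div=false,
# --prec-sqrt=false, and --fmad=true, so they are not repeated here.
MATH_FLAGS="--use_fast_math"

# The option that follows -Xptxas is forwarded to ptxas.
# -dlcm=ca keeps the L1 cache enabled for global loads (the default).
PTXAS_FLAGS="-Xptxas -dlcm=ca"

# -O3 is an assumed optimization level for this sketch.
NVCC_CMD="nvcc -O3 $ARCH_FLAGS $MATH_FLAGS $PTXAS_FLAGS -o $OUT $SRC"

# Print the command; run it only if nvcc and the source file exist.
echo "$NVCC_CMD"
if command -v nvcc >/dev/null 2>&1 && [ -f "$SRC" ]; then
    $NVCC_CMD
fi
```

To compare against a precise build, drop --use_fast_math so that the defaults listed in Table 1 apply, then check both the run time and the numerical results of the two binaries.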

For details, see the NVIDIA HPC Compilers Reference Guide.