cuFFT Usage
cuFFT is the API of the NVIDIA CUDA Fast Fourier Transform (FFT) library. It consists of two separate libraries: cuFFT and cuFFTW. The cuFFT library is designed for high performance on NVIDIA GPUs. The cuFFTW library provides an FFTW-compatible interface, so existing FFTW users can port their code to NVIDIA GPUs with minimal effort. The FFT is a divide-and-conquer algorithm for efficiently computing discrete Fourier transforms (DFTs) of complex or real-valued datasets. It is one of the most important and widely used numerical algorithms in computational physics and general signal processing. The cuFFT library provides a simple interface for computing FFTs, which lets users quickly leverage the floating-point power and parallelism of the GPU.
cuFFT API Features
- Algorithms highly optimized for input sizes that can be written in the form $2^a \times 3^b \times 5^c \times 7^d$. In general, the smaller the prime factor, the better the performance; for example, powers of two are fastest.
- An $O(n \log n)$ algorithm for every input data size.
- Half-precision (16-bit floating point), single-precision (32-bit floating point) and double-precision (64-bit floating point). Transforms of lower precision have higher performance.
- Complex and real-valued input and output. Real-valued inputs or outputs require fewer computations and less data than complex values and often have a faster time to solution. Supported types are:
- C2C: complex input to complex output
- R2C: real-valued input to complex output
- C2R: complex input to real-valued output
- 1D, 2D, and 3D FFTs.
- Execution of multiple 1D, 2D and 3D transforms simultaneously. These batched transforms have higher performance than single transforms.
- In-place transforms (input and output share the same buffer) as well as out-of-place transforms.
- Arbitrary intra- and inter-dimension element strides.
- FFTW compatible data layout.
- Execution of FFTs across multiple GPUs.
- Streamed execution, enabling asynchronous computation and data movement.
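Because performance is best for lengths of the form $2^a \times 3^b \times 5^c \times 7^d$, an application that can tolerate zero-padding might round its transform length up to the next such size. The sketch below, in plain C, assumes that use case; `is_2357_smooth` and `next_2357_size` are hypothetical helpers, not part of the cuFFT API. Note that padding changes the transform length and therefore the frequency grid of the result, so it is only appropriate when the application allows it.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if n (n >= 1) factors into the primes 2, 3, 5 and 7 only */
int is_2357_smooth(size_t n)
{
    assert(n >= 1);
    static const size_t primes[] = {2, 3, 5, 7};
    for (size_t i = 0; i < 4; ++i)
        while (n % primes[i] == 0)
            n /= primes[i];
    return n == 1;
}

/* Smallest m >= n of the form 2^a * 3^b * 5^c * 7^d */
size_t next_2357_size(size_t n)
{
    assert(n >= 1);
    size_t m = n;
    /* Terminates: some power of two is always >= n */
    while (!is_2357_smooth(m))
        ++m;
    return m;
}
```

For example, `next_2357_size(1025)` returns 1029 (which is $3 \times 7^3$), so a 1025-sample signal could be zero-padded by four samples instead of being transformed at a length with the large prime factor 41.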
cuFFT Library Usage
- Include the required header file in the code:
- inc/cufft.h: cuFFT library (libcufft.so)
- inc/cufftXt.h: cuFFT library (libcufft.so) with Xt functionality
- inc/cufftw.h: cuFFTW library (libcufftw.so)
- Link the dynamic library using -lcufft during compilation (when using cuFFTW, link -lcufftw in addition to -lcufft).
Sample code: a one-dimensional complex-to-complex (C2C) forward transform, followed by an inverse transform back to the original domain
#include <stdio.h>
#include <cuda_runtime.h>
#include <cufft.h>

#define NX 256
#define BATCH 1

/* The statements below are assumed to run inside a function returning void */
cufftHandle plan;
cufftComplex *data;
/* Check the return value of cudaMalloc directly */
if (cudaMalloc((void**)&data, sizeof(cufftComplex)*NX*BATCH) != cudaSuccess){
    fprintf(stderr, "Cuda error: Failed to allocate\n");
    return;
}
if (cufftPlan1d(&plan, NX, CUFFT_C2C, BATCH) != CUFFT_SUCCESS){
    fprintf(stderr, "CUFFT error: Plan creation failed\n");
    cudaFree(data);
    return;
}
...
/* Note:
 * Identical pointers for the input and output arrays imply an in-place transform
 */
if (cufftExecC2C(plan, data, data, CUFFT_FORWARD) != CUFFT_SUCCESS){
    fprintf(stderr, "CUFFT error: ExecC2C Forward failed\n");
    cufftDestroy(plan);
    cudaFree(data);
    return;
}
if (cufftExecC2C(plan, data, data, CUFFT_INVERSE) != CUFFT_SUCCESS){
    fprintf(stderr, "CUFFT error: ExecC2C Inverse failed\n");
    cufftDestroy(plan);
    cudaFree(data);
    return;
}
/*
 * cuFFT execution is asynchronous with respect to the host, so block the
 * host until the device has completed all work before using the results
 */
if (cudaDeviceSynchronize() != cudaSuccess){
    fprintf(stderr, "Cuda error: Failed to synchronize\n");
    cufftDestroy(plan);
    cudaFree(data);
    return;
}
/*
 * cuFFT transforms are unnormalized: the forward + inverse round trip scales
 * every element by the transform size NX, so divide each element by NX
 * to get back the original data
 */
...
cufftDestroy(plan);
cudaFree(data);