CUDA Fortran
CUDA, introduced by NVIDIA, is a general-purpose parallel computing platform and programming model for solving complex computational problems on GPUs, and CUDA Fortran brings this model to Fortran. Since 2007, parallel computing on accelerators, most prominently NVIDIA GPUs, has become increasingly widespread. The most popular general-purpose GPU programming languages today are CUDA C/C++ and OpenCL. Both are based on C/C++, so existing C/C++ code can be ported to the GPU with relatively little effort. Fortran, however, remains an important language in scientific and engineering computing, yet Fortran code cannot use CUDA C or OpenCL directly; it would first have to be rewritten in C or C++.
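To let Fortran applications benefit from GPU acceleration, The Portland Group (PGI), working with NVIDIA, designed CUDA Fortran. With it, application code in fields such as meteorology and theoretical physics can be modified with relatively little effort to exploit the computing power of GPUs. CUDA Fortran was originally supported only by the PGI Fortran compiler; PGI is now part of NVIDIA, and its compiler continues as nvfortran in the NVIDIA HPC SDK. The following SAXPY example (y = y + a*x) shows the basic structure of a CUDA Fortran program.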
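module mathOps
contains
  ! attributes(global) marks a kernel: a subroutine that runs on the GPU
  ! and is launched from host code.
  attributes(global) subroutine saxpy(x, y, a)
    implicit none
    real :: x(:), y(:)
    real, value :: a          ! scalar passed by value from the host
    integer :: i, n
    n = size(x)
    ! Global thread index; blockIdx%x and threadIdx%x start at 1 in CUDA Fortran
    i = blockDim%x * (blockIdx%x - 1) + threadIdx%x
    ! The last block may contain threads beyond the end of the arrays
    if (i <= n) y(i) = y(i) + a*x(i)
  end subroutine saxpy
end module mathOps

program testSaxpy
  use mathOps
  use cudafor
  implicit none
  integer, parameter :: N = 40000
  real :: x(N), y(N), a            ! host arrays
  real, device :: x_d(N), y_d(N)   ! device (GPU) arrays
  type(dim3) :: grid, tBlock

  ! 256 threads per block, and enough blocks to cover all N elements
  tBlock = dim3(256,1,1)
  grid = dim3(ceiling(real(N)/tBlock%x),1,1)

  x = 1.0; y = 2.0; a = 2.0
  x_d = x                          ! host-to-device copy
  y_d = y
  call saxpy<<<grid, tBlock>>>(x_d, y_d, a)
  y = y_d                          ! device-to-host copy (waits for the kernel)
  ! Every element should now be 2.0 + 2.0*1.0 = 4.0
  write(*,*) 'Max error: ', maxval(abs(y-4.0))
end program testSaxpy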
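In the preceding example, testSaxpy is the host program, and saxpy is a kernel (a subroutine declared with attributes(global)) whose execution is offloaded to the GPU. The host launches it with the chevron syntax saxpy<<<grid, tBlock>>>, which here requests ceiling(40000/256) = 157 blocks of 256 threads. Because 157 × 256 = 40192 exceeds N, the check if (i <= n) keeps the 192 surplus threads from accessing memory outside the arrays. The arrays x and y, declared with real :: x(N), y(N), are host variables, while x_d and y_d, declared with real, device :: x_d(N), y_d(N), reside in GPU memory.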
CUDA Fortran transfers data between host and GPU whenever values are assigned between host and device variables. For example, x_d = x copies the host array x into the device array x_d, and y = y_d copies the result from the GPU back into the host array y.
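The same assignment mechanism also works with allocatable device arrays and for copies between two device arrays. The program below is a minimal sketch of this, assuming a CUDA-capable GPU and a CUDA Fortran compiler such as nvfortran (for example, with the source saved under a .cuf suffix so that CUDA Fortran is enabled). The program name assignTransfer and its variables are hypothetical, chosen only for illustration.

```fortran
! Hypothetical example: host/device transfers by plain assignment,
! using allocatable device arrays.
program assignTransfer
  use cudafor                 ! CUDA Fortran runtime interfaces
  implicit none
  integer, parameter :: n = 1024
  real, allocatable :: a(:), b(:)                  ! host arrays
  real, device, allocatable :: a_d(:), b_d(:)      ! device arrays (GPU memory)

  allocate(a(n), b(n))
  allocate(a_d(n), b_d(n))    ! allocating device arrays reserves GPU memory

  a = 3.0
  b = 0.0

  a_d = a      ! host-to-device copy
  b_d = a_d    ! device-to-device copy (the data stays on the GPU)
  b = b_d      ! device-to-host copy

  ! Every element of b should now equal the corresponding element of a
  if (any(b /= a)) error stop 'host/device round trip changed the data'
  write(*,*) 'Round trip OK, b(1) = ', b(1)

  deallocate(a_d, b_d)        ! frees the GPU memory
  deallocate(a, b)
end program assignTransfer
```

If every element survives the round trip, the program prints a confirmation line; otherwise error stop terminates it with an error message.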