CUDA Fortran

CUDA Fortran is a computing platform launched by NVIDIA. CUDA introduced by NVIDIA is a general-purpose parallel computing architecture that solves complex computational problems on a GPU. Since 2007, parallel computing based on accelerators represented by NVIDIA GPUs has started to emerge. Currently, the popular general-purpose programming languages for GPUs are CUDA C and OpenCL. Both of them are extensions of the C/C++ language, and therefore the C/C++ code can be easily ported to the GPU. However, Fortran, an important programming language in scientific and engineering computing, cannot be directly rewritten as CUDA C or OpenCL.

To enable Fortran applications to use GPU acceleration, The Portland Group designed CUDA Fortran. Codes of applications in fields such as meteorology and theoretical physics can be easily modified to utilize the powerful computing capabilities of GPUs. Only the PGI Fortran compiler supports the CUDA Fortran architecture.

module mathOps
contains
  attributes(global) subroutine saxpy(x, y, a)
    implicit none
    real :: x(:), y(:)
    real, value :: a
    integer :: i, n
    n = size(x)
    i = blockDim%x * (blockIdx%x - 1) + threadIdx%x
    if (i <= n) y(i) = y(i) + a*x(i)
  end subroutine saxpy 
end module mathOps

program testSaxpy
  use mathOps
  use cudafor
  implicit none
  integer, parameter :: N = 40000
  real :: x(N), y(N), a
  real, device :: x_d(N), y_d(N)
  type(dim3) :: grid, tBlock

  tBlock = dim3(256,1,1)
  grid = dim3(ceiling(real(N)/tBlock%x),1,1)

  x = 1.0; y = 2.0; a = 2.0
  x_d = x
  y_d = y
  call saxpy<<<grid, tBlock>>>(x_d, y_d, a)
  y = y_d
  write(*,*) 'Max error: ', maxval(abs(y-4.0))
end program testSaxpy

In the preceding code example, testSaxpy is a host function and calls saxpy, a GPU-side function. The execution of saxpy is offloaded to the GPU. x and y declared via real :: x(N), y(N) are host variables, while x_d and y_d declared via real, device :: x_d(N), y_d(N) are GPU-side variables.

CUDA Fortran enables data transfer by directly assigning values between host and GPU-side variables. For example, x_d = x assigns the value of the host variable x to the GPU-side variable x_d.

Parent topic: OpenMP/OpenACC Optimization