Tuning Parallel I/O

Principle

An MPI program can read and write files in one of three modes.

  • A single process performs all reads and writes: One process (P0) reads all file data into its buffer and then distributes most of it to the other processes by using MPI send and receive functions. After the calculation is complete, the other processes send their results to P0, which writes all results to the files. In this mode, the process that performs the file reads and writes is the performance bottleneck. The read and write bandwidth is limited by the network bandwidth of the computing server where P0 resides and by the maximum throughput that the storage system can deliver to a single process.
  • Multiple processes read and write separate files: Each process operates only on its own files, independently of the others. In this mode, multiple network channels of a computing server can be used at the same time, and the multi-client access capability of a parallel storage system can be fully utilized. The disadvantage is that the number of source data files to be read may be smaller than the number of processes. As a result, the load becomes unbalanced, the output file data becomes too large, and subsequent operations are hard to perform.
  • Multiple processes read and write the same file: The processes cooperate with each other to avoid unnecessary operations. Of the three modes, this one is expected to give the best MPI parallel I/O performance.
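
The access pattern of the third mode can be illustrated without MPI. The following is a minimal sketch, not MPI-IO itself: ordinary background shell processes stand in for MPI ranks, and each one writes its own fixed-size region of a single shared file with dd. The values NPROCS and CHUNK and the file name shared_output.dat are hypothetical and chosen only for this illustration.

```shell
#!/bin/sh
# Illustration only: plain shell processes stand in for MPI ranks.
# NPROCS, CHUNK, and OUT are hypothetical values chosen for this sketch.
NPROCS=4
CHUNK=8                          # bytes written by each "rank"
OUT=${OUT:-shared_output.dat}

: > "$OUT"                       # create or truncate the shared file once

rank=0
while [ "$rank" -lt "$NPROCS" ]; do
    # Each "rank" writes CHUNK copies of its own digit at byte offset
    # rank * CHUNK. conv=notrunc stops dd from truncating the file, so
    # regions written by the other "ranks" are preserved.
    (
        printf "%${CHUNK}s" "" | tr ' ' "$rank" |
            dd of="$OUT" bs="$CHUNK" seek="$rank" count=1 conv=notrunc 2>/dev/null
    ) &
    rank=$((rank + 1))
done
wait                             # block until every writer has finished

cat "$OUT"; echo
```

Because the regions do not overlap, the writers need no locking and the final file contains one contiguous run of bytes per process. A real MPI application would express the same idea through MPI-IO or a library built on it, which can additionally coordinate the processes.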

Parallel NetCDF (PnetCDF) is a library that implements high-performance I/O on top of MPI-IO and exposes a customized version of the NetCDF API. If the application supports Parallel NetCDF, you can enable the library to improve I/O performance. Meteorological, marine, and environmental applications, such as the widely used WRF meteorological model, may use Parallel NetCDF.

The HDF5 library can also be built in parallel mode, which improves the performance of applications that support the HDF5 parallel interface.
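
Before relying on parallel HDF5, it can help to confirm that the installed library was really built in parallel mode. The sketch below assumes an installation that writes a libhdf5.settings summary file under its lib directory and that the file contains a "Parallel HDF5" line. The function name hdf5_settings_is_parallel and the prefix /path/to/HDF5 are hypothetical.

```shell
# Hypothetical helper: succeeds (status 0) if the given libhdf5.settings
# file reports a parallel build, returns 1 if it reports a serial build,
# and returns 2 if the file cannot be read.
hdf5_settings_is_parallel() {
    settings_file=$1
    [ -r "$settings_file" ] || return 2
    grep -Eiq 'Parallel HDF5:[[:space:]]*(yes|on)' "$settings_file"
}

# Example call; /path/to/HDF5 is a placeholder installation prefix.
if hdf5_settings_is_parallel /path/to/HDF5/lib/libhdf5.settings; then
    echo "parallel HDF5"
else
    echo "serial HDF5, or settings file not found"
fi
```

If the check reports a serial build, HDF5 needs to be rebuilt with parallel support. With the HDF5 configure script, this is typically done by passing --enable-parallel together with an MPI compiler wrapper such as CC=mpicc.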

Procedure

  1. Run the following commands to install PNETCDF.

    tar -xvf parallel-netcdf-1.9.0.tar.bz2

    cd parallel-netcdf-1.9.0

    mkdir -p /path/to/PNETCDF

    ./configure --prefix=/path/to/PNETCDF --build=aarch64-unknown-linux-gnu CFLAGS="-fPIC -DPIC" CXXFLAGS="-fPIC -DPIC" FCFLAGS="-fPIC" FFLAGS="-fPIC"

    make -j 16

    make install

  2. Run the following commands to set PNETCDF environment variables.

    export PATH=/path/to/PNETCDF/bin:$PATH

    export LD_LIBRARY_PATH=/path/to/PNETCDF/lib:$LD_LIBRARY_PATH

  3. When compiling the application software, set CPPFLAGS and LDFLAGS so that the application is compiled and linked against PNETCDF.

    export PNETCDF=/path/to/PNETCDF

    export CPPFLAGS="-I$PNETCDF/include"

    export LDFLAGS="-L$PNETCDF/lib -lpnetcdf"
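
Before building the application, a quick sanity check of the installation prefix can catch a wrong path early. The following is a minimal sketch: check_pnetcdf_prefix is a hypothetical helper, and it assumes that the installation places pnetcdf.h under include and a static or shared libpnetcdf under lib.

```shell
# Hypothetical helper: verify that a PNETCDF prefix contains the header
# and the library that CPPFLAGS and LDFLAGS point to.
check_pnetcdf_prefix() {
    prefix=$1
    if [ ! -f "$prefix/include/pnetcdf.h" ]; then
        echo "missing $prefix/include/pnetcdf.h"
        return 1
    fi
    # Accept either a static or a shared library.
    if [ -f "$prefix/lib/libpnetcdf.a" ] || [ -f "$prefix/lib/libpnetcdf.so" ]; then
        echo "PNETCDF found under $prefix"
    else
        echo "missing libpnetcdf under $prefix/lib"
        return 1
    fi
}

# Uses the PNETCDF variable from step 3, or the placeholder path if unset.
check_pnetcdf_prefix "${PNETCDF:-/path/to/PNETCDF}" ||
    echo "fix the prefix before building the application"
```

Once the check passes, an application that follows the usual conventions picks up CPPFLAGS and LDFLAGS from the environment. For a single source file, a build line might look like mpicc $CPPFLAGS app.c $LDFLAGS -o app, where app.c is a hypothetical file name.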