Parallel I/O

Principles

Parallel I/O lets multiple processes perform I/O operations at the same time, which speeds up HPC applications that perform a large number of I/O operations. There are three parallel I/O modes:

  • Only one process reads and writes files

    A single process (P0) reads all the data in the file into its buffer, and then uses the MPI send/receive functions to distribute most of the data to the other processes. After the computation, the other processes send their results back to P0, and P0 writes all the results to a file. The process that reads and writes the files is the performance bottleneck: read and write bandwidth is limited by the network bandwidth of the compute server where P0 resides and by the maximum performance that the storage system can deliver to a single process.

  • Multiple processes read and write different files

    Each process operates only on its own files, independently of the others. In this mode, multiple network channels of a compute server can be used at the same time, and the multi-client access capability of a parallel storage system can be leveraged. The disadvantage is that there may be fewer source data files to read than there are processes, which leaves the load unbalanced, and that the large number of output files makes subsequent processing difficult.

  • Multiple processes read and write the same file

    Multiple processes cooperate with each other to avoid unnecessary operations. Each process must compute its own file offset so that its data does not conflict with the data of other processes. This mode can maximize parallel I/O performance.
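
As a rough illustration of the offset computation in the third mode, the following Python sketch splits a set of fixed-size records into contiguous blocks, one per worker, and has each worker write its block into the same file with a positional write. It is only a sketch under assumptions: threads stand in for MPI processes, the POSIX call `os.pwrite` stands in for an MPI-IO or PnetCDF write, and `block_range`, `write_block`, and `demo` are hypothetical names that do not come from any library.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def block_range(total, nprocs, rank):
    """Return (start, count) of the contiguous block of records owned by `rank`.

    The first `total % nprocs` ranks get one extra record, so the load
    differs by at most one record between ranks.
    """
    base, extra = divmod(total, nprocs)
    start = rank * base + min(rank, extra)
    count = base + (1 if rank < extra else 0)
    return start, count


def write_block(fd, rank, nprocs, total, record_size):
    """Write this rank's records at its own byte offset in the shared file."""
    start, count = block_range(total, nprocs, rank)
    payload = bytes([rank]) * (count * record_size)  # dummy data: the rank id repeated
    # Positional write: no shared file pointer is moved, so ranks do not interfere.
    os.pwrite(fd, payload, start * record_size)
    return start, count


def demo(nprocs=4, total=10, record_size=2):
    """Let `nprocs` workers fill one file, then return the file contents."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "shared.dat")
        fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
        try:
            with ThreadPoolExecutor(max_workers=nprocs) as pool:
                # Threads stand in for MPI ranks (assumption for this sketch).
                list(pool.map(
                    lambda r: write_block(fd, r, nprocs, total, record_size),
                    range(nprocs),
                ))
            return os.pread(fd, total * record_size, 0)
        finally:
            os.close(fd)


if __name__ == "__main__":
    print(demo())
```

With the defaults (4 workers, 10 records of 2 bytes each), the first two workers own 3 records each and the last two own 2 each. The byte ranges are disjoint and together cover the whole file, so the result does not depend on the order in which the writes complete.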

Modification Method

There are four types of parallel I/O interfaces: POSIX I/O, MPI-IO, HDF5 I/O, and NetCDF I/O (PnetCDF, also called parallel-netcdf). Modify the I/O calls of the HPC application to use the parallel I/O interface that the application supports. PnetCDF is used as an example here. It is a library that builds on MPI-IO and provides a netCDF-style API for high-performance parallel I/O. It is often used in meteorology, oceanography, environmental science, and other fields.

Perform the following steps:

  1. Download and install parallel-netcdf. For details, visit https://github.com/Parallel-NetCDF/PnetCDF.
  2. Configure the PnetCDF environment variables.
    export PATH=/path/to/PNETCDF/bin:$PATH
    export LD_LIBRARY_PATH=/path/to/PNETCDF/lib:$LD_LIBRARY_PATH
    
  3. When compiling your application, set CPPFLAGS and LDFLAGS so that the compiler finds the PnetCDF headers and the linker links the PnetCDF library into the application.
    export PNETCDF=/path/to/PNETCDF
    export CPPFLAGS="-I$PNETCDF/include"
    export LDFLAGS="-L$PNETCDF/lib -lpnetcdf"