Parallel I/O
Principles
Parallel I/O enables multiple processes to perform I/O operations at the same time, which speeds up HPC applications that perform a large number of I/O operations. There are three I/O modes:
- Only one process reads and writes files
A single process (P0) reads all the data in the file into its buffer and then uses MPI send/receive functions to distribute most of the data to the other processes. After the computation, the other processes send their results back to P0, and P0 writes all the results to a file. The process that reads and writes the file is the performance bottleneck: the read and write bandwidth is limited by the network bandwidth of the computing server where P0 resides and by the maximum throughput the storage system can deliver to a single process.

- Multiple processes read and write different files
Each process operates only on its own files, independently of the other processes. In this mode, multiple network channels of a computing server can be used at the same time, and the multi-client access capability of a parallel storage system can be leveraged. The disadvantages are that there may be fewer source data files than processes, which leaves the load unbalanced, and that the run produces a large number of output files, which makes subsequent processing difficult.

- Multiple processes read and write the same file
The processes cooperate with each other to avoid unnecessary operations. Each process must compute its own file offset so that its accesses do not conflict with those of the other processes. This mode offers the greatest potential for maximizing parallel I/O performance.
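The offset computation in the shared-file mode can be sketched without MPI. The following is only an illustration under stated assumptions: background shell jobs stand in for MPI processes, the function name write_blocks is hypothetical, and every writer is assumed to own one fixed-size block, so the offset of writer "rank" is rank * BLOCK bytes.

```shell
#!/bin/sh
# write_blocks FILE NPROCS BLOCK   (hypothetical helper, not part of any library)
# Starts NPROCS background writers. Writer "rank" fills the byte range
# [rank*BLOCK, (rank+1)*BLOCK) of FILE, so no two writers overlap.
write_blocks() {
    file=$1
    nprocs=$2
    block=$3
    : > "$file"                       # create or empty the shared file once
    rank=0
    while [ "$rank" -lt "$nprocs" ]; do
        # printf pads or truncates the payload to exactly BLOCK bytes.
        # dd seeks to rank*BLOCK bytes (seek counts in units of bs);
        # conv=notrunc keeps the data written by the other writers.
        printf "%-${block}.${block}s" "rank$rank" |
            dd of="$file" bs="$block" seek="$rank" conv=notrunc 2>/dev/null &
        rank=$((rank + 1))
    done
    wait                              # return only after every writer has finished
}
```

For example, write_blocks out.dat 4 8 produces a 32-byte file in which each writer has filled its own 8-byte region, regardless of the order in which the writers ran. In an MPI program, the same idea would typically be expressed by having each rank pass its computed offset to a call such as MPI_File_write_at.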

Modification Method
There are four common types of parallel I/O interfaces: POSIX I/O, MPI-IO, parallel HDF5, and parallel netCDF (NetCDF-4 or parallel-netcdf, also known as PnetCDF). Adapt the I/O calls of the HPC application to the parallel I/O interface that the application supports. parallel-netcdf is used as an example here. It is a library built on MPI-IO that provides a customized netCDF API for high-performance parallel I/O, and it is often used in meteorology, ocean science, environmental science, and related fields.
Perform the following steps:
- Download and install parallel-netcdf. For details, visit https://github.com/Parallel-NetCDF/PnetCDF.
- Configure the PnetCDF environment variables.
    export PATH=/path/to/PNETCDF/bin:$PATH
    export LD_LIBRARY_PATH=/path/to/PNETCDF/lib:$LD_LIBRARY_PATH
- When compiling your application, set CPPFLAGS and LDFLAGS to link PnetCDF to the application.
    export PNETCDF=/path/to/PNETCDF
    export CPPFLAGS="-I$PNETCDF/include"
    export LDFLAGS="-L$PNETCDF/lib -lpnetcdf"
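Before compiling, it can help to confirm that the installation prefix really contains the files that the flags above point to. The sketch below makes several assumptions: the helper names check_pnetcdf_prefix and print_compile_command are hypothetical, the installation is assumed to place pnetcdf.h under include and libpnetcdf.a or libpnetcdf.so under lib, mpicc is assumed to be the MPI compiler wrapper on your system, and app.c is a placeholder for your own source file.

```shell
#!/bin/sh
# check_pnetcdf_prefix DIR   (hypothetical helper)
# Returns 0 if DIR looks like a PnetCDF install prefix, that is,
# the header and at least one library file are present.
check_pnetcdf_prefix() {
    prefix=$1
    [ -f "$prefix/include/pnetcdf.h" ] || {
        echo "missing $prefix/include/pnetcdf.h" >&2
        return 1
    }
    # Accept either a static or a shared library.
    for lib in "$prefix/lib/libpnetcdf.a" "$prefix/lib/libpnetcdf.so"; do
        [ -f "$lib" ] && return 0
    done
    echo "missing libpnetcdf under $prefix/lib" >&2
    return 1
}

# print_compile_command   (hypothetical helper)
# Prints the compile line that the exported flags would produce.
# mpicc and app.c are assumptions; substitute your own compiler and sources.
print_compile_command() {
    echo "mpicc ${CPPFLAGS:-} -o app app.c ${LDFLAGS:-}"
}
```

A typical use would be to run check_pnetcdf_prefix "$PNETCDF" right after exporting the variables, and then print_compile_command to review the resulting command line before adding it to the build system of the application.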