Cases

CP2K Matrix Parameter Optimization

CP2K is a quantum chemistry and solid-state physics package for atomistic simulations of solid-state, liquid, molecular, periodic, material, crystal, and biological systems. At the core of CP2K's computations is the Distributed Block Compressed Sparse Row (DBCSR) library, which is the only module that uses GPU acceleration. Figure 1 shows the calculation framework.

Figure 1 CP2K calculation framework
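To make the storage format concrete, the following Python sketch stores a matrix in block compressed sparse row form. Only nonzero fixed-size blocks are kept, together with a row-pointer array and block-column indices, and a matrix-vector product walks only the stored blocks. It illustrates the format under simplifying assumptions (one uniform block size, a single process). The function names are hypothetical, and this is not DBCSR's API, which also distributes the blocks across processes.

```python
def dense_to_bcsr(dense, bs):
    """Convert a dense matrix (list of lists) to BCSR arrays.

    Assumes both dimensions are multiples of the block size `bs`.
    Returns (row_ptr, col_idx, blocks): the nonzero blocks of block-row i are
    blocks[row_ptr[i]:row_ptr[i + 1]], with block-column indices in col_idx.
    """
    n_rows, n_cols = len(dense), len(dense[0])
    row_ptr, col_idx, blocks = [0], [], []
    for bi in range(n_rows // bs):
        for bj in range(n_cols // bs):
            block = [row[bj * bs:(bj + 1) * bs]
                     for row in dense[bi * bs:(bi + 1) * bs]]
            if any(v != 0 for r in block for v in r):  # store nonzero blocks only
                col_idx.append(bj)
                blocks.append(block)
        row_ptr.append(len(col_idx))
    return row_ptr, col_idx, blocks


def bcsr_matvec(row_ptr, col_idx, blocks, x, bs):
    """Compute y = A x, touching only the stored blocks of A."""
    y = [0] * ((len(row_ptr) - 1) * bs)
    for bi in range(len(row_ptr) - 1):
        for p in range(row_ptr[bi], row_ptr[bi + 1]):
            bj, block = col_idx[p], blocks[p]
            for r in range(bs):  # small dense block times a slice of x
                y[bi * bs + r] += sum(block[r][c] * x[bj * bs + c]
                                      for c in range(bs))
    return y


if __name__ == "__main__":
    A = [[1, 2, 0, 0],
         [3, 4, 0, 0],
         [0, 0, 0, 0],
         [0, 0, 5, 6]]
    row_ptr, col_idx, blocks = dense_to_bcsr(A, 2)
    print(row_ptr, col_idx)                                        # [0, 1, 2] [0, 1]
    print(bcsr_matvec(row_ptr, col_idx, blocks, [1, 1, 1, 1], 2))  # [3, 7, 0, 11]
```

Two of the four 2x2 blocks in the example are all zero and are never stored or multiplied. This saving is what block-sparse formats exploit, and it reduces the work to many small dense block products.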

In this workload, GPU computation accounts for nearly half of the total run time. The algorithm divides the large matrix into 13 block matrices, each with seven adjustable kernel parameters. For 12 of these block matrices, the kernel parameters are predicted by a fixed algorithm, and those predictions still leave considerable room for tuning on the A100 architecture. Automatic parameter tuning improved performance by 10%.
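The tuning loop can be sketched as an exhaustive search over the kernel parameter space: time each configuration and keep the fastest. In this minimal Python sketch, the seven parameter names and their candidate values are hypothetical placeholders. `toy_benchmark` is a synthetic cost model standing in for launching and timing the real GPU kernel for one block matrix. A real tuner would repeat the search for each block matrix and would typically prune the space rather than enumerate it fully.

```python
import itertools

# Hypothetical stand-ins for the seven tunable kernel parameters of one
# block-matrix kernel; real names and ranges depend on the kernel.
SPACE = {
    "threads": [64, 128, 256],
    "grouping": [8, 16, 32],
    "minblocks": [1, 2, 4],
    "tile_m": [1, 2, 4],
    "tile_n": [1, 2, 4],
    "w": [4, 8, 16],
    "v": [4, 8, 16],
}

# Synthetic optimum used only by the toy cost model below.
TOY_OPTIMUM = {"threads": 128, "grouping": 16, "minblocks": 2,
               "tile_m": 2, "tile_n": 4, "w": 8, "v": 16}


def toy_benchmark(cfg):
    """Synthetic 'runtime' that grows with distance from TOY_OPTIMUM.

    Replace with code that launches the real kernel and measures its time.
    """
    return 100 + sum(abs(cfg[k] - TOY_OPTIMUM[k]) for k in cfg)


def autotune(space, benchmark):
    """Evaluate every combination in `space` and return (best_cfg, best_time)."""
    names = list(space)
    best_cfg, best_time = None, float("inf")
    for values in itertools.product(*(space[n] for n in names)):
        cfg = dict(zip(names, values))
        elapsed = benchmark(cfg)
        if elapsed < best_time:  # keep the fastest configuration seen so far
            best_cfg, best_time = cfg, elapsed
    return best_cfg, best_time


if __name__ == "__main__":
    best_cfg, best_time = autotune(SPACE, toy_benchmark)
    print(best_cfg, best_time)
```

With three candidates per parameter, the search already covers 3^7 = 2187 configurations per block matrix. This growth is why production tuners usually combine limited benchmarking with a predictive model instead of brute force.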

GROMACS Core Binding Optimization

GROMACS is a computational engine for molecular dynamics (MD) simulation and energy minimization. It integrates Newton's equations of motion for systems ranging from hundreds to millions of atoms. It is designed primarily for biochemical molecules such as proteins, lipids, and nucleic acids, which have many complicated bonded interactions. Because it is also extremely fast at calculating nonbonded interactions (which usually dominate simulations), many groups also use it for research on non-biological systems such as polymers. GROMACS offers distinct advantages over other MD simulation software.

GROMACS OpenMP threads and MPI processes should be pinned correctly to the system's cores or hardware threads. Pinning can be configured through the MPI launcher or batch system, or by GROMACS itself. Correct pinning improved performance by 10%.
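The following Python sketch shows the arithmetic behind a pinning layout. Each rank gets a contiguous group of hardware threads, optionally starting at an offset and spaced by a stride. The same layout is then expressed as `gmx mdrun` options (`-ntmpi`, `-ntomp`, `-pin on`, `-pinoffset`, `-pinstride`) for a thread-MPI build in which GROMACS itself does the pinning. With an MPI library build, the launcher sets the rank count instead and `-ntmpi` does not apply. The sketch assumes logical core IDs are numbered contiguously, which does not hold on every machine (hyperthread siblings are often numbered apart), so check the real topology first, for example with `lscpu` or hwloc. The helper names are hypothetical.

```python
def pinning_plan(n_hw_threads, n_ranks, threads_per_rank, offset=0, stride=1):
    """Return one list of logical core IDs per rank (contiguous, strided layout).

    Raises ValueError if the layout would exceed the available hardware
    threads, since oversubscribed cores defeat the purpose of pinning.
    """
    last = offset + (n_ranks * threads_per_rank - 1) * stride
    if last >= n_hw_threads:
        raise ValueError(
            f"layout needs core {last}, but only {n_hw_threads} hardware threads exist")
    plan = []
    for rank in range(n_ranks):
        first = offset + rank * threads_per_rank * stride
        plan.append([first + t * stride for t in range(threads_per_rank)])
    return plan


def mdrun_args(n_ranks, threads_per_rank, offset=0, stride=1):
    """Build a gmx mdrun command line (thread-MPI build) that lets GROMACS pin threads."""
    return ["gmx", "mdrun",
            "-ntmpi", str(n_ranks), "-ntomp", str(threads_per_rank),
            "-pin", "on", "-pinoffset", str(offset), "-pinstride", str(stride)]


if __name__ == "__main__":
    # Example: 4 ranks x 4 OpenMP threads on a node with 16 hardware threads.
    for rank, cores in enumerate(pinning_plan(16, 4, 4)):
        print(f"rank {rank}: cores {cores}")
    print(" ".join(mdrun_args(4, 4)))
```

Keeping each rank's threads on neighboring cores preserves cache and memory locality. The explicit check for oversubscription catches the common mistake of requesting more threads than the node has.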

Where bonded force (BF) computations run has a large impact on performance. When the GPU is underutilized, running BF computations on the GPU yields better performance. When the GPU is saturated, running them on the CPU is better. Choosing the right placement improved end-to-end (E2E) performance by 6%.
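The placement rule can be written as a small decision helper that returns a value for `gmx mdrun -bonded` (`gpu` or `cpu`). In this sketch, the GPU busy fraction is assumed to come from a profiler. The 0.9 saturation threshold is a hypothetical starting point, not a documented GROMACS value. Because the crossover depends on the system and the hardware, a more reliable approach is to benchmark both settings and keep the faster one. `pick_faster` does this, given measured throughput such as the ns/day reported in the mdrun log.

```python
def choose_bonded_target(gpu_busy_fraction, saturation_threshold=0.9):
    """Return 'gpu' or 'cpu' for mdrun's -bonded option.

    gpu_busy_fraction: fraction of each step the GPU is busy (0..1), assumed
    to be measured with a profiler. saturation_threshold is a hypothetical
    cutoff, not a GROMACS default; tune it by benchmarking.
    """
    if not 0.0 <= gpu_busy_fraction <= 1.0:
        raise ValueError("gpu_busy_fraction must be between 0 and 1")
    # Saturated GPU: move bonded work to the CPU; otherwise keep it on the GPU.
    return "cpu" if gpu_busy_fraction >= saturation_threshold else "gpu"


def pick_faster(ns_per_day):
    """Given measured throughput per -bonded setting, return the faster setting."""
    return max(ns_per_day, key=ns_per_day.get)


if __name__ == "__main__":
    target = choose_bonded_target(0.6)
    print(" ".join(["gmx", "mdrun", "-nb", "gpu", "-bonded", target]))
```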
