Matricization Check

The tool checks matricizable code snippets and provides modification suggestions.

Introduction

The matricization check tool checks and optimizes code snippets that use the Stencil, general matrix-vector multiplication (GEMV), or fast Fourier transform (FFT) computing mode. The tool can check and optimize C, C++, and Fortran source code. The check is performed on the abstract syntax tree (AST): Clang generates the AST for C and C++ source code, and Fparser generates the AST for Fortran source code. The tuning process is specific to each computing mode.

Stencil computation is an important kind of computation widely used in scientific applications, such as partial differential equations, the Gauss–Seidel method, computational fluid dynamics, and earth system simulation. It iteratively updates the values of the spatial grid points over multiple time steps according to a given pattern. The fixed pattern in which each point in the spatial grid is updated based on a subset of its neighbors is called a stencil.
GEMV is a common linear algebra operation that multiplies a matrix by a vector. It can be highly optimized to take advantage of the parallelism and vector instructions of modern computer architectures.
FFT is an efficient method for calculating the discrete Fourier transform (DFT). It completes the calculation with a time complexity of O(n log n), where n is the length of the sequence. It is also flexible because it supports different decomposition methods and calculation algorithms.
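As an illustration of the first two computing modes, the following is a minimal C sketch of a one-dimensional 3-point stencil sweep and a row-major GEMV. The function names stencil3_step and gemv, the averaging pattern, and the row-major layout are assumptions chosen for this example. They are not taken from the tool and do not represent the code patterns that the tool actually matches.

```c
#include <assert.h>
#include <stddef.h>

/* One sweep of a 3-point stencil (hypothetical example): each interior
 * point becomes the average of itself and its two neighbors.
 * The two boundary points are copied unchanged. */
void stencil3_step(const double *in, double *out, size_t n)
{
    assert(n >= 2);
    out[0] = in[0];
    out[n - 1] = in[n - 1];
    for (size_t i = 1; i + 1 < n; ++i) {
        out[i] = (in[i - 1] + in[i] + in[i + 1]) / 3.0;
    }
}

/* GEMV (hypothetical example): y = alpha * A * x + beta * y,
 * where A has m rows and n columns and is stored row-major. */
void gemv(size_t m, size_t n, double alpha, const double *a,
          const double *x, double beta, double *y)
{
    for (size_t i = 0; i < m; ++i) {
        double acc = 0.0;
        for (size_t j = 0; j < n; ++j) {
            acc += a[i * n + j] * x[j];
        }
        y[i] = alpha * acc + beta * y[i];
    }
}
```

A time-stepping solver would call stencil3_step repeatedly and swap the input and output buffers between steps.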

The tool provides the following 12 tuning technologies. Fortran supports all of them, whereas C and C++ support only equivalent transformation, precision-consistent conversion of division to multiplication, and communication hiding.

Equivalent transformation: Vectorization is enabled by converting power expansion to multiplication.
Elimination of redundant common operators: Common subsequences are extracted and are stored in temporary arrays. Extracting common subsequences across blocks eliminates redundant calculations.
Unit step calculation: The sign function in the conditional and assignment statements in a loop is converted to a step function (max/min/merge) call, thus enabling vectorization.
Precision-consistent conversion of division to multiplication: The reciprocal calculation is hoisted out of the loop to convert the division calculation into the multiplication calculation of the same precision.
Search algorithm optimization: Code that implements searches is identified and replaced with a binary search implementation to improve search performance.
Large data dimension reduction: n-dimensional arrays are defined in the code, but only m dimensions (m < n) are used. In this case, memory access can be optimized by rebuilding the arrays as m-dimensional arrays.
Communication hiding: Code snippets before and after a blocking communication call that are independent of the communication variables are identified and moved to the end of the function, and the blocking communication function is changed to a non-blocking communication function, to improve code parallelism.
Parallelization of reduction calculation: When a reduction calculation exists in a loop, the loop is unrolled to reduce the dependency of variables on themselves and increase the degree of parallelism.
Directive statement optimization: Directive statements are used to implement vectorization and prefetch optimization for the compiler.
Sin/Cos operator fusion: Sin/Cos calculations are combined to reduce function calls and accelerate performance.
Exp calculation simplification: The multiplication calculation of multiple exp functions is replaced with the addition calculation within a single exp function. This replacement reduces exp function calls to lessen calculation workload and accelerate performance.
Loop fusion: Adjacent loops are merged to reduce the loop overhead, improve data locality, and accelerate performance.
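To make a few of these technologies concrete, the following C sketch shows hand-written before and after forms for power expansion, hoisting a reciprocal out of a loop, loop fusion, and reduction with two accumulators. All function names are hypothetical, and the code only illustrates the general techniques. It is not the output of the tool. In general, replacing division with reciprocal multiplication and reordering a floating-point sum can change the last bits of a result, so the assumption here is that such differences are acceptable or are checked separately.

```c
#include <assert.h>
#include <stddef.h>

/* Equivalent transformation (illustrative).
 * Before: y[i] = pow(x[i], 3.0);
 * After:  the power is expanded into multiplications. */
void cube_expanded(const double *x, double *y, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        y[i] = x[i] * x[i] * x[i];
    }
}

/* Division to multiplication (illustrative): one division per element. */
void scale_div(double *x, size_t n, double d)
{
    for (size_t i = 0; i < n; ++i) {
        x[i] = x[i] / d;
    }
}

/* The reciprocal is computed once outside the loop. */
void scale_mul(double *x, size_t n, double d)
{
    assert(d != 0.0);
    const double inv = 1.0 / d;
    for (size_t i = 0; i < n; ++i) {
        x[i] = x[i] * inv;
    }
}

/* Loop fusion (illustrative): two adjacent loops over the same range. */
void two_loops(const double *a, double *b, double *c, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        b[i] = a[i] + 1.0;
    }
    for (size_t i = 0; i < n; ++i) {
        c[i] = b[i] * 2.0;
    }
}

/* The same work in one loop, so b[i] is reused while it is still hot. */
void fused_loop(const double *a, double *b, double *c, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        b[i] = a[i] + 1.0;
        c[i] = b[i] * 2.0;
    }
}

/* Reduction (illustrative): a single accumulator forms a serial chain. */
double sum_serial(const double *x, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; ++i) {
        s += x[i];
    }
    return s;
}

/* Unrolled by two with independent partial sums, then combined. */
double sum_unrolled(const double *x, size_t n)
{
    double s0 = 0.0, s1 = 0.0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        s0 += x[i];
        s1 += x[i + 1];
    }
    if (i < n) {
        s0 += x[i]; /* leftover element when n is odd */
    }
    return s0 + s1;
}
```

Each pair computes the same mathematical result. The transformed forms remove a library call, replace repeated divisions with multiplications, or shorten the dependency chain so that the compiler has more room to vectorize.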

Prerequisites

You have logged in to the Kunpeng DevKit.

/opt is the default installation directory of the tool. The following uses this directory as an example. Replace it with the actual directory.
On the WebUI, this feature requires uploading the files or compressed package. In the IDE, the tool plugin can scan local projects. If the source code is included in a compressed package, decompress the package and select the decompressed folder.

Procedure

On the left pane of the page, choose Affinity Analyzer > Matricization Check and click the icon to create a task. See Figure 1.

Figure 1 Matricization check

**Table 1** Matricization check parameters
Parameter	Description
Task Name	A task name is automatically generated by default, which can be modified as required.
Source File Path	Set this parameter using either of the following methods: (1) To use uploaded source code, click the text box and select a source code path from the drop-down list, or manually enter a source code path. (2) Click Upload on the right to upload a package or folder. (The package is automatically decompressed during the upload.) NOTE: Only TAR, TAR.BZ, TAR.BZ2, TAR.GZ, TAR.XZ, TBZ, TBZ2, TGZ, TXZ, and ZIP packages can be uploaded. Only one package can be uploaded at a time. The source package cannot exceed 1 GB, and the extracted files must be less than or equal to half of the remaining drive space. Only one folder can be uploaded at a time. The size of the folder must be less than or equal to half of the remaining drive space. Before manually uploading a software package, check whether the target directory exists. If it does not exist, create the directory and grant the read, write, and execute permissions to the devkit user.
File or Folder to Scan	Enter the file or folder to be scanned. It is the relative path of the source file directory.
Optimization Method	The options are: (1) SME matricization: Stencil, GEMV, and FFT. (2) Domain optimization, which includes the following: Computing optimization: equivalent transformation, elimination of redundant common operators, unit step calculation optimization, precision-consistent conversion of division to multiplication, search algorithm optimization, reduction calculation parallelization, directive statement optimization, sin/cos operator fusion, exp calculation simplification, and loop fusion. Memory access optimization: large data dimension reduction. Communication optimization: communication hiding.
Compiler Options	Select a compilation method. The options are: fill in the compile command, or upload the compile_commands.json file. For details about how to upload the JSON file, see Generating a JSON File.
Build Tool	Select a build tool. The options are Make and CMake.

Click Check. After the check is complete, the check report page is displayed. See Figure 2. Click the Task Information tab page to view the task details.
Figure 2 Matricization check report

If the check result indicates that source files need to be modified, click View Suggested Source Code in the Operation column. See Figure 3.
Figure 3 Source code modification suggestion
- The tool supports concurrent running of multiple matricization check tasks.
- To cancel a task, click Close during the task running process.
- You can click the arrow keys in the upper right corner of Original Source Code to view the code.
- If the check fails or the check result indicates that no modification is required, an empty report is generated.
