API Reference
KDNN Operators
Operator Range
Table 1 Operators available in KDNN lists the operators available in KDNN.
Table 1 Operators available in KDNN
PReLU: Activation operator (Leaky ReLU) with a trainable alpha parameter
Operator Description
Eltwise
Function Description
Performs operations of the same type on each element in a tensor, including abs, exp, and log.
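As a rough illustration of the element-wise semantics, the following pure-Python sketch applies one operation to every element of a tensor stored as a flat buffer. It is not the KDNN API: the `ELTWISE_OPS` table and the `eltwise` function are hypothetical, and only the abs, exp, and log operations named above are covered.

```python
import math
from typing import Callable, Dict, List

# Hypothetical table of element-wise operations (abs, exp, log, as named above).
ELTWISE_OPS: Dict[str, Callable[[float], float]] = {
    "abs": abs,
    "exp": math.exp,
    "log": math.log,
}


def eltwise(op: str, src: List[float]) -> List[float]:
    """Apply the same operation to every element of a flat tensor buffer."""
    fn = ELTWISE_OPS[op]
    return [fn(x) for x in src]


print(eltwise("abs", [-1.5, 0.0, 2.0]))  # [1.5, 0.0, 2.0]
```

Because each output element depends only on the matching input element, the result does not depend on tensor shape or layout, which is why a flat buffer is enough here.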

Table 1 Operation types describes the operations supported by the Eltwise operator.
Table 2 Formula parameters describes the meanings of the symbols in the preceding formulas.
α and β: input parameters of constant floating-point type
Feature Scope
Propagation Directions and Data Types
The KDNN Eltwise operator supports 1D to 5D tensors in a sequential data layout.
Table 1 Mapping between each tensor dimension and parameter data layout
Layer Normalization
Function Description
Performs layer normalization.
The formula of the layer normalization operator in the case of three dimensions is as follows:

The mean and variance can be calculated at run time or provided by the user. To calculate the two values at run time, use the following formulas:


Table 1 Formula parameters describes the parameters in the formulas.
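A minimal sketch of layer normalization for a 3D tensor of shape (T, N, C), with the mean and variance computed at run time over the last dimension C. It assumes the common definition dst = γ·(src − μ)/√(σ² + ε) + β with per-channel γ and β and the biased (population) variance; the function names and the ε default are illustrative assumptions, not the KDNN interface.

```python
import math
from typing import List, Sequence


def layer_norm_row(x: Sequence[float], gamma: Sequence[float],
                   beta: Sequence[float], eps: float = 1e-5) -> List[float]:
    """Normalize one (t, n) row over its C elements."""
    c = len(x)
    mean = sum(x) / c
    var = sum((v - mean) ** 2 for v in x) / c  # biased (population) variance
    inv_std = 1.0 / math.sqrt(var + eps)
    return [g * (v - mean) * inv_std + b for v, g, b in zip(x, gamma, beta)]


def layer_norm(src: List[List[List[float]]], gamma: Sequence[float],
               beta: Sequence[float], eps: float = 1e-5) -> List[List[List[float]]]:
    """src has shape (T, N, C); statistics are computed per (t, n) over C."""
    return [[layer_norm_row(row, gamma, beta, eps) for row in plane] for plane in src]
```

With γ = 1 and β = 0, each normalized row has zero mean and unit variance (up to ε).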
Feature Scope
Propagation Directions and Flags
Table 2 Mapping between each tensor dimension and parameter data layout
Inner Product
Function Description
Calculates the matrix inner product.
In a 2D case, the formula for calculating the matrix inner product is as follows:

High-dimensional tensors are flattened into 2D tensors for calculation.
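The flattening step can be sketched as follows, assuming the oneDNN-style convention dst(n, oc) = bias(oc) + Σ_ic src(n, ic) · weights(oc, ic): each source sample (for example C×H×W) is flattened to a row of length K, and each of the OC filters is flattened the same way. Tensors are nested Python lists, and the helper names are hypothetical.

```python
from typing import List, Optional


def flatten(x) -> List[float]:
    """Recursively flatten a nested list into a 1D list (row-major order)."""
    if isinstance(x, list):
        return [v for item in x for v in flatten(item)]
    return [x]


def inner_product(src: list, weights: list,
                  bias: Optional[List[float]] = None) -> List[List[float]]:
    """src: N samples of any rank; weights: OC filters shaped like one sample."""
    src2d = [flatten(sample) for sample in src]   # (N, K)
    w2d = [flatten(filt) for filt in weights]     # (OC, K)
    if any(len(row) != len(w2d[0]) for row in src2d + w2d):
        raise ValueError("flattened src and weights must have the same length K")
    return [
        [sum(a * b for a, b in zip(row, w)) + (bias[oc] if bias else 0.0)
         for oc, w in enumerate(w2d)]
        for row in src2d
    ]
```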
Feature Scope
Table 1 Parameter data types of the forward direction
Table 2 Parameter data types of the backward direction (dnnl_backward_data category)
Table 3 Parameter data types of the backward direction (dnnl_backward_weights category)
Softmax
Function Description
Performs the Softmax function operation along a data dimension.
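A reference sketch of softmax along one axis of a 2D tensor. It subtracts the per-slice maximum before exponentiating, which is safe because softmax is unchanged when the same constant is added to every element, and it prevents overflow in `exp`. The `axis` argument and function names are illustrative, not the KDNN API.

```python
import math
from typing import List


def softmax_1d(x: List[float]) -> List[float]:
    m = max(x)                          # shift for numerical stability
    e = [math.exp(v - m) for v in x]
    s = sum(e)
    return [v / s for v in e]


def softmax_2d(x: List[List[float]], axis: int = 1) -> List[List[float]]:
    """Apply softmax along axis 0 (columns) or axis 1 (rows) of a 2D list."""
    if axis == 1:
        return [softmax_1d(row) for row in x]
    if axis == 0:
        cols = [softmax_1d(list(col)) for col in zip(*x)]
        return [list(row) for row in zip(*cols)]
    raise ValueError("axis must be 0 or 1")
```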

Feature Scope
Table 2 Mapping between each tensor dimension and parameter data layout
Sum
Function Description
Calculates the sum of N tensors.

Feature Scope
The KDNN Sum operator supports the following data layout:
- The data dimension can be 1D to 5D.
- Each of the N input tensors must have the same dimensions and data layout as the output tensor.
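Because all inputs share the output's dimensions and layout, the sum can be computed position by position on the flat buffers. A minimal sketch (the function name is hypothetical):

```python
from typing import List, Sequence


def tensor_sum(inputs: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise sum of N tensors stored as flat buffers of equal length.

    Matching flat positions refer to the same logical element only because
    every input uses the same dimensions and layout as the output.
    """
    if not inputs:
        raise ValueError("at least one input tensor is required")
    size = len(inputs[0])
    if any(len(t) != size for t in inputs):
        raise ValueError("all input tensors must have the same shape")
    return [sum(values) for values in zip(*inputs)]
```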
Table 2 Mapping between each tensor dimension and parameter data layout
Matmul
Function Description
Performs matrix multiplication.
2D Tensor

High-dimensional tensor

Height and width dimensions: m, n, and k for the input matrices A(m, k) and B(k, n) and the output matrix C(m, n).
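For high-dimensional tensors, a common convention (used by oneDNN and assumed here for KDNN) treats the leading dimensions as batch dimensions and multiplies the trailing (m, k) and (k, n) matrices for each batch index. A minimal sketch with a single batch dimension; the function names are hypothetical:

```python
from typing import List

Matrix = List[List[float]]


def matmul_2d(a: Matrix, b: Matrix) -> Matrix:
    """C(m, n) = A(m, k) x B(k, n)."""
    m, k = len(a), len(a[0])
    if len(b) != k:
        raise ValueError("inner dimensions must match")
    n = len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
            for i in range(m)]


def batched_matmul(a: List[Matrix], b: List[Matrix]) -> List[Matrix]:
    """Apply matmul_2d to each pair of matrices along a shared batch dimension."""
    if len(a) != len(b):
        raise ValueError("batch sizes must match")
    return [matmul_2d(x, y) for x, y in zip(a, b)]
```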
Feature Scope
Table 2 Mapping between each tensor dimension and parameter data layout
Convolution
Function Description
Performs convolution.
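To make the index arithmetic concrete, here is a direct (naive) 2D convolution in NCHW layout. As in oneDNN, it is written as cross-correlation (the kernel is not flipped); stride and symmetric zero padding are included, while dilation and groups are omitted. The function name and argument layout are assumptions, not the KDNN API.

```python
from typing import List

Tensor4D = List[List[List[List[float]]]]


def conv2d_nchw(src: Tensor4D, weights: Tensor4D, stride: int = 1,
                pad: int = 0) -> Tensor4D:
    """src: (N, IC, IH, IW); weights: (OC, IC, KH, KW); returns (N, OC, OH, OW)."""
    n_, ic_ = len(src), len(src[0])
    ih_, iw_ = len(src[0][0]), len(src[0][0][0])
    oc_, kh_, kw_ = len(weights), len(weights[0][0]), len(weights[0][0][0])
    oh_ = (ih_ + 2 * pad - kh_) // stride + 1
    ow_ = (iw_ + 2 * pad - kw_) // stride + 1
    dst = [[[[0.0] * ow_ for _ in range(oh_)] for _ in range(oc_)] for _ in range(n_)]
    for n in range(n_):
        for oc in range(oc_):
            for oh in range(oh_):
                for ow in range(ow_):
                    acc = 0.0
                    for ic in range(ic_):
                        for kh in range(kh_):
                            ih = oh * stride - pad + kh
                            if not 0 <= ih < ih_:
                                continue  # zero padding contributes nothing
                            for kw in range(kw_):
                                iw = ow * stride - pad + kw
                                if 0 <= iw < iw_:
                                    acc += src[n][ic][ih][iw] * weights[oc][ic][kh][kw]
                    dst[n][oc][oh][ow] = acc
    return dst
```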
General 2D convolution calculation formula:

Feature Scope
Propagation Directions and Data Types
Table 1 Parameter data types of the forward direction
Table 2 Parameter data types of the backward direction (dnnl_backward_data category)
Table 3 Parameter data types of the backward direction (dnnl_backward_weights category)
2D convolution is supported, with 4D input and output tensors. The src, weights, and dst data layouts must meet the following requirements:
Deconvolution
Function Description
Performs deconvolution. Both forward and backward directions are supported.
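Deconvolution is computed as a transposed convolution: each input element scatters its value, weighted by the kernel, into the output. The sketch below covers a single-channel 2D case with stride and padding (multiple channels and dilation are omitted as a simplifying assumption) and shows how the output size (IH − 1)·stride − 2·pad + KH arises. The function name is hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def deconv2d_single(src: Matrix, kernel: Matrix, stride: int = 1,
                    pad: int = 0) -> Matrix:
    """Transposed 2D convolution for one input and one output channel."""
    ih_, iw_ = len(src), len(src[0])
    kh_, kw_ = len(kernel), len(kernel[0])
    oh_ = (ih_ - 1) * stride - 2 * pad + kh_
    ow_ = (iw_ - 1) * stride - 2 * pad + kw_
    dst = [[0.0] * ow_ for _ in range(oh_)]
    for ih in range(ih_):
        for iw in range(iw_):
            for kh in range(kh_):
                oh = ih * stride - pad + kh
                if not 0 <= oh < oh_:
                    continue
                for kw in range(kw_):
                    ow = iw * stride - pad + kw
                    if 0 <= ow < ow_:
                        dst[oh][ow] += src[ih][iw] * kernel[kh][kw]
    return dst
```

With stride 2 and a 2×2 kernel of ones, each input value is copied into its own 2×2 output block, which is the upsampling behavior commonly associated with deconvolution.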
Feature Scope
Table 1 Parameter data types of the forward direction
Table 2 Parameter data types of the backward direction (dnnl_backward_data category)
Table 3 Parameter data types of the backward direction (dnnl_backward_weights category)
2D transposed convolution is supported. The input and output data are 4D tensors. The data layout of src, weights, and dst must meet the following requirements:
Concat
Function Description
Concatenates N tensors along the dimension specified by concat_dimension (denoted C).

The Concat primitive does not distinguish between forward and backward propagation.
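A minimal sketch of concatenation along an arbitrary axis for nested-list tensors, which also checks that every dimension other than the concatenation axis matches. The function names are hypothetical.

```python
from typing import List


def shape_of(t) -> List[int]:
    """Shape of a (non-ragged) nested-list tensor."""
    shape = []
    while isinstance(t, list):
        shape.append(len(t))
        t = t[0] if t else None
    return shape


def concat(tensors: list, axis: int) -> list:
    """Concatenate nested-list tensors along `axis`."""
    ref = shape_of(tensors[0])
    for s in (shape_of(t) for t in tensors[1:]):
        if len(s) != len(ref) or any(a != b for i, (a, b) in enumerate(zip(s, ref)) if i != axis):
            raise ValueError("all dimensions except the concat axis must match")
    if axis == 0:
        return [item for t in tensors for item in t]
    # Recurse: concatenate the i-th sub-tensors along axis - 1.
    return [concat([t[i] for t in tensors], axis - 1) for i in range(len(tensors[0]))]
```

For example, shapes N1×N11×N3×N4×N5 and N1×N12×N3×N4×N5 would be concatenated with `axis=1`, giving N1×(N11+N12)×N3×N4×N5.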
Feature Scope
Table 1 Supported data type combinations (input and output data types being the same)
A maximum of 5 dimensions is supported. Input tensors must have the same number of dimensions, and all dimensions except the concatenation dimension must have the same size. The following data layout formats are supported, and the input and output tensors must use the same layout.
- 4D: abcd, abdc, acbd, acdb, adbc, adcb, bacd, bcda, cdab, cdba, dcab
- 5D: abcde, abced, abdec, acbde, acdeb, adecb, bacde, bcdea, cdeab, cdeba, decab
- Shape example: N1xN11xN3xN4xN5 and N1xN12xN3xN4xN5. All dimensions other than the concatenation dimension must have the same length.
Concat requires that the memory layout of input and output tensors be the same and the corresponding data types be the same.
Resampling
Function Description
Performs resampling on the input tensor. Two interpolation algorithms are supported: nearest neighbor and linear.
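The nearest neighbor interpolation algorithm is $\mathrm{dst}(n, c, oh, ow) = \mathrm{src}(n, c, ih, iw)$, where:
- $ih = \left[\dfrac{oh + 0.5}{F_h} - 0.5\right]$
- $iw = \left[\dfrac{ow + 0.5}{F_w} - 0.5\right]$
The mathematical formula for bilinear sampling is:
$$
\begin{aligned}
\mathrm{dst}(n, c, oh, ow) ={} & \mathrm{src}(n, c, ih_0, iw_0)\,(1 - W_{ih})(1 - W_{iw}) + \mathrm{src}(n, c, ih_1, iw_0)\,W_{ih}\,(1 - W_{iw}) \\
& + \mathrm{src}(n, c, ih_0, iw_1)\,(1 - W_{ih})\,W_{iw} + \mathrm{src}(n, c, ih_1, iw_1)\,W_{ih}\,W_{iw}
\end{aligned}
$$
where:
- $ih_0 = \left\lfloor \dfrac{oh + 0.5}{F_h} - 0.5 \right\rfloor$
- $ih_1 = \left\lceil \dfrac{oh + 0.5}{F_h} - 0.5 \right\rceil$
- $iw_0 = \left\lfloor \dfrac{ow + 0.5}{F_w} - 0.5 \right\rfloor$
- $iw_1 = \left\lceil \dfrac{ow + 0.5}{F_w} - 0.5 \right\rceil$
- $W_{ih} = \dfrac{oh + 0.5}{F_h} - 0.5 - ih_0$
- $W_{iw} = \dfrac{ow + 0.5}{F_w} - 0.5 - iw_0$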
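The nearest neighbor and bilinear formulas above translate directly into code. The sketch below handles one 2D plane (a single n and c) and takes F_h and F_w as output size divided by input size. Three details are assumptions rather than documented KDNN behavior: that scale-factor definition, rounding [x] half up, and clamping indices that fall outside the input.

```python
import math
from typing import List, Tuple

Plane = List[List[float]]


def _src_coord(o: int, factor: float) -> float:
    """(o + 0.5) / F - 0.5, assuming F = output size / input size."""
    return (o + 0.5) / factor - 0.5


def _nearest_index(o: int, factor: float, size: int) -> int:
    # Assumption: [x] rounds half up; the index is then clamped to [0, size - 1].
    return min(max(math.floor(_src_coord(o, factor) + 0.5), 0), size - 1)


def resample_nearest(src: Plane, oh_: int, ow_: int) -> Plane:
    fh, fw = oh_ / len(src), ow_ / len(src[0])
    return [[src[_nearest_index(oh, fh, len(src))][_nearest_index(ow, fw, len(src[0]))]
             for ow in range(ow_)] for oh in range(oh_)]


def _linear_taps(o: int, factor: float, size: int) -> Tuple[int, int, float]:
    x = _src_coord(o, factor)
    w = x - math.floor(x)                          # W_ih or W_iw
    i0 = min(max(math.floor(x), 0), size - 1)      # clamping is an assumption
    i1 = min(max(math.ceil(x), 0), size - 1)
    return i0, i1, w


def resample_bilinear(src: Plane, oh_: int, ow_: int) -> Plane:
    ih_, iw_ = len(src), len(src[0])
    fh, fw = oh_ / ih_, ow_ / iw_
    dst = []
    for oh in range(oh_):
        ih0, ih1, wh = _linear_taps(oh, fh, ih_)
        row = []
        for ow in range(ow_):
            iw0, iw1, ww = _linear_taps(ow, fw, iw_)
            row.append(src[ih0][iw0] * (1 - wh) * (1 - ww)
                       + src[ih1][iw0] * wh * (1 - ww)
                       + src[ih0][iw1] * (1 - wh) * ww
                       + src[ih1][iw1] * wh * ww)
        dst.append(row)
    return dst
```

Upsampling by a factor of 2, as in the 4×8×4×4 to 4×8×8×8 case below, corresponds to `resample_nearest(plane, 8, 8)` or `resample_bilinear(plane, 8, 8)` on each 4×4 plane.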
Feature Scope
FWD_D and BWD_D support any combination of the f32, f16, and bf16 data types.
Three to five dimensions are supported. The following data layout formats are supported, and the input and output tensors must use the same layout.
Resampling requires the input and output tensors to use the same memory layout, but their dimension sizes can differ.
For details, see the following cases:
- 5D: mb4_ic8_id4od8_ih4oh8_iw4ow8, 4×8×4×4×4 (input), 4×8×8×8×8 (output)
- 4D: mb4_ic8_ih4oh8_iw4ow8, 4×8×4×4 (input), 4×8×8×8 (output)
- 3D: mb4_ic8_iw4ow8, 4×8×4 (input), 4×8×8 (output)
Note that if id, ih, iw, od, oh, or ow is very large, precision may be affected because of a limitation of the test system. This is not a functional problem, because other platforms show the same behavior. To avoid it, keep these parameters below 20000.
Shuffle
Function Description
Shuffles tensor data along a shuffle axis (dimension).
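The formula is $\mathrm{dst}(\ldots, c', \ldots) = \mathrm{src}(\ldots, c, \ldots)$, where c and c' are positions along the shuffle axis related by the group-based index mapping of the shuffle.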
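One common way to state the shuffle, used by oneDNN, is to view the shuffle axis of size C as a 2D array of shape (C / G, G), transpose it to (G, C / G), and flatten it back. The sketch below assumes that convention, with G being the group parameter; if KDNN defines the group parameter the other way round, the roles of the two factors swap. It shuffles axis 1 of an (N, C) tensor, and the function name is hypothetical.

```python
from typing import List


def shuffle_channels(src: List[List[float]], group: int) -> List[List[float]]:
    """Shuffle axis 1 of an (N, C) tensor with group parameter G = `group`."""
    c = len(src[0])
    if group < 1 or c % group != 0:
        raise ValueError("group must be >= 1 and divide the axis size")
    rows = c // group  # view the axis as (C / G, G)
    # Output position b * (C / G) + a takes input position a * G + b (a transpose).
    return [[x[a * group + b] for b in range(group) for a in range(rows)] for x in src]
```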
Feature Scope
Supported data types are as follows. The src and dst data types must be the same.
One to five dimensions are supported. The following data layout formats are supported, and the input and output tensors must use the same layout.
Integer greater than or equal to 1 that exactly divides the size of the shuffle axis
Reorder
Function Description
Reorders tensors into arbitrary memory layout formats and data types.
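Reordering copies each logical element to its position in the target layout. The sketch below assumes oneDNN-style plain format tags such as abcd (NCHW) or acdb (NHWC), where the tag lists the logical dimensions from outermost to innermost in memory; blocked formats are not handled. It computes strides from a tag and reorders a flat buffer, optionally converting each value with a caller-supplied function as a stand-in for data type conversion. All names are hypothetical.

```python
from itertools import product
from typing import Callable, List, Sequence


def strides_for(dims: Sequence[int], tag: str) -> List[int]:
    """Memory stride of each logical dim for a plain tag, e.g. 'acdb' (NHWC)."""
    strides = [0] * len(dims)
    step = 1
    for letter in reversed(tag):             # innermost dimension first
        d = ord(letter) - ord("a")
        strides[d] = step
        step *= dims[d]
    return strides


def reorder(src: Sequence[float], dims: Sequence[int], src_tag: str, dst_tag: str,
            convert: Callable[[float], float] = lambda v: v) -> List[float]:
    s_str, d_str = strides_for(dims, src_tag), strides_for(dims, dst_tag)
    dst = [0.0] * len(src)
    for idx in product(*(range(n) for n in dims)):   # every logical index
        s_off = sum(i * s for i, s in zip(idx, s_str))
        d_off = sum(i * s for i, s in zip(idx, d_str))
        dst[d_off] = convert(src[s_off])
    return dst
```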
The formula is as follows:

Feature Scope
Table 1 Supported parameter data types
One to five dimensions are supported. The following data layout formats are supported.
Pool
Function Description
Implements pooling operations (maximum and average) to reduce tensor dimensions while preserving key features.
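A minimal sketch of 2D max and average pooling on one channel, with a square window, a configurable stride, and no padding (padding and the include/exclude-padding variants of average pooling are omitted for brevity). Function and parameter names are illustrative.

```python
from typing import List

Plane = List[List[float]]


def pool2d(src: Plane, kernel: int, stride: int, mode: str = "max") -> Plane:
    """Max or average pooling over kernel x kernel windows without padding."""
    oh_ = (len(src) - kernel) // stride + 1
    ow_ = (len(src[0]) - kernel) // stride + 1
    dst = []
    for oh in range(oh_):
        row = []
        for ow in range(ow_):
            window = [src[oh * stride + kh][ow * stride + kw]
                      for kh in range(kernel) for kw in range(kernel)]
            if mode == "max":
                row.append(max(window))
            elif mode == "avg":
                row.append(sum(window) / len(window))
            else:
                raise ValueError("mode must be 'max' or 'avg'")
        dst.append(row)
    return dst
```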
Feature Scope
One to five dimensions are supported. The following data layout formats are supported.
Batch Normalization (bnormal)
Function Description
Performs batch normalization on tensors.
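The formula is $\mathrm{dst}(n, c, h, w) = \gamma(c) \cdot \dfrac{\mathrm{src}(n, c, h, w) - \mu(c)}{\sqrt{\sigma^2(c) + \varepsilon}} + \beta(c)$.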
For details, see parameter description in Layer Normalization.
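A minimal sketch of batch normalization with statistics computed at run time: the mean μ(c) and variance σ²(c) of each channel are taken over the batch and all spatial positions. The tensor is represented as (N, C, S) nested lists with the spatial dimensions flattened into S; that representation, the function name, and the ε default are assumptions.

```python
import math
from typing import List, Sequence

Tensor = List[List[List[float]]]  # (N, C, S), spatial dims flattened into S


def batch_norm(src: Tensor, gamma: Sequence[float], beta: Sequence[float],
               eps: float = 1e-5) -> Tensor:
    dst = [[list(ch) for ch in sample] for sample in src]
    for c in range(len(src[0])):
        values = [v for sample in src for v in sample[c]]   # all n and spatial positions
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        inv_std = 1.0 / math.sqrt(var + eps)
        for n, sample in enumerate(src):
            dst[n][c] = [gamma[c] * (v - mean) * inv_std + beta[c] for v in sample[c]]
    return dst
```

Compared with layer normalization, only the set of elements that share one mean and variance changes: here it is everything in a channel across the whole batch.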
Feature Scope
Three to five dimensions are supported. The following data layout formats are supported. The input and output tensors must use the same layout.
Local Response Normalization (lrn)
Function Description
Performs local response normalization.
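A sketch of the cross-channel variant, assuming the oneDNN-style definition dst = src · (k + α/n_l · Σ src²)^(−β), where the sum runs over the n_l channels centered on the current one (truncated at the tensor edges). The (N, C, S) representation, the parameter names, and the default values are assumptions; the single-channel variant would sum over a spatial window instead.

```python
from typing import List

Tensor = List[List[List[float]]]  # (N, C, S), spatial dims flattened into S


def lrn_across_channels(src: Tensor, local_size: int = 5, alpha: float = 1e-4,
                        beta: float = 0.75, k: float = 1.0) -> Tensor:
    half = (local_size - 1) // 2
    dst = []
    for sample in src:
        c_ = len(sample)
        out_sample = []
        for c in range(c_):
            lo, hi = max(0, c - half), min(c_ - 1, c + half)
            row = []
            for s, v in enumerate(sample[c]):
                sq_sum = sum(sample[cc][s] ** 2 for cc in range(lo, hi + 1))
                row.append(v * (k + alpha / local_size * sq_sum) ** (-beta))
            out_sample.append(row)
        dst.append(out_sample)
    return dst
```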
The cross-channel formula is as follows: 
The single-channel formula is as follows:

Feature Scope
Table 1 Supported parameter data types
One to five dimensions are supported. The following data layout formats are supported.
Reduction
Function Description
Performs a specified algorithm operation on each target element in one or more dimensions of a tensor.
The operator applies reduce_op to all src elements that map to the same dst element, where reduce_op includes the operations listed in Table 1 reduce_op algorithm operations.
Table 1 reduce_op algorithm operations
Feature Scope
Table 1 Supported parameter data types
One to five dimensions are supported. The following data layout formats are supported.
Each dimension being reduced must have size 1 in the dst tensor. Examples:
- A 5D src tensor (5×6×7×8×9) paired with a 5D dst tensor (1×1×1×1×1) indicates reduction across all dimensions (dimensions 1 through 5).
- A 5D src tensor (5×6×7×8×9) paired with a 5D dst tensor (5×6×7×8×1) indicates reduction only along the innermost dimension (dimension 5).
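The shape rule above can be sketched as follows: every dst dimension either equals the src dimension or is 1, and each src element is folded into the dst element obtained by setting its reduced coordinates to 0. Tensors are flat row-major buffers, and the operations in `REDUCE_OPS` are an illustrative subset, not necessarily the exact contents of Table 1.

```python
import math
from itertools import product
from typing import Callable, Dict, List, Sequence

REDUCE_OPS: Dict[str, Callable[[List[float]], float]] = {
    "sum": sum,
    "max": max,
    "min": min,
    "mean": lambda xs: sum(xs) / len(xs),
}


def row_major_offset(idx: Sequence[int], dims: Sequence[int]) -> int:
    off = 0
    for i, d in zip(idx, dims):
        off = off * d + i
    return off


def reduce_tensor(src: Sequence[float], src_dims: Sequence[int],
                  dst_dims: Sequence[int], op: str) -> List[float]:
    if len(src_dims) != len(dst_dims) or any(
            d not in (s, 1) for s, d in zip(src_dims, dst_dims)):
        raise ValueError("each dst dimension must equal the src dimension or be 1")
    groups: List[List[float]] = [[] for _ in range(math.prod(dst_dims))]
    for idx in product(*(range(s) for s in src_dims)):
        # Reduced dimensions (size 1 in dst) collapse to coordinate 0.
        dst_idx = [i if d != 1 else 0 for i, d in zip(idx, dst_dims)]
        groups[row_major_offset(dst_idx, dst_dims)].append(src[row_major_offset(idx, src_dims)])
    return [REDUCE_OPS[op](g) for g in groups]
```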
PReLU
Function Description
Performs the parametric ReLU (PReLU) operation, a variant of the Rectified Linear Unit (ReLU) activation function in which the slope for negative inputs is a trainable parameter.
The forward formula is as follows: 
The backward formula is as follows: 
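A minimal sketch of PReLU, assuming the usual definition dst = src for src > 0 and dst = w · src otherwise, with the weight already broadcast to one value per element. The backward pass returns diff_src and diff_weights; the function names are illustrative.

```python
from typing import List, Sequence, Tuple


def prelu_forward(src: Sequence[float], w: Sequence[float]) -> List[float]:
    return [x if x > 0 else a * x for x, a in zip(src, w)]


def prelu_backward(src: Sequence[float], w: Sequence[float],
                   diff_dst: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Return (diff_src, diff_weights) for element-wise weights."""
    diff_src = [g if x > 0 else a * g for x, a, g in zip(src, w, diff_dst)]
    diff_w = [0.0 if x > 0 else x * g for x, g in zip(src, diff_dst)]
    return diff_src, diff_w
```

If the weights are shared, for example one per channel, the per-element diff_weights must additionally be summed over the broadcast dimensions.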
Feature Scope
Table 1 Supported parameter data types
One to five dimensions are supported. The following data layout formats are supported.
Binary
Function Description
Returns the element-wise operation results between tensors source0 and source1, with support for reordering to arbitrary layouts and conversion to arbitrary data types.
The formula is $\mathrm{dst} = \mathrm{op}(\mathrm{source0}, \mathrm{source1})$, applied element by element.
The binary operator does not distinguish between forward and backward propagation.
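A minimal sketch of an element-wise binary operation on two equally shaped flat buffers. The operation names are an illustrative set, not necessarily the algorithms KDNN exposes, and broadcasting is not modeled.

```python
import operator
from typing import Callable, Dict, List, Sequence

BINARY_OPS: Dict[str, Callable[[float, float], float]] = {
    "add": operator.add,
    "sub": operator.sub,
    "mul": operator.mul,
    "div": operator.truediv,
    "max": max,
    "min": min,
}


def binary(op: str, src0: Sequence[float], src1: Sequence[float]) -> List[float]:
    if len(src0) != len(src1):
        raise ValueError("src0 and src1 must have the same number of elements")
    fn = BINARY_OPS[op]
    return [fn(a, b) for a, b in zip(src0, src1)]
```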
Feature Scope
Table 1 Supported parameter data types
One to five dimensions are supported. The following data layout formats are supported, and the input and output tensors must use the same layout.
RNN
Function Description
Processes sequential or time-series data to train machine learning models that can generate sequential predictions or derive conclusions from sequence-based inputs.
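For intuition, the sketch below implements a single-layer, unidirectional vanilla RNN with tanh activation, h_t = tanh(W·x_t + U·h_(t−1) + b). The cell type, the weight shapes, and the function names are assumptions; this section does not state which cell types (for example LSTM or GRU) KDNN's RNN supports.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def vanilla_rnn(xs: Sequence[Sequence[float]], w: Matrix, u: Matrix,
                b: Sequence[float], h0: Sequence[float]) -> List[List[float]]:
    """xs: T input vectors; w: (H, I); u: (H, H); returns the T hidden states."""
    h = list(h0)
    states = []
    for x in xs:
        pre = [p + q + bias for p, q, bias in zip(matvec(w, x), matvec(u, h), b)]
        h = [math.tanh(v) for v in pre]   # the new state feeds the next step
        states.append(h)
    return states
```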
The RNN formula is as follows:

Feature Scope
Table 1 Supported parameter data types
The data layout is fixed.
Group Normalization
Function Description
Performs group normalization by channel.
The formula is as follows:

where the shape of the input src is defined by (N, C, H, W), and G indicates the number of groups.


γ and β: scaling and shift coefficients
μ and σ²: mean and variance
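A minimal sketch, assuming the common definition: for each sample, the C channels are split into G consecutive groups of C/G channels, μ and σ² are computed over each group's channels and spatial positions, and γ and β are per channel. The (N, C, S) representation with spatial dimensions flattened into S, the function name, and the ε default are assumptions.

```python
import math
from typing import List, Sequence

Tensor = List[List[List[float]]]  # (N, C, S), spatial dims flattened into S


def group_norm(src: Tensor, groups: int, gamma: Sequence[float],
               beta: Sequence[float], eps: float = 1e-5) -> Tensor:
    c_ = len(src[0])
    if c_ % groups != 0:
        raise ValueError("C must be divisible by the number of groups G")
    per_group = c_ // groups
    dst = []
    for sample in src:
        out = [None] * c_
        for g in range(groups):
            chans = range(g * per_group, (g + 1) * per_group)
            values = [v for c in chans for v in sample[c]]
            mean = sum(values) / len(values)
            var = sum((v - mean) ** 2 for v in values) / len(values)
            inv_std = 1.0 / math.sqrt(var + eps)
            for c in chans:
                out[c] = [gamma[c] * (v - mean) * inv_std + beta[c] for v in sample[c]]
        dst.append(out)
    return dst
```

With G = 1, all channels of a sample share one mean and variance; with G = C, each channel is normalized on its own.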
Feature Scope
Table 1 Supported parameter data types
The data types of mean, variance, scale, and shift are independent of those of src and dst and are always f32.
Three to five dimensions are supported. The following data layout formats are supported. The input and output tensors must use the same layout.
SparseGemm
Function Description
Computes the product of a sparse matrix and a dense matrix. The operator is based on the compressed sparse row (CSR) storage format and skips zero blocks during loading and computation to make efficient use of compute resources and memory bandwidth. The core computing kernel is optimized with SIMD instructions for the Kunpeng platform.
The SparseGemm operator computes the following matrix multiplication:
$C = \alpha \cdot A \cdot B + \beta \cdot C$
where,
- A: sparse matrix
- B: dense matrix
- C: output matrix
- α and β: optional scaling coefficients
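A minimal CSR sketch of C = α·A·B + β·C. The CSR arrays (`values`, `col_indices`, `row_ptr`) follow the standard CSR definition; the function name and argument order are assumptions, and KDNN's zero-block skipping and SIMD kernels are not modeled beyond the fact that zero entries of A are never visited.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def csr_gemm(values: Sequence[float], col_indices: Sequence[int],
             row_ptr: Sequence[int], b: Matrix, c: Matrix,
             alpha: float = 1.0, beta: float = 0.0) -> Matrix:
    """Return alpha * A @ B + beta * C, with A (m x k) given in CSR form."""
    m, n = len(row_ptr) - 1, len(b[0])
    out = [[beta * c[i][j] for j in range(n)] for i in range(m)]
    for i in range(m):
        # Only the stored (nonzero) entries of row i are visited.
        for p in range(row_ptr[i], row_ptr[i + 1]):
            a_ik, k = values[p], col_indices[p]
            for j in range(n):
                out[i][j] += alpha * a_ik * b[k][j]
    return out
```

For example, A = [[1, 0, 2], [0, 0, 3]] is stored as `values=[1, 2, 3]`, `col_indices=[0, 2, 2]`, `row_ptr=[0, 2, 3]`.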
Feature Scope
Table 1 Supported parameter data types
KDNN_EXT Operators
Operator Description
KDNN_EXT is an extension module of KDNN. It has the following features:
- Easy-to-use interfaces: Python interfaces are provided through the Cython framework, making the module convenient to use in Python applications.
- High performance: The underlying implementation is written in C, which provides high-performance interfaces.
The following operators are available:
- random_choice
- softmax
Operator Definition
softmax
Softmax is a common activation function used in multi-class classification problems. It converts a set of arbitrary real numbers into a probability distribution: each output value lies between 0 and 1, and the output values sum to 1.
The main features are as follows:
- Normalized output: The softmax function normalizes the input so that the output is a valid probability distribution. Whatever real values the input contains, the outputs sum to 1. It is commonly used at the output layer in multi-class classification problems.
- Non-linear: The softmax function is a non-linear function. It can perform non-linear transformation on the input to increase the representation capability of the model, thereby better fitting complex data patterns.
- Translation invariance: The softmax function is invariant to translation of its input. That is, adding (or subtracting) the same constant to every element of the input vector does not change the softmax output.
In a neural network, the softmax function is usually used at the output layer to convert the original output of the neural network into a vector representing class probabilities. During training, the difference between the softmax output and the actual label can be used as a loss function. Through backward propagation, network parameters are updated to minimize the loss and improve model performance.
def softmax(arr: np.ndarray)->np.ndarray
Receives a 1D or 2D NumPy array and returns the result of softmax calculation.
arr: the elements are real numbers in FP32, and the array can be 1D or 2D.
>>> import numpy as np
>>> from libkdnn_ext import softmax
>>> x = np.random.rand(1, 5).astype(np.float32)
>>> softmax(x)
array([[0.19810137, 0.21171768, 0.16419397, 0.24222486, 0.1837621 ]], dtype=float32)
random_choice
random_choice is an algorithm used to randomly select elements from a set by probability. In computer science, random selection is a common operation used in scenarios such as random sampling, random arrangement, and Monte Carlo simulation.
At its core, the algorithm randomly selects an element from a given set. Given an input whose elements sum to 1, it selects an element with probability equal to that element's value and returns the element's index.
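The following sketch shows the usual inverse-CDF approach to selecting an index by probability: accumulate the probabilities and return the first index whose cumulative sum exceeds a uniform random number. It is not the KDNN_EXT implementation: it uses Python's `random.Random` rather than the library's generator, so results for a given seed will differ from `random_choice`, and `sample_index` is a hypothetical name.

```python
import bisect
import itertools
import random
from typing import Optional, Sequence


def sample_index(probs: Sequence[float], seed: Optional[int] = None) -> int:
    """Return index i with probability probs[i]; probs should sum to 1."""
    cdf = list(itertools.accumulate(probs))
    r = random.Random(seed).random() * cdf[-1]   # scaling guards against rounding drift
    # First index whose cumulative probability is strictly greater than r.
    return min(bisect.bisect_right(cdf, r), len(probs) - 1)
```

Each call costs O(n) to build the cumulative sums plus O(log n) for the binary search, so repeated sampling from the same distribution would normally reuse `cdf`.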
def random_choice(arr: np.ndarray, seed: int)->List[int]
Receives NumPy arrays and random seeds, and returns the result of random_choice calculation.
>>> import numpy as np
>>> from libkdnn_ext import random_choice
>>> a = np.random.rand(1, 70336).astype(np.float32)
>>> a = np.abs(a)
>>> t = a.sum(axis=1)
>>> a = a / t
>>> random_choice(a, -1)
array([17630], dtype=int32)
>>> random_choice(a, 2)
array([49333], dtype=int32)
Obtaining Version Information
Obtains the KDNN_EXT product version information.
def get_version() -> Dict[bytes, bytes]
The product version information is returned. If an exception occurs,
>>> from libkdnn_ext import get_version
>>> get_version()
{'productName': b'Kunpeng Boostkit', 'productVersion': b'24.0.0', 'componentName': b'BoostKit-kail', 'componentVersion': b'1.0.0', 'componentAppendInfo': b'gcc', 'softwareName': b'boostKit-kail-dnn-ext', 'softwareVersion': b'1.0.0'}
The version number and compile time depend on your environment. The preceding output is for reference only.