?gemm
Multiply one matrix by another, that is,
.
The value of op(X) may be
. alpha and beta are multiplication coefficients; op(A) is an m x k matrix; op(B) is a k x n matrix, and C is an m x n matrix.
Interface Definition
C interface:
void cblas_sgemm(const enum CBLAS_ORDER Order, const enum CBLAS_TRANSPOSE TransA, const enum CBLAS_TRANSPOSE TransB, const BLASINT M, const BLASINT N, const BLASINT K, const float alpha, const float *A, const BLASINT lda, const float *B, const BLASINT ldb, const float beta, float *C, const BLASINT ldc);
void cblas_dgemm(const enum CBLAS_ORDER Order, const enum CBLAS_TRANSPOSE TransA, const enum CBLAS_TRANSPOSE TransB, const BLASINT M, const BLASINT N, const BLASINT K, const double alpha, const double *A, const BLASINT lda, const double *B, const BLASINT ldb, const double beta, double *C, const BLASINT ldc);
void cblas_cgemm(const enum CBLAS_ORDER Order, const enum CBLAS_TRANSPOSE TransA, const enum CBLAS_TRANSPOSE TransB, const BLASINT M, const BLASINT N, const BLASINT K, const void *alpha, const void *A, const BLASINT lda, const void *B, const BLASINT ldb, const void *beta, void *C, const BLASINT ldc);
void cblas_zgemm(const enum CBLAS_ORDER Order, const enum CBLAS_TRANSPOSE TransA, const enum CBLAS_TRANSPOSE TransB, const BLASINT M, const BLASINT N, const BLASINT K, const void *alpha, const void *A, const BLASINT lda, const void *B, const BLASINT ldb, const void *beta, void *C, const BLASINT ldc);
Fortran interface:
CALL SGEMM(TRANSA, TRANSB, M, N, K, ALPHA, A, LDA, B, LDB, BETA, C, LDC)
CALL DGEMM(TRANSA, TRANSB, M, N, K, ALPHA, A, LDA, B, LDB, BETA, C, LDC)
CALL CGEMM(TRANSA, TRANSB, M, N, K, ALPHA, A, LDA, B, LDB, BETA, C, LDC)
CALL ZGEMM(TRANSA, TRANSB, M, N, K, ALPHA, A, LDA, B, LDB, BETA, C, LDC)
Parameters
Parameter |
Type |
Description |
Input/Output |
|---|---|---|---|
order |
Enumeration type CBLAS_ORDER |
Indicates whether the matrix is in row- or column-major order. |
Input |
TransA |
Enumeration type CBLAS_TRANSPOSE |
Indicates whether the matrix A is a conventional matrix, a transpose matrix, or a conjugate matrix.
|
Input |
TransB |
Enumeration type CBLAS_TRANSPOSE |
The matrix B is a conventional matrix, a transpose matrix, or a conjugate matrix.
|
Input |
M |
Integer |
Rows of matrices op(A) and C |
Input |
N |
Integer |
Columns of matrices op(B) and C |
Input |
K |
Integer |
Columns of the matrix op(A) and rows of the matrix op(B) |
Input |
alpha |
|
Multiplication coefficient |
Input |
A |
|
Matrix A |
Input |
lda |
Integer |
|
Input |
B |
|
Matrix B |
Input |
ldb |
Integer |
|
Input |
beta |
|
Multiplication coefficient |
Input |
C |
|
Matrix C |
Input/Output |
ldc |
Integer |
If the matrix is column store, ldc must be at least max(1, m). Otherwise, ldc must be at least max(1, n). |
Input |
Customizing Thread Configurations
When using the dgemm and zgemm interfaces, you can use environment variables BLAS_MNK_RANGE and BLAS_MNK_THREADS to customize thread configurations.
BLAS_MNK_RANGE indicates the matrix scale threshold. The value range is (0, 1e18).
BLAS_MNK_THREADS indicates the number of threads. When the input matrix scale (M x N x K) is no less than the value of BLAS_MNK_RANGE, the dgemm/zgemm interface is executed based on the specified number of threads. The value of BLAS_MNK_THREADS cannot be greater than that of OMP_NUM_THREADS. It is recommended that the value of OMP_NUM_THREADS be no more than the number of CPU cores.
If BLAS_MNK_RANGE is not set or is set to 0, the system automatically allocates a certain number of threads based on the current environment.
Dependencies
#include "kblas.h"
Examples
C interface:
int m = 4, k = 3, n = 4, lda = 4, ldb = 3, ldc = 4;
float alpha = 1.0, beta = 2.0;
/*
* A:
* 0.340188, 0.411647, -0.222225,
* -0.105617, -0.302449, 0.053970,
* 0.283099, -0.164777, -0.022603,
* 0.298440, 0.268230, 0.128871,
* B:
* -0.135216, 0.416195, -0.358397, -0.257113,
* 0.013401, 0.135712, 0.106969, -0.362768,
* 0.452230, 0.217297, -0.483699, 0.304177,
* C:
* -0.343321, 0.498924, 0.112640, -0.006417,
* -0.099056, -0.281743, -0.203968, 0.472775,
* -0.370210, 0.012932, 0.137552, -0.207483,
* -0.391191, 0.339112, 0.024287, 0.271358,
*/
float a[12] = {0.340188, -0.105617, 0.283099,
0.298440, 0.411647, -0.302449,
-0.164777, 0.268230, -0.222225,
0.053970, -0.022603, 0.128871};
float b[12] = {-0.135216, 0.013401, 0.452230, 0.416195,
0.135712, 0.217297, -0.358397, 0.106969,
-0.483699, -0.257113, -0.362768, 0.304177};
float c[16] = {-0.343321, -0.099056, -0.370210, -0.391191,
0.498924, -0.281743, 0.012932, 0.339112,
0.112640, -0.203968, 0.137552, 0.024287,
-0.006417, 0.472775, -0.207483, 0.271358};
cblas_sgemm(CblasColMajor,CblasNoTrans,CblasNoTrans, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc);
/*
* Output C:
* -0.827621 1.147010 0.254881 -0.317229
* -0.163476 -0.636762 -0.428542 1.098841
* -0.791128 0.116416 0.166949 -0.434854
* -0.760862 0.866839 -0.092028 0.407877
*
*/
Fortran interface:
INTEGER :: M=4, K=3, N=4
INTEGER :: LDA=4, LDB=3, LDC=4
REAL(4) :: ALPHA=1.0, BETA=2.0
REAL(4) :: A(12), B(12), C(16)
DATA A/0.340188, -0.105617, 0.283099,
$ 0.298440, 0.411647, -0.302449,
$ -0.164777, 0.268230, -0.222225,
$ 0.053970, -0.022603, 0.128871/
DATA B/-0.135216, 0.013401, 0.452230, 0.416195,
$ 0.135712, 0.217297, -0.358397, 0.106969,
$ -0.483699, -0.257113, -0.362768, 0.304177/
DATA C/-0.343321, -0.099056, -0.370210, -0.391191,
$ 0.498924, -0.281743, 0.012932, 0.339112,
$ 0.112640, -0.203968, 0.137552, 0.024287,
$ -0.006417, 0.472775, -0.207483, 0.271358/
EXTERNAL SGEMM
CALL SGEMM('N', 'N', M, N, K, ALPHA, A, LDA, B, LDB, BETA, C,
$ LDC)
* Output C:
* -0.827621 1.147010 0.254881 -0.317229
* -0.163476 -0.636762 -0.428542 1.098841
* -0.791128 0.116416 0.166949 -0.434854
* -0.760862 0.866839 -0.092028 0.407877





