?gemm

Multiply one matrix by another, that is, $\text{[math]}$ .

The value of op(X) may be $\text{[math]}$ . alpha and beta are multiplication coefficients; op(A) is an m x k matrix; op(B) is a k x n matrix, and C is an m x n matrix.

Interface Definition

C interface:

void cblas_sgemm(const enum CBLAS_ORDER Order, const enum CBLAS_TRANSPOSE TransA, const enum CBLAS_TRANSPOSE TransB, const BLASINT M, const BLASINT N, const BLASINT K, const float alpha, const float *A, const BLASINT lda, const float *B, const BLASINT ldb, const float beta, float *C, const BLASINT ldc);

void cblas_dgemm(const enum CBLAS_ORDER Order, const enum CBLAS_TRANSPOSE TransA, const enum CBLAS_TRANSPOSE TransB, const BLASINT M, const BLASINT N, const BLASINT K, const double alpha, const double *A, const BLASINT lda, const double *B, const BLASINT ldb, const double beta, double *C, const BLASINT ldc);

void cblas_cgemm(const enum CBLAS_ORDER Order, const enum CBLAS_TRANSPOSE TransA, const enum CBLAS_TRANSPOSE TransB, const BLASINT M, const BLASINT N, const BLASINT K, const void *alpha, const void *A, const BLASINT lda, const void *B, const BLASINT ldb, const void *beta, void *C, const BLASINT ldc);

void cblas_zgemm(const enum CBLAS_ORDER Order, const enum CBLAS_TRANSPOSE TransA, const enum CBLAS_TRANSPOSE TransB, const BLASINT M, const BLASINT N, const BLASINT K, const void *alpha, const void *A, const BLASINT lda, const void *B, const BLASINT ldb, const void *beta, void *C, const BLASINT ldc);

Fortran interface:

CALL SGEMM(TRANSA, TRANSB, M, N, K, ALPHA, A, LDA, B, LDB, BETA, C, LDC)

CALL DGEMM(TRANSA, TRANSB, M, N, K, ALPHA, A, LDA, B, LDB, BETA, C, LDC)

CALL CGEMM(TRANSA, TRANSB, M, N, K, ALPHA, A, LDA, B, LDB, BETA, C, LDC)

CALL ZGEMM(TRANSA, TRANSB, M, N, K, ALPHA, A, LDA, B, LDB, BETA, C, LDC)

Parameters

Parameter	Type	Description	Input/Output
order	Enumeration type CBLAS_ORDER	Indicates whether the matrix is in row- or column-major order.	Input
TransA	Enumeration type CBLAS_TRANSPOSE	Indicates whether the matrix A is a conventional matrix, a transpose matrix, or a conjugate matrix. If TransA = CblasNoTrans, then . If TransA = CblasTrans, then . If TransA = CblasConjTrans, then .	Input
TransB	Enumeration type CBLAS_TRANSPOSE	The matrix B is a conventional matrix, a transpose matrix, or a conjugate matrix. If TransB = CblasNoTrans, then . If TransB = CblasTrans, then . If TransB = CblasConjTrans, then .	Input
M	Integer	Rows of matrices op(A) and C	Input
N	Integer	Columns of matrices op(B) and C	Input
K	Integer	Columns of the matrix op(A) and rows of the matrix op(B)	Input
alpha	For sgemm, alpha is of single-precision floating-point type. For dgemm, alpha is of double-precision floating-point type. For cgemm, alpha is of single-precision complex number type. For zgemm, alpha is of double-precision complex number type.	Multiplication coefficient	Input
A	For sgemm, A is of single-precision floating-point type. For dgemm, A is of double-precision floating-point type. For cgemm, A is of single-precision complex number type. For zgemm, A is of double-precision complex number type.	Matrix A	Input
lda	Integer	If the matrix is column store and TransA = CblasNoTrans, lda is at least max(1, m); otherwise, lda is at least max(1, k). If A is a row-store matrix and TransA = CblasNoTrans, lda is at least max(1, k); otherwise, lda is at least max(1, m).	Input
B	For sgemm, B is of single-precision floating-point type. For dgemm, B is of double-precision floating-point type. For cgemm, B is of single-precision complex number type. For zgemm, B is of double-precision complex number type.	Matrix B	Input
ldb	Integer	If the matrix is column store and TransB = CblasNoTrans, ldb is at least max(1, k); otherwise, ldb is at least max(1, n). If the matrix is row store and TransB = CblasNoTrans, ldb is at least max(1, n); otherwise, ldb is at least max(1, k).	Input
beta	For sgemm, beta is of single-precision floating-point type. For dgemm, beta is of double-precision floating-point type. For cgemm, beta is of the single-precision complex number type. For zgemm, beta is of double-precision complex number type.	Multiplication coefficient	Input
C	For sgemm, C is of single-precision floating-point type. For dgemm, C is of double-precision floating-point type. For cgemm, C is of single-precision complex number type. For zgemm, C is of double-precision complex number type.	Matrix C	Input/Output
ldc	Integer	If the matrix is column store, ldc must be at least max(1, m). Otherwise, ldc must be at least max(1, n).	Input

Customizing Thread Configurations

When using the dgemm and zgemm interfaces, you can use environment variables BLAS_MNK_RANGE and BLAS_MNK_THREADS to customize thread configurations.

BLAS_MNK_RANGE indicates the matrix scale threshold. The value range is (0, 1e18).

BLAS_MNK_THREADS indicates the number of threads. When the input matrix scale (M x N x K) is no less than the value of BLAS_MNK_RANGE, the dgemm/zgemm interface is executed based on the specified number of threads. The value of BLAS_MNK_THREADS cannot be greater than that of OMP_NUM_THREADS. It is recommended that the value of OMP_NUM_THREADS be no more than the number of CPU cores.

If BLAS_MNK_RANGE is not set or is set to 0, the system automatically allocates a certain number of threads based on the current environment.

Dependencies

#include "kblas.h"

Examples

C interface:

    int m = 4, k = 3, n = 4, lda = 4, ldb = 3, ldc = 4; 
    float alpha = 1.0, beta = 2.0; 
     /* 
     * A: 
     *     0.340188,       0.411647,       -0.222225, 
     *     -0.105617,      -0.302449,      0.053970, 
     *     0.283099,       -0.164777,      -0.022603, 
     *     0.298440,       0.268230,       0.128871, 
     * B: 
     *     -0.135216,      0.416195,       -0.358397,      -0.257113, 
     *     0.013401,       0.135712,       0.106969,       -0.362768, 
     *     0.452230,       0.217297,       -0.483699,      0.304177, 
     * C: 
     *     -0.343321,      0.498924,       0.112640,       -0.006417, 
     *     -0.099056,      -0.281743,      -0.203968,      0.472775, 
     *     -0.370210,      0.012932,       0.137552,       -0.207483, 
     *     -0.391191,      0.339112,       0.024287,       0.271358, 
     */ 
    float a[12] = {0.340188, -0.105617, 0.283099, 
                    0.298440, 0.411647, -0.302449, 
                    -0.164777, 0.268230, -0.222225, 
                    0.053970, -0.022603, 0.128871}; 
    float b[12] = {-0.135216, 0.013401, 0.452230, 0.416195, 
                    0.135712, 0.217297, -0.358397, 0.106969, 
                    -0.483699, -0.257113, -0.362768, 0.304177}; 
    float c[16] = {-0.343321, -0.099056, -0.370210, -0.391191, 
                    0.498924, -0.281743, 0.012932, 0.339112, 
                    0.112640, -0.203968, 0.137552, 0.024287, 
                    -0.006417, 0.472775, -0.207483, 0.271358}; 
 
    cblas_sgemm(CblasColMajor,CblasNoTrans,CblasNoTrans, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc); 
    /* 
     * Output C: 
     *     -0.827621       1.147010        0.254881        -0.317229 
     *     -0.163476       -0.636762       -0.428542       1.098841 
     *     -0.791128       0.116416        0.166949        -0.434854 
     *     -0.760862       0.866839        -0.092028       0.407877 
     * 
     */

Fortran interface:

      INTEGER :: M=4, K=3, N=4 
      INTEGER :: LDA=4, LDB=3, LDC=4 
      REAL(4) :: ALPHA=1.0, BETA=2.0 
      REAL(4) :: A(12), B(12), C(16) 
      DATA A/0.340188, -0.105617, 0.283099, 
     $       0.298440, 0.411647, -0.302449, 
     $       -0.164777, 0.268230, -0.222225, 
     $       0.053970, -0.022603, 0.128871/ 
      DATA B/-0.135216, 0.013401, 0.452230, 0.416195, 
     $       0.135712, 0.217297, -0.358397, 0.106969, 
     $       -0.483699, -0.257113, -0.362768, 0.304177/ 
      DATA C/-0.343321, -0.099056, -0.370210, -0.391191, 
     $       0.498924, -0.281743, 0.012932, 0.339112, 
     $       0.112640, -0.203968, 0.137552, 0.024287, 
     $       -0.006417, 0.472775, -0.207483, 0.271358/ 
      EXTERNAL SGEMM 
      CALL SGEMM('N', 'N', M, N, K, ALPHA, A, LDA, B, LDB, BETA, C, 
     $          LDC) 
*     Output C: 
*         -0.827621       1.147010        0.254881        -0.317229 
*         -0.163476       -0.636762       -0.428542       1.098841 
*         -0.791128       0.116416        0.166949        -0.434854 
*         -0.760862       0.866839        -0.092028       0.407877

Parent topic: KML_BLAS Level 3 Functions