?gemm

Multiply one matrix by another, that is, $C = \alpha \cdot op(A) \cdot op(B) + \beta \cdot C$.

The value of op(X) may be $op(X) = X$, $op(X) = X^T$, or $op(X) = X^H$. alpha and beta are multiplication coefficients; op(A) is an m x k matrix; op(B) is a k x n matrix; and C is an m x n matrix.
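As a rough illustration of the operation in its simplest case (column-major storage, op(A) = A, op(B) = B), the following sketch computes alpha * A * B + beta * C with plain loops. The function name ref_sgemm_colmajor_nn is hypothetical and is not part of the library; it is meant only for checking small cases by hand, not as a replacement for cblas_sgemm.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference helper (not a library function).
 * Computes C = alpha * A * B + beta * C for column-major storage,
 * with A (m x k), B (k x n), and C (m x n) all untransposed. */
static void ref_sgemm_colmajor_nn(int m, int n, int k, float alpha,
                                  const float *a, int lda,
                                  const float *b, int ldb,
                                  float beta, float *c, int ldc)
{
    /* Column-major leading dimensions must cover the row count. */
    assert(lda >= (m > 1 ? m : 1));
    assert(ldb >= (k > 1 ? k : 1));
    assert(ldc >= (m > 1 ? m : 1));

    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            float acc = 0.0f;
            for (int p = 0; p < k; ++p) {
                /* Element (i, p) of A times element (p, j) of B. */
                acc += a[i + (size_t)p * lda] * b[p + (size_t)j * ldb];
            }
            c[i + (size_t)j * ldc] = alpha * acc + beta * c[i + (size_t)j * ldc];
        }
    }
}
```

Calling this helper with the same m, n, k, alpha, beta, and arrays as the cblas_sgemm example in the Examples section should give values close to the documented output, up to single-precision rounding.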

Interface Definition

C interface:

void cblas_sgemm(const enum CBLAS_ORDER Order, const enum CBLAS_TRANSPOSE TransA, const enum CBLAS_TRANSPOSE TransB, const BLASINT M, const BLASINT N, const BLASINT K, const float alpha, const float *A, const BLASINT lda, const float *B, const BLASINT ldb, const float beta, float *C, const BLASINT ldc);

void cblas_dgemm(const enum CBLAS_ORDER Order, const enum CBLAS_TRANSPOSE TransA, const enum CBLAS_TRANSPOSE TransB, const BLASINT M, const BLASINT N, const BLASINT K, const double alpha, const double *A, const BLASINT lda, const double *B, const BLASINT ldb, const double beta, double *C, const BLASINT ldc);

void cblas_cgemm(const enum CBLAS_ORDER Order, const enum CBLAS_TRANSPOSE TransA, const enum CBLAS_TRANSPOSE TransB, const BLASINT M, const BLASINT N, const BLASINT K, const void *alpha, const void *A, const BLASINT lda, const void *B, const BLASINT ldb, const void *beta, void *C, const BLASINT ldc);

void cblas_zgemm(const enum CBLAS_ORDER Order, const enum CBLAS_TRANSPOSE TransA, const enum CBLAS_TRANSPOSE TransB, const BLASINT M, const BLASINT N, const BLASINT K, const void *alpha, const void *A, const BLASINT lda, const void *B, const BLASINT ldb, const void *beta, void *C, const BLASINT ldc);

Fortran interface:

CALL SGEMM(TRANSA, TRANSB, M, N, K, ALPHA, A, LDA, B, LDB, BETA, C, LDC)

CALL DGEMM(TRANSA, TRANSB, M, N, K, ALPHA, A, LDA, B, LDB, BETA, C, LDC)

CALL CGEMM(TRANSA, TRANSB, M, N, K, ALPHA, A, LDA, B, LDB, BETA, C, LDC)

CALL ZGEMM(TRANSA, TRANSB, M, N, K, ALPHA, A, LDA, B, LDB, BETA, C, LDC)

Parameters

| Parameter | Type | Description | Input/Output |
|---|---|---|---|
| Order | Enumeration type CBLAS_ORDER | Indicates whether the matrices are stored in row-major or column-major order. | Input |
| TransA | Enumeration type CBLAS_TRANSPOSE | Indicates whether matrix A is used as is, transposed, or conjugate-transposed. If TransA = CblasNoTrans, then $op(A) = A$. If TransA = CblasTrans, then $op(A) = A^T$. If TransA = CblasConjTrans, then $op(A) = A^H$. | Input |
| TransB | Enumeration type CBLAS_TRANSPOSE | Indicates whether matrix B is used as is, transposed, or conjugate-transposed. If TransB = CblasNoTrans, then $op(B) = B$. If TransB = CblasTrans, then $op(B) = B^T$. If TransB = CblasConjTrans, then $op(B) = B^H$. | Input |
| M | Integer | Number of rows of matrices op(A) and C. | Input |
| N | Integer | Number of columns of matrices op(B) and C. | Input |
| K | Integer | Number of columns of matrix op(A) and number of rows of matrix op(B). | Input |
| alpha | sgemm: single-precision floating-point. dgemm: double-precision floating-point. cgemm: single-precision complex. zgemm: double-precision complex. | Multiplication coefficient applied to op(A) * op(B). | Input |
| A | sgemm: single-precision floating-point. dgemm: double-precision floating-point. cgemm: single-precision complex. zgemm: double-precision complex. | Matrix A. | Input |
| lda | Integer | Leading dimension of A. Column-major storage: at least max(1, m) if TransA = CblasNoTrans, otherwise at least max(1, k). Row-major storage: at least max(1, k) if TransA = CblasNoTrans, otherwise at least max(1, m). | Input |
| B | sgemm: single-precision floating-point. dgemm: double-precision floating-point. cgemm: single-precision complex. zgemm: double-precision complex. | Matrix B. | Input |
| ldb | Integer | Leading dimension of B. Column-major storage: at least max(1, k) if TransB = CblasNoTrans, otherwise at least max(1, n). Row-major storage: at least max(1, n) if TransB = CblasNoTrans, otherwise at least max(1, k). | Input |
| beta | sgemm: single-precision floating-point. dgemm: double-precision floating-point. cgemm: single-precision complex. zgemm: double-precision complex. | Multiplication coefficient applied to C. | Input |
| C | sgemm: single-precision floating-point. dgemm: double-precision floating-point. cgemm: single-precision complex. zgemm: double-precision complex. | Matrix C. On output, it holds the result. | Input/Output |
| ldc | Integer | Leading dimension of C. Column-major storage: at least max(1, m). Row-major storage: at least max(1, n). | Input |
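The lda, ldb, and ldc rules above all follow one pattern: the leading dimension must cover the stored row count for column-major data, or the stored column count for row-major data, where transposition swaps the stored shape. The sketch below encodes that pattern. The helper name min_leading_dim and its integer flags are hypothetical and are used here in place of the CBLAS_ORDER and CBLAS_TRANSPOSE enumerations so that the snippet compiles without the library header.

```c
#include <assert.h>

/* Hypothetical helper (not a library function).
 * op_rows x op_cols is the shape of op(X) as used in the product:
 *   A: m x k, B: k x n, C: m x n (C is never transposed).
 * col_major: nonzero for column-major storage, 0 for row-major.
 * transposed: nonzero for Trans or ConjTrans, 0 for NoTrans.
 * Returns the smallest valid leading dimension for X. */
static int min_leading_dim(int col_major, int transposed,
                           int op_rows, int op_cols)
{
    assert(op_rows >= 0 && op_cols >= 0);

    /* Transposition swaps the shape of the matrix as stored in memory. */
    int stored_rows = transposed ? op_cols : op_rows;
    int stored_cols = transposed ? op_rows : op_cols;

    /* Column-major strides over rows; row-major strides over columns. */
    int dim = col_major ? stored_rows : stored_cols;
    return dim > 1 ? dim : 1;
}
```

For example, with m = 4 and k = 3, min_leading_dim(1, 0, 4, 3) gives the column-major, untransposed bound for lda, which is max(1, m) = 4.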

Customizing Thread Configurations

When using the dgemm and zgemm interfaces, you can use environment variables BLAS_MNK_RANGE and BLAS_MNK_THREADS to customize thread configurations.

BLAS_MNK_RANGE indicates the matrix scale threshold. The value range is (0, 1e18).

BLAS_MNK_THREADS indicates the number of threads. When the input matrix scale (M x N x K) is greater than or equal to the value of BLAS_MNK_RANGE, the dgemm/zgemm interface runs with the specified number of threads. The value of BLAS_MNK_THREADS cannot be greater than that of OMP_NUM_THREADS. It is recommended that the value of OMP_NUM_THREADS not exceed the number of CPU cores.

If BLAS_MNK_RANGE is not set or is set to 0, the library chooses the number of threads automatically based on the current environment.
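To see in advance whether a given call would reach the threshold, the documented comparison (M x N x K against BLAS_MNK_RANGE) can be mirrored in a small helper. This is only a sketch: the function names are hypothetical, parsing the variable with strtod is an assumption about its format, and the code does not reproduce how the library itself reads or applies these variables.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper (not a library function).
 * Returns 1 if m * n * k is at least the threshold in range_str,
 * and 0 if the string is missing, unparsable, or outside (0, 1e18). */
int meets_mnk_range(const char *range_str, long m, long n, long k)
{
    assert(m >= 0 && n >= 0 && k >= 0);
    if (range_str == NULL) {
        return 0;
    }

    char *end = NULL;
    double range = strtod(range_str, &end);  /* assumed numeric format */
    if (end == range_str || range <= 0.0 || range >= 1e18) {
        return 0;
    }

    /* Use double so the product cannot overflow an integer type. */
    double scale = (double)m * (double)n * (double)k;
    return scale >= range;
}

/* Same check against the current process environment. */
int env_meets_mnk_range(long m, long n, long k)
{
    return meets_mnk_range(getenv("BLAS_MNK_RANGE"), m, n, k);
}
```

With BLAS_MNK_RANGE set to 1000000, a 100 x 100 x 100 problem meets the threshold, while a 99 x 100 x 100 problem does not.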

Dependencies

#include "kblas.h"

Examples

C interface:

    #include "kblas.h"

    int main(void)
    {
        int m = 4, k = 3, n = 4, lda = 4, ldb = 3, ldc = 4;
        float alpha = 1.0f, beta = 2.0f;
        /*
         * A (m x k):
         *     0.340188,       0.411647,       -0.222225,
         *     -0.105617,      -0.302449,      0.053970,
         *     0.283099,       -0.164777,      -0.022603,
         *     0.298440,       0.268230,       0.128871,
         * B (k x n):
         *     -0.135216,      0.416195,       -0.358397,      -0.257113,
         *     0.013401,       0.135712,       0.106969,       -0.362768,
         *     0.452230,       0.217297,       -0.483699,      0.304177,
         * C (m x n):
         *     -0.343321,      0.498924,       0.112640,       -0.006417,
         *     -0.099056,      -0.281743,      -0.203968,      0.472775,
         *     -0.370210,      0.012932,       0.137552,       -0.207483,
         *     -0.391191,      0.339112,       0.024287,       0.271358,
         */
        /* Column-major storage: each initializer row below is one matrix column. */
        float a[12] = {0.340188, -0.105617, 0.283099, 0.298440,
                       0.411647, -0.302449, -0.164777, 0.268230,
                       -0.222225, 0.053970, -0.022603, 0.128871};
        float b[12] = {-0.135216, 0.013401, 0.452230,
                       0.416195, 0.135712, 0.217297,
                       -0.358397, 0.106969, -0.483699,
                       -0.257113, -0.362768, 0.304177};
        float c[16] = {-0.343321, -0.099056, -0.370210, -0.391191,
                       0.498924, -0.281743, 0.012932, 0.339112,
                       0.112640, -0.203968, 0.137552, 0.024287,
                       -0.006417, 0.472775, -0.207483, 0.271358};

        /* C = alpha * A * B + beta * C; the result overwrites c. */
        cblas_sgemm(CblasColMajor, CblasNoTrans, CblasNoTrans, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc);
        /*
         * Output C:
         *     -0.827621       1.147010        0.254881        -0.317229
         *     -0.163476       -0.636762       -0.428542       1.098841
         *     -0.791128       0.116416        0.166949        -0.434854
         *     -0.760862       0.866839        -0.092028       0.407877
         */
        return 0;
    }

Fortran interface:

      PROGRAM SGEMMEX
      INTEGER :: M=4, K=3, N=4
      INTEGER :: LDA=4, LDB=3, LDC=4
      REAL(4) :: ALPHA=1.0, BETA=2.0
      REAL(4) :: A(12), B(12), C(16)
      EXTERNAL SGEMM
*     Column-major storage: each DATA line below is one matrix column.
      DATA A/0.340188, -0.105617, 0.283099, 0.298440,
     $       0.411647, -0.302449, -0.164777, 0.268230,
     $       -0.222225, 0.053970, -0.022603, 0.128871/
      DATA B/-0.135216, 0.013401, 0.452230,
     $       0.416195, 0.135712, 0.217297,
     $       -0.358397, 0.106969, -0.483699,
     $       -0.257113, -0.362768, 0.304177/
      DATA C/-0.343321, -0.099056, -0.370210, -0.391191,
     $       0.498924, -0.281743, 0.012932, 0.339112,
     $       0.112640, -0.203968, 0.137552, 0.024287,
     $       -0.006417, 0.472775, -0.207483, 0.271358/
      CALL SGEMM('N', 'N', M, N, K, ALPHA, A, LDA, B, LDB, BETA, C,
     $          LDC)
*     Output C:
*         -0.827621       1.147010        0.254881        -0.317229
*         -0.163476       -0.636762       -0.428542       1.098841
*         -0.791128       0.116416        0.166949        -0.434854
*         -0.760862       0.866839        -0.092028       0.407877
      END PROGRAM SGEMMEX