Rate This Document
Findability
Accuracy
Completeness
Readability

P?GELS

  • If TRANS = 'N' and m ≥ n: Find the least squares solution of an overdetermined system, that is, solve the least squares problem min||B-AX||.
  • If TRANS = 'N' and m < n: Find the minimum norm solution of an underdetermined system AX = B.
  • If TRANS = 'T' and m ≥ n: Find the minimum norm solution of an undetermined system ATX = B.
  • If TRANS = 'T' and m < n: Find the least squares solution of an overdetermined system, that is, solve the least squares problem min||B-AX||.

Interface Definition

C interface:

void psgels_(const char *trans, const int *m, const int *n, const int *nrhs, float *a, const int *ia, const int *ja,const int *desca, float *b, const int *ib, const int *jb, const int *descb, float *work, int *lwork, int *info)

void pdgels_(const char *trans, const int *m, const int *n, const int *nrhs, double *a, const int *ia, const int *ja,const int *desca, double *b, const int *ib, const int *jb, const int *descb, double *work, int *lwork, int *info)

void pcgels_(const char *trans, const int *m, const int *n, const int *nrhs, float _Complex *a, const int *ia, const int *ja,const int *desca, float _Complex *b, const int *ib, const int *jb, const int *descb, float _Complex*work, int *lwork, int *info)

void pzgels_(const char *trans, const int *m, const int *n, const int *nrhs, double _Complex *a, const int *ia, const int *ja,const int *desca, double _Complex*b, const int *ib, const int *jb, const int *descb, double _Complex*work, int *lwork, int *info)

Fortran interface:

PSGELS(trans, m, n, nrhs, a, ia, ja, desca, b, ib, jb, descb, work, lwork, info)

PDGELS(trans, m, n, nrhs, a, ia, ja, desca, b, ib, jb, descb, work, lwork, info)

PCGELS(trans, m, n, nrhs, a, ia, ja, desca, b, ib, jb, descb, work, lwork, info)

PZGELS(trans, m, n, nrhs, a, ia, ja, desca, b, ib, jb, descb, work, lwork, info)

Parameters

Parameter

Type

Scope

Description

Input/Output

trans

Character

Global

Specifies whether matrix A is transposed.

Input

m

Integer

Global

Number of rows to be operated, for example, the number of rows in a submatrix.

Input

n

Integer

Global

Number of columns to be operated, for example, the number of columns in a submatrix.

Input

nrhs

Integer

Global

Number of columns in the distributed submatrices sub(B) and X.

Input

a

  • A single-precision floating-point array for psgels
  • A double-precision floating-point array for pdgels
  • A single-precision complex array for pcgels
  • A double-precision complex array for pzgels

Local

Local part of distributed matrix A.

  • If m ≥ n, matrix A stores its QR factorization result (local part).
  • If m < n, matrix A stores its LQ factorization result (local part).

Input, output

ia

Integer

Global

Row indices of the submatrix in the global matrix.

Input

ja

Integer

Global

Column indices of the submatrix in the global matrix.

Input

desca

Integer array

Local and global

Descriptor of distributed matrix A.

Input

b

  • A single-precision floating-point array for psgels
  • A double-precision floating-point array for pdgels
  • A single-precision complex array for pcgels
  • A double-precision complex array for pzgels

Local

Local part of distributed matrix B.

  • When TRANS = 'N', B denotes the {IB:IB+M-1, JB:JB+NRHS-1} part of global matrix B. When TRANS = 'T', B denotes the B(IB:IB+N-1, JB: JB+NRHS-1) part.
  • When TRANS = 'N', the solution vectors are stored as columns of the N×NRHS RHS submatrix B. When TRANS = 'T', the solution vectors are stored as columns of the M×NRHS RHS submatrix B.

Input, output

ib

Integer

Global

Row indices of the submatrix in the global matrix.

Input

jb

Integer

Global

Column indices of the submatrix in the global matrix.

Input

descb

Integer array

Local and global

Descriptor of distributed matrix B.

Input

work

Real array

Local

When you exit, the minimum and optimal lwork values are returned.

Output

lwork

Integer

Local and global

Dimension of the work array.

Input

info

Integer

Global

  • 0: The execution is successful.
  • Smaller than 0: If the value of info is -i, the i-th parameter is invalid.
  • Greater than 0: An algorithm error occurs.

Output

Dependency

#include <kscalapack.h>

Example

    int izero=0;
    int ione=1;
    int myrank_mpi, nprocs_mpi;
    MPI_Init( &argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank_mpi);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs_mpi);
    char trans = 'N';
    int n = 8;       // (Global) Matrix size
    int nprow = 2;   // Number of row procs
    int npcol = 2;   // Number of column procs
    int nrhs = 5;
    int nb = 4;      // (Global) Block size
    char layout='R'; // Block cyclic, Row major processor mapping
    int lwork = 52;
    double work[52];
    printf("Usage: ./test matrix_size block_size nprocs_row nprocs_col\n");
 
    if(argc > 1) {
        n = atoi(argv[1]);
    }
    if(argc > 2) {
        nb = atoi(argv[2]);
    }
    if(argc > 3) {
        nprow = atoi(argv[3]);
    }
    if(argc > 4) {
        npcol = atoi(argv[4]);
    }
 
    assert(nprow * npcol == nprocs_mpi);
 
    // Initialize BLACS
    int iam, nprocs;
    int zero = 0;
    int ictxt, myrow, mycol;
    blacs_pinfo_(&iam, &nprocs) ; // BLACS rank and world size
    blacs_get_(&zero, &zero, &ictxt ); // -> Create context
    blacs_gridinit_(&ictxt, &layout, &nprow, &npcol ); // Context -> Initialize the grid
    blacs_gridinfo_(&ictxt, &nprow, &npcol, &myrow, &mycol ); // Context -> Context grid info (# procs row/col, current procs row/col)
 
    // Compute the size of the local matrices
    int mpA    = numroc_( &n, &nb, &myrow, &izero, &nprow ); // My proc -> row of local A
    int nqA    = numroc_( &n, &nb, &mycol, &izero, &npcol ); // My proc -> col of local A
    int mpB    = numroc_( &n, &nb, &myrow, &izero, &nprow ); // My proc -> row of local B
    int nqB    = numroc_( &n, &nb, &mycol, &izero, &npcol ); // My proc -> col of local B
    double *A;
    A = (double *)calloc(mpA*nqA,sizeof(double)) ;
    if (A==NULL){ printf("Error of memory allocation A on proc %dx%d\n",myrow,mycol); exit(0); }
    int k = 0;
    for (int j = 0; j < nqA; j++) { // local col
        int l_j = j / nb; // which block
        int x_j = j % nb; // where within that block
        int J   = (l_j * npcol + mycol) * nb + x_j; // global col
        for (int i = 0; i < mpA; i++) { // local row
            int l_i = i / nb; // which block
            int x_i = i % nb; // where within that block
            int I   = (l_i * nprow + myrow) * nb + x_i; // global row
            assert(I < n);
            assert(J < n);
            if(I == J) {
                A[k] = i + j + 1.5  +  (rand())%10;
            } else {
                A[k] = n + rand()% 10;
            }
            //printf("%d %d -> %d %d -> %f\n", i, j, I, J, A[k]);

            k++;
        }
    }

    double *B;
    B = (double *)calloc(mpB*nqB,sizeof(double)) ;
    if (B==NULL){ printf("Error of memory allocation B on proc %dx%d\n",myrow,mycol); exit(0); }
    k = 0;
    for (int j = 0; j < nqB; j++) { // local col
        int l_j = j / nb; // which block
        int x_j = j % nb; // where within that block
        int J   = (l_j * npcol + mycol) * nb + x_j; // global col
        for (int i = 0; i < mpB; i++) { // local row
            int l_i = i / nb; // which block
            int x_i = i % nb; // where within that block
            int I   = (l_i * nprow + myrow) * nb + x_i; // global row
            assert(I < n);
            assert(J < n);
            if(I == J) {
                B[k] = i + j + 1.5  +  (rand())%10;
            } else {
                B[k] = n + rand()% 10;
            }
            //printf("%d %d -> %d %d -> %f\n", i, j, I, J, B[k]);
             k++;
        }
    }
 
    int descA[9];
    int descB[9];
    int info=0;
    int lddA = mpA > 1 ? mpA : 1;
    int lddB = mpB > 1 ? mpB : 1;
    descinit_( descA,  &n, &n, &nb, &nb, &izero, &izero, &ictxt, &lddA, &info);
    descinit_( descB,  &n, &n, &nb, &nb, &izero, &izero, &ictxt, &lddB, &info);
    if(info != 0) {
        printf("Error in descinit, info = %d\n", info);
    }
    double MPIt1 = MPI_Wtime();
    printf("[%dx%d] Starting \n", myrow, mycol);
    pdgels_(&trans, &n, &n, &nrhs, A, &ione, &ione, descA, B, &ione, &ione, descB, work, &lwork, &info);
    if (info != 0) {
        printf("Error in caculate, info = %d\n", info);
    }
    double MPIt2 = MPI_Wtime();
    printf("[%dx%d] Done, time %e s.\n", myrow, mycol, MPIt2 - MPIt1);
    free(A);
    free(B);
    blacs_gridexit_(&ictxt);
    MPI_Finalize();


    //original matrix;
    /*
    beginA.dat
    4.50000 11.0000 17.0000 8.00000 11.0000 11.0000 17.0000 8.00000
    14.0000 8.50000 9.00000 17.0000 14.0000 13.0000 9.00000 17.0000
    15.0000 14.0000 7.50000 11.0000 15.0000 14.0000 10.0000 11.0000
    13.0000 10.0000 15.0000 13.5000 13.0000 10.0000 15.0000 14.0000
    11.0000 11.0000 17.0000 8.00000 4.50000 11.0000 17.0000 8.00000
    14.0000 13.0000 9.00000 17.0000 14.0000 8.50000 9.00000 17.0000
    15.0000 14.0000 10.0000 11.0000 15.0000 14.0000 7.50000 11.0000
    13.0000 10.0000 15.0000 14.0000 13.0000 10.0000 15.0000 13.5000
    */
    double *B
    /*
    beginB.dat
    3.40000 11.6000 12.6000 17.6000 10.6000 11.6000 12.6000 17.6000
    16.6000 13.4000 10.6000 15.6000 16.6000 18.6000 10.6000 15.6000
    12.6000 17.6000 9.40000 19.6000 12.6000 17.6000 12.6000 19.6000
    16.6000 19.6000 13.6000 11.4000 16.6000 19.6000 13.6000 12.6000
    10.6000 11.6000 12.6000 17.6000 3.40000 11.6000 12.6000 17.6000
    16.6000 18.6000 10.6000 15.6000 16.6000 13.4000 10.6000 15.6000
    12.6000 17.6000 12.6000 19.6000 12.6000 17.6000 9.40000 19.6000
    16.6000 19.6000 13.6000 12.6000 16.6000 19.6000 13.6000 11.4000
    */
    
    //print result
    /*
    endA.dat
    -36.3490 -31.6790 -32.1329 -35.4205 -35.1867 -31.6790 -32.1329 -35.4205
    0.342726 8.46706 11.1681 0.285094 4.34884 6.07544 11.1681 0.285094
    0.367206 0.114565 13.7477 3.02971 -0.816048 1.94287 13.2930 3.02971
    0.318245 0.245466 -0.182222 7.89794 5.36891 -0.658967 0.174397 7.86629
    0.269284 0.0337660 -0.199487 0.530335 5.89453 2.63366 -0.221784 0.0288312
    0.342726 0.111495 0.258678 -0.396869 0.304197 4.86159 0.325469 -0.0199092
    0.367206 0.114565 0.265211 0.293751 -0.0482493 -0.0280641 3.47962 0.00528634
    0.318245 0.245466 -0.182222 0.173244 -0.0665625 0.227001 -0.112593 -0.705509
    */
    /*
    endB.dat
    1.24237 1.03392 0.212867 -0.604674 0.134677 11.6000 12.6000 17.6000
    -0.507703 0.100760 -0.00174929 1.23343 -0.507703 18.6000 10.6000 15.6000
    0.141842 0.287368 0.919597 -0.117208 0.141842 17.6000 12.6000 19.6000
    0.167171 -0.231371 -0.0104530 1.43888 0.167171 19.6000 13.6000 12.6000
    0.134677 1.03392 0.212867 -0.604674 1.24237 11.6000 12.6000 17.6000
    -0.507703 -1.05480 -0.00174929 1.23343 -0.507703 13.4000 10.6000 15.6000
    0.141842 0.287368 -0.360403 -0.117208 0.141842 17.6000 9.40000 19.6000
    0.167171 -0.231371 -0.0104530 -0.961121 0.167171 19.6000 13.6000 11.4000
    */