NUMA Refined Analysis

Command Function

Uses Arm Statistical Profiling Extension (SPE) sampling to obtain refined DDR access data, the NUMA access bandwidth matrix, and access information about hotspot memory areas.

  • To collect the refined NUMA data, the server must support Arm SPE collection. For details about how to configure SPE, see Configuring the SPE Environment.
  • Tasks for which TAR packages have been generated can be imported to the WebUI for visualization. For details, see the description of importing tasks in Task Management.

Syntax

devkit tuner numafast [-d <DURATION> | --duration=DURATION] [-i <INTERVAL> | --interval=INTERVAL]

Parameter Description

Table 1 Parameter description

| Parameter | Option | Description |
|---|---|---|
| -h/--help | None | Obtains help information. |
| -o/--outpath | file | Sets the name of the report data file. The file name must not end with .tar. If this parameter is not set, the default file name is Current_directory+numafast+timestamp. |
| -l/--log-level | 0,1,2,3 | Configures the log level: 0 (debug), 1 (info), 2 (warning), or 3 (error). The default value is 1 (info). |
| -d/--duration | Num | Specifies the collection duration, in seconds. The value ranges from 2 to 172,800. If this parameter is not set, collection continues until you stop it: press Ctrl+\ to cancel the task, or press Ctrl+C to stop the collection and start analysis. |
| -i/--interval | Num | Specifies the sampling interval, in seconds. The value ranges from 2 to 30. The default value is 5. |
| -c/--count | Num | Sets the instruction collection interval for SPE. The value ranges from 1 to 4294967295. The default value is 2048. |
| -n/--num | Num | Sets the number of top N processes to be displayed. The value ranges from 1 to 30. The default value is 10. |
| --package | None | Imports data to the database and generates a compressed package in the specified output path. |
| -f/--file | None | Generates only report files, without report data packages. This parameter is used with --package. |

Example

Command:

devkit tuner numafast -i 2 -c 2048 -n 3 --package
  • In this command, the sampling interval is 2 seconds, the instruction collection interval for SPE is 2,048, the top 3 processes are displayed, and a report data package is generated in the default path.
  • If the -d parameter is not set, you can press Ctrl+\ to cancel the task or press Ctrl+C to stop the collection. If -d is set, the number of generated reports depends on the -i parameter. For example, -d 10 -i 2 sets the collection duration to 10 seconds and the sampling interval to 2 seconds, so five reports are generated.
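
The relationship between -d, -i, and the number of reports can be sketched in Python. This is only an illustration, not part of the tool: the helper names expected_reports and build_command are hypothetical, the value ranges are copied from Table 1, and the rule "reports = duration // interval" is an assumption inferred from the single example given here (10 seconds at a 2-second interval yields five reports).

```python
# Value ranges copied from Table 1; everything else is a hypothetical helper.
DURATION_RANGE = (2, 172_800)  # -d/--duration, in seconds
INTERVAL_RANGE = (2, 30)       # -i/--interval, in seconds
DEFAULT_INTERVAL = 5           # used when -i is not specified


def expected_reports(duration, interval=DEFAULT_INTERVAL):
    """Estimate the report count as duration // interval (an assumption)."""
    if not DURATION_RANGE[0] <= duration <= DURATION_RANGE[1]:
        raise ValueError(f"duration must be within {DURATION_RANGE}")
    if not INTERVAL_RANGE[0] <= interval <= INTERVAL_RANGE[1]:
        raise ValueError(f"interval must be within {INTERVAL_RANGE}")
    return duration // interval


def build_command(duration, interval=DEFAULT_INTERVAL):
    """Return a numafast command line after checking the value ranges."""
    expected_reports(duration, interval)  # reuse the range checks
    args = ["devkit", "tuner", "numafast", "-d", str(duration), "-i", str(interval)]
    return " ".join(args)


print(build_command(10, 2))     # devkit tuner numafast -d 10 -i 2
print(expected_reports(10, 2))  # 5
```

Checking the ranges before launching the command avoids starting a long collection with a value that the tool would reject.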

Command output:

Enter analyze mode, please wait 5 seconds...

NUMAFAST ANALYSIS(Press Ctrl+C to exit)
==========================================================================================
1. System's numa score : 0.88
   Note: score = (max cost - real cost) / (max cost - min cost)
         real cost = SUM(0<=i,j<node number) numa distance(i, j) * access percentage(i, j)
         max cost = MAX(numa distance) , min cost = MIN(numa distance).
         This score is best at 1 and worst at 0.
         Format: traffic | numa distance | access percentage.

              DST_0               DST_1               DST_2               DST_3
SRC_0   0.63GB|10|38.24%    1.03GB|12|23.53%    0.10GB|20|8.82%     0.12GB|22|2.94%
SRC_1   0.00GB|12|0.00%     1.16GB|10|26.47%    0.00GB|22|0.00%     0.00GB|24|0.00%
SRC_2   0.00GB|20|0.00%     0.00GB|22|0.00%     0.00GB|10|0.00%     0.00GB|12|0.00%
SRC_3   0.00GB|22|0.00%     0.00GB|24|0.00%     0.00GB|12|0.00%     0.00GB|10|0.00%

==========================================================================================
2. Node detail information of memory access traffic:
   Note:RMA(Die): Access traffic across NUMA dies.
        RMA(Socket): Access traffic across NUMA sockets.
        LMA: Local access traffic on the NUMA node.
        %CPU: Number of occupied CPU cores. For example, 600% indicates that 6 CPU cores
        are occupied.

 NID RMA(Die) RMA(Skt)      LMA  %RMA MEM(all) MEM(free) %MEM   %CPU
   0   1.03GB   0.22GB   0.63GB  66.5  63.21GB    0.27GB 99.6  137.1
   1   0.00GB   0.00GB   1.16GB   0.0  63.93GB    0.77GB 98.8   93.2
   2   0.00GB   0.00GB   0.00GB   0.0  63.93GB   10.69GB 83.3   96.4
   3   0.00GB   0.00GB   0.00GB   0.0  62.93GB   48.75GB 22.5  109.6

==========================================================================================
3. Show top 3 process which sorted by memory access:
   Note:Process number: The actual number of SPE collection processes prevails. If the number of collection
        processes is less than the configured number, the actual number is displayed.
        MIGRATED X|Y: X indicates how many times threads of the process are migrated between
        NUMA nodes, and Y indicates the number of threads in the process.
        ACCESS: Percentage of the process access traffic to the total traffic.Top N
        sorting is based on this.

    PID  SCORE  ACCESS RMA(Die) RMA(Skt)      LMA  %RMA MIGRATED   %CPU  COMMAND
2083500   1.00  38.24%   0.00GB   0.00GB   0.63GB   0.0   0|2       nan  test_thread
3296840   0.97  32.35%   0.26GB   0.00GB   1.16GB  18.2   0|2       nan  python
3784179   0.61  29.41%   0.77GB   0.22GB   0.00GB 100.0   0|1       nan  gunicorn
==========================================================================================
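
The score formula printed in the Note of part 1 can be checked against the sample matrix. The Python sketch below is illustrative only and is not part of the tool: parse_matrix and numa_score are hypothetical helper names, the matrix rows are copied from the sample output, and returning 1.0 when all distances are equal is an assumption made to avoid division by zero.

```python
# Matrix rows copied from the sample output; cell format is
# traffic | numa distance | access percentage.
SAMPLE = """\
SRC_0   0.63GB|10|38.24%    1.03GB|12|23.53%    0.10GB|20|8.82%     0.12GB|22|2.94%
SRC_1   0.00GB|12|0.00%     1.16GB|10|26.47%    0.00GB|22|0.00%     0.00GB|24|0.00%
SRC_2   0.00GB|20|0.00%     0.00GB|22|0.00%     0.00GB|10|0.00%     0.00GB|12|0.00%
SRC_3   0.00GB|22|0.00%     0.00GB|24|0.00%     0.00GB|12|0.00%     0.00GB|10|0.00%
"""


def parse_matrix(text):
    """Return one list per SRC row of (traffic_gb, distance, share) tuples."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or not fields[0].startswith("SRC_"):
            continue  # skip headers, separators, and blank lines
        row = []
        for cell in fields[1:]:
            traffic, distance, percent = cell.split("|")
            row.append((
                float(traffic.replace("GB", "")),
                int(distance),
                float(percent.rstrip("%")) / 100.0,  # share as a fraction
            ))
        rows.append(row)
    return rows


def numa_score(rows):
    """score = (max cost - real cost) / (max cost - min cost)."""
    cells = [cell for row in rows for cell in row]
    distances = [distance for _, distance, _ in cells]
    max_cost, min_cost = max(distances), min(distances)
    if max_cost == min_cost:
        return 1.0  # assumption: a single distance value means no remote penalty
    real_cost = sum(distance * share for _, distance, share in cells)
    return (max_cost - real_cost) / (max_cost - min_cost)


print(f"{numa_score(parse_matrix(SAMPLE)):.2f}")  # 0.88
```

For the sample data, the real cost is about 11.71 with a maximum distance of 24 and a minimum of 10, which reproduces the reported system score of 0.88.
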
  • Output report description
    The report consists of three parts: memory access matrix information, node details of memory access traffic, and process information sorted by memory access.
    1. Memory access matrix information

      Each cell consists of three fields: the bandwidth traffic from SRC to DST, the NUMA distance from SRC to DST, and the proportion of the traffic from SRC to DST to the total traffic.

    2. Node details of memory access traffic
      Table 2 Parameters of node details

      | Parameter | Description |
      |---|---|
      | NID | NUMA node ID |
      | RMA(Die) | Access traffic across NUMA dies |
      | RMA(Skt) | Access traffic across NUMA sockets |
      | LMA | Local access traffic on the NUMA node |
      | %RMA | Percentage of remote access traffic |
      | MEM(all) | Total memory size |
      | MEM(free) | Available memory size |
      | %MEM | Memory usage |
      | %CPU | CPU usage expressed as occupied CPU cores. For example, 600% indicates that 6 CPU cores are occupied. |

    3. Process information sorted by memory access
      Table 3 Parameters of process information

      | Parameter | Description |
      |---|---|
      | PID | Process ID |
      | SCORE | NUMA score of the process |
      | ACCESS | Percentage of the process access traffic to the total traffic. Top N sorting is based on this value. |
      | RMA(Die) | Access traffic across NUMA dies |
      | RMA(Skt) | Access traffic across NUMA sockets |
      | LMA | Local access traffic on the NUMA node |
      | %RMA | Percentage of remote access traffic |
      | MIGRATED | Number of times that threads of the process are migrated between NUMA nodes, followed by the number of threads in the process |
      | %CPU | CPU usage |
      | COMMAND | Command line of the process |
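
This document does not give formulas for the %RMA and %MEM columns of the node details. The Python sketch below shows one reading that reproduces the four node rows of the sample output. It is an assumption inferred from those sample values, not a statement of how the tool computes them, and rma_percent and mem_percent are hypothetical helper names.

```python
def rma_percent(rma_die, rma_skt, lma):
    """Assumed formula: remote traffic as a share of all access traffic."""
    remote = rma_die + rma_skt
    total = remote + lma
    return 100.0 * remote / total if total else 0.0


def mem_percent(mem_all, mem_free):
    """Assumed formula: used memory as a share of total memory."""
    return 100.0 * (mem_all - mem_free) / mem_all


# Rows copied from the sample output:
# (NID, RMA(Die), RMA(Skt), LMA, MEM(all), MEM(free)), all sizes in GB.
NODES = [
    (0, 1.03, 0.22, 0.63, 63.21, 0.27),
    (1, 0.00, 0.00, 1.16, 63.93, 0.77),
    (2, 0.00, 0.00, 0.00, 63.93, 10.69),
    (3, 0.00, 0.00, 0.00, 62.93, 48.75),
]

for nid, die, skt, lma, mem_all, mem_free in NODES:
    print(f"{nid} {rma_percent(die, skt, lma):.1f} {mem_percent(mem_all, mem_free):.1f}")
# 0 66.5 99.6
# 1 0.0 98.8
# 2 0.0 83.3
# 3 0.0 22.5
```

The process table is built from unrounded traffic values, so recomputing %RMA from its rounded GB columns can differ from the reported figure in the last digit.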