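Kunpeng Performance Demarcation Tool Usage Examples
The examples assume that the tool was installed from the compressed package and that you have switched to the tool directory after installation. Using a server with a Kunpeng 920 series processor as an example, they show how to collect and analyze server system performance data, and how to compare collected data to identify the metrics that differ.
Figure 1 Overall workflow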


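- Run the collection command.
./ksys collect -d 10 -i 1 -o /home/test/
- The collection duration is set to 10 seconds and the sampling interval to 1 second. The JSON file is generated in the "/home/test/" directory.
- When collection finishes, the Summary data is printed directly, but it is not saved to the JSON file.
A snippet of the returned output is as follows: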
hotspot data collection is disabled. Refer to /home/ksys-xxx-Linux-aarch64/config.yaml for details
Starting to collect data. You can press Ctrl+C to stop the task.
Starting to parse data. You can press Ctrl+C stop the task.
Progress: 2/2 | Sub-progress(spe data): 10/10.
Starting to process and print data. You can press Ctrl+C to stop, and no results will be saved.
====================================================
Time : 2026/01/22 19:51:22
Version : DevKit xxx
Model Name : HUAWEI Kunpeng 920 V200 7270Z
Command : ksys collect -d 10 -i 1 -o /home/test/
====================================================
============================================================================================CPU Metrics=============================================================================================

Common Microarchitecture Metrics Summary Data (System wide)
+------+--------------+------+------+----------+----------+----------+----------+-----------+-----------+---------+
| IPC  | INSTRUCTIONS | MPKI | BPKI | L1D MPKI | L1I MPKI | L2D MPKI | L2I MPKI | DTLB MPKI | ITLB MPKI | CPU-NUM |
+------+--------------+------+------+----------+----------+----------+----------+-----------+-----------+---------+
| 0.51 | 152795353756 | 1.07 | 0.24 | 1.1      | 0.42     | 0.66     | 0.1      | 1.24      | 0.1       | 256     |
+------+--------------+------+------+----------+----------+----------+----------+-----------+-----------+---------+

Topdown Summary Data (System wide)
+----------------------------+-------+
| Metric                     | Value |
+----------------------------+-------+
| Retiring(%)                | 8.14  |
| Frontend Bound(%)          | 5.52  |
| Fetch Latency Bound(%)     | 3.37  |
| Fetch Bandwidth Bound(%)   | 2.15  |
| Bad Speculation(%)         | 1.56  |
| Branch Mispredicts(%)      | 0.16  |
| Machine Clears(%)          | 1.4   |
| Backend Bound(%)           | 84.78 |
| Core Bound(%)              | 48.03 |
| Memory Bound(%)            | 36.75 |
| CPU-NUM                    | 256   |
+----------------------------+-------+

OS Metrics Summary Data (System wide)
+------------------+------------+-------------+---------+
| context-switches | migrations | page-faults | CPU-NUM |
+------------------+------------+-------------+---------+
| 88074            | 186        | 26284       | 256     |
+------------------+------------+-------------+---------+

INSTRUCTION Summary Data (System wide)
+----------------------------------+-------+
| Metric                           | Value |
+----------------------------------+-------+
| Memory(%)                        | 31.03 |
| Load(%)                          | 31.03 |
| Store(%)                         | 0.0   |
| Scalar(%):                       | 42.27 |
| Integer(%)                       | 42.26 |
| Floating Point(%)                | 0.01  |
| Vector(%)                        | 0.02  |
| Advanced SIMD(%)                 | 0.02  |
| SVE(+loads/stores)(%):           | 0.0   |
| SME(retired)(%):                 | 0.0   |
| Integer(%)                       | 0.0   |
| Floating Point(%)                | 0.0   |
| Crypto(%)                        | 0.0   |
| Branches(%)                      | 13.5  |
| Immediate(%)                     | 13.1  |
| Return(%)                        | 0.18  |
| Indirect(%)                      | 0.22  |
| Barriers(%)                      | 0.02  |
| Instruction Synchronization(%)   | 0.0   |
| Data Synchronization(%)          | 0.0   |
| Data Memory(%)                   | 0.02  |
| Not Retired(%)                   | 13.16 |
+----------------------------------+-------+

Load_avg Summary Data (System wide)
+--------------+--------------+---------------+
| recent 1 min | recent 5 min | recent 15 min |
+--------------+--------------+---------------+
| 2.76         | 2.9          | 2.82          |
+--------------+--------------+---------------+

Softirqs Summary Data (System wide)
+----------+----------+---------+---------+---------+
| NET_TX/s | NET_RX/s | BLOCK/s | SCHED/s | CPU-NUM |
+----------+----------+---------+---------+---------+
| 0        | 0        | 0       | 3       | 256     |
+----------+----------+---------+---------+---------+

CPU_stat Summary Data (System wide)
+----------------+--------------+-------------------+
| ctx_switches/s | interrupts/s | soft_interrupts/s |
+----------------+--------------+-------------------+
| 7738.0         | 48489.0      | 1631.0            |
+----------------+--------------+-------------------+
...

IO_info Summary Data (System wide)
+--------------+-------+-------+--------+-------+---------+--------+-------+-------+
| BLOCK DEVICE | tps   | rkB/s | wkB/s  | dkB/s | areq-sz | aqu-sz | await | %util |
+--------------+-------+-------+--------+-------+---------+--------+-------+-------+
| sdb          | 5.0   | 0.0   | 195.21 | 0.0   | 10.68   | 0.0    | 0.08  | 0.24  |
| sdb1         | 0.0   | 0.0   | 0.0    | 0.0   | 0.0     | 0.0    | 0.0   | 0.0   |
| sdb2         | 0.0   | 0.0   | 0.0    | 0.0   | 0.0     | 0.0    | 0.0   | 0.0   |
| sdb3         | 4.6   | 0.0   | 195.21 | 0.0   | 13.81   | 0.0    | 0.1   | 0.2   |
| sda          | 0.3   | 0.0   | 1.6    | 0.0   | 0.53    | 0.0    | 0.0   | 0.04  |
| sda1         | 0.2   | 0.0   | 1.6    | 0.0   | 0.8     | 0.0    | 0.0   | 0.04  |
| dm-0         | 10.11 | 0.0   | 196.41 | 0.0   | 4.66    | 0.0    | 0.03  | 0.2   |
| dm-1         | 0.0   | 0.0   | 0.0    | 0.0   | 0.0     | 0.0    | 0.0   | 0.0   |
| dm-2         | 0.5   | 0.0   | 2.0    | 0.0   | 0.4     | 0.0    | 0.0   | 0.04  |
+--------------+-------+-------+--------+-------+---------+--------+-------+-------+
============================================================================================Net Metrics=============================================================================================

Net_info Summary Data (System wide)
+-------------+---------+---------+--------+--------+---------+---------+----------+---------+
| IFACE       | rxpck/s | txpck/s | rxkB/s | txkB/s | rxcmp/s | txcmp/s | rxmcst/s | %ifutil |
+-------------+---------+---------+--------+--------+---------+---------+----------+---------+
| eno1        | 21.61   | 30.31   | 1.64   | 3.61   | 0.0     | 0.0     | 0.0      | 0.0     |
| eno2        | 0.0     | 0.0     | 0.0    | 0.0    | 0.0     | 0.0     | 0.0      | 0.0     |
| eno3        | 0.0     | 0.0     | 0.0    | 0.0    | 0.0     | 0.0     | 0.0      | 0.0     |
| eno4        | 0.0     | 0.0     | 0.0    | 0.0    | 0.0     | 0.0     | 0.0      | 0.0     |
| docker0     | 0.0     | 0.0     | 0.0    | 0.0    | 0.0     | 0.0     | 0.0      | 0.0     |
| vethca66367 | 0.0     | 0.0     | 0.0    | 0.0    | 0.0     | 0.0     | 0.0      | 0.0     |
+-------------+---------+---------+--------+--------+---------+---------+----------+---------+
Starting to save data. You can press Ctrl+C to stop, and no results will be saved.
Data saved successfully at /home/test/2026_01_22_19_51_22_report.json
Use the --verbose option for detailed metric explanations.
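After collection completes, a terminal report and a JSON performance data file ("/home/test/2026_01_22_19_51_22_report.json") are generated. The terminal report shows metrics across multiple dimensions, such as CPU and memory access. In this run, the server's context switch rate (ctx_switches/s) is relatively high at 7738.0, while the DDRC bandwidth (not shown in the snippet above) is low, between 0 MB/s and 30 MB/s, which indicates that the environment was running a compute-intensive workload.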
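One way to read the Topdown table in the sample output is as a two-level breakdown. The Python sketch below checks, with the values copied from that table, that the four top-level categories add up to 100% and that each pair of sub-metrics adds up to its parent. The parent/child grouping and the names `CHILDREN`, `check_topdown`, and `dominant_category` are assumptions inferred from the metric names and the sample numbers, not something this page states.

```python
import math

# Values copied from the sample "Topdown Summary Data" table.
topdown = {
    "Retiring": 8.14,
    "Frontend Bound": 5.52,
    "Fetch Latency Bound": 3.37,
    "Fetch Bandwidth Bound": 2.15,
    "Bad Speculation": 1.56,
    "Branch Mispredicts": 0.16,
    "Machine Clears": 1.4,
    "Backend Bound": 84.78,
    "Core Bound": 48.03,
    "Memory Bound": 36.75,
}

# Assumed hierarchy (inferred, not documented here): parent -> sub-metrics.
TOP_LEVEL = ["Retiring", "Frontend Bound", "Bad Speculation", "Backend Bound"]
CHILDREN = {
    "Frontend Bound": ["Fetch Latency Bound", "Fetch Bandwidth Bound"],
    "Bad Speculation": ["Branch Mispredicts", "Machine Clears"],
    "Backend Bound": ["Core Bound", "Memory Bound"],
}


def check_topdown(values, tol=0.05):
    """Return {check name: bool} for the top-level and per-parent sums."""
    top_sum = sum(values[name] for name in TOP_LEVEL)
    results = {"top-level": math.isclose(top_sum, 100.0, abs_tol=tol)}
    for parent, kids in CHILDREN.items():
        kid_sum = sum(values[name] for name in kids)
        results[parent] = math.isclose(kid_sum, values[parent], abs_tol=tol)
    return results


def dominant_category(values):
    """Return the top-level category with the largest share."""
    return max(TOP_LEVEL, key=lambda name: values[name])


print(check_topdown(topdown))
print(dominant_category(topdown))
```

With the sample values, every check passes and the largest top-level share is Backend Bound (84.78%). The tolerance only absorbs the two-decimal rounding of the printed values.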
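- Analyze the generated performance data file and generate an Excel report.
./ksys report -i /home/test/2026_01_22_19_51_22_report.json -o /home/test/
- 2026_01_22_19_51_22_report.json is the JSON file generated by the ksys collect command.
- When the analysis finishes, the Summary data is printed directly and is saved to an Excel file together with the time-series data.
- The time-series data is plotted as line charts or area charts, and the timelines of all charts are aligned.
A snippet of the returned output is as follows: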
Analyzing system data... Please wait.
====================================================
Time : 2026/01/22 19:52:37
Version : DevKit xxx
Model Name : HUAWEI Kunpeng 920 V200 7270Z
Command : ksys collect -d 10 -i 1 -o /home/test/
====================================================
============================================================================================CPU Metrics=============================================================================================

Common Microarchitecture Metrics Summary Data (System wide)
+------+--------------+------+------+----------+----------+----------+----------+-----------+-----------+---------+
| IPC  | INSTRUCTIONS | MPKI | BPKI | L1D MPKI | L1I MPKI | L2D MPKI | L2I MPKI | DTLB MPKI | ITLB MPKI | CPU-NUM |
+------+--------------+------+------+----------+----------+----------+----------+-----------+-----------+---------+
| 0.51 | 152795353756 | 1.07 | 0.24 | 1.1      | 0.42     | 0.66     | 0.1      | 1.24      | 0.1       | 256     |
+------+--------------+------+------+----------+----------+----------+----------+-----------+-----------+---------+

Topdown Summary Data (System wide)
+----------------------------+-------+
| Metric                     | Value |
+----------------------------+-------+
| Retiring(%)                | 8.14  |
| Frontend Bound(%)          | 5.52  |
| Fetch Latency Bound(%)     | 3.37  |
| Fetch Bandwidth Bound(%)   | 2.15  |
| Bad Speculation(%)         | 1.56  |
| Branch Mispredicts(%)      | 0.16  |
| Machine Clears(%)          | 1.4   |
| Backend Bound(%)           | 84.78 |
| Core Bound(%)              | 48.03 |
| Memory Bound(%)            | 36.75 |
| CPU-NUM                    | 256   |
+----------------------------+-------+

OS Metrics Summary Data (System wide)
+------------------+------------+-------------+---------+
| context-switches | migrations | page-faults | CPU-NUM |
+------------------+------------+-------------+---------+
| 88074            | 186        | 26284       | 256     |
+------------------+------------+-------------+---------+

INSTRUCTION Summary Data (System wide)
+----------------------------------+-------+
| Metric                           | Value |
+----------------------------------+-------+
| Memory(%)                        | 31.03 |
| Load(%)                          | 31.03 |
| Store(%)                         | 0.0   |
| Scalar(%):                       | 42.27 |
| Integer(%)                       | 42.26 |
| Floating Point(%)                | 0.01  |
| Vector(%)                        | 0.02  |
| Advanced SIMD(%)                 | 0.02  |
| SVE(+loads/stores)(%):           | 0.0   |
| SME(retired)(%):                 | 0.0   |
| Integer(%)                       | 0.0   |
| Floating Point(%)                | 0.0   |
| Crypto(%)                        | 0.0   |
| Branches(%)                      | 13.5  |
| Immediate(%)                     | 13.1  |
| Return(%)                        | 0.18  |
| Indirect(%)                      | 0.22  |
| Barriers(%)                      | 0.02  |
| Instruction Synchronization(%)   | 0.0   |
| Data Synchronization(%)          | 0.0   |
| Data Memory(%)                   | 0.02  |
| Not Retired(%)                   | 13.16 |
+----------------------------------+-------+

Load_avg Summary Data (System wide)
+--------------+--------------+---------------+
| recent 1 min | recent 5 min | recent 15 min |
+--------------+--------------+---------------+
| 2.76         | 2.9          | 2.82          |
+--------------+--------------+---------------+
...

IO_info Summary Data (System wide)
+--------------+-------+-------+--------+-------+---------+--------+-------+-------+
| BLOCK DEVICE | tps   | rkB/s | wkB/s  | dkB/s | areq-sz | aqu-sz | await | %util |
+--------------+-------+-------+--------+-------+---------+--------+-------+-------+
| sdb          | 5.0   | 0.0   | 195.21 | 0.0   | 10.68   | 0.0    | 0.08  | 0.24  |
| sdb1         | 0.0   | 0.0   | 0.0    | 0.0   | 0.0     | 0.0    | 0.0   | 0.0   |
| sdb2         | 0.0   | 0.0   | 0.0    | 0.0   | 0.0     | 0.0    | 0.0   | 0.0   |
| sdb3         | 4.6   | 0.0   | 195.21 | 0.0   | 13.81   | 0.0    | 0.1   | 0.2   |
| sda          | 0.3   | 0.0   | 1.6    | 0.0   | 0.53    | 0.0    | 0.0   | 0.04  |
| sda1         | 0.2   | 0.0   | 1.6    | 0.0   | 0.8     | 0.0    | 0.0   | 0.04  |
| dm-0         | 10.11 | 0.0   | 196.41 | 0.0   | 4.66    | 0.0    | 0.03  | 0.2   |
| dm-1         | 0.0   | 0.0   | 0.0    | 0.0   | 0.0     | 0.0    | 0.0   | 0.0   |
| dm-2         | 0.5   | 0.0   | 2.0    | 0.0   | 0.4     | 0.0    | 0.0   | 0.04  |
+--------------+-------+-------+--------+-------+---------+--------+-------+-------+
============================================================================================Net Metrics=============================================================================================

Net_info Summary Data (System wide)
+-------------+---------+---------+--------+--------+---------+---------+----------+---------+
| IFACE       | rxpck/s | txpck/s | rxkB/s | txkB/s | rxcmp/s | txcmp/s | rxmcst/s | %ifutil |
+-------------+---------+---------+--------+--------+---------+---------+----------+---------+
| eno1        | 21.61   | 30.31   | 1.64   | 3.61   | 0.0     | 0.0     | 0.0      | 0.0     |
| eno2        | 0.0     | 0.0     | 0.0    | 0.0    | 0.0     | 0.0     | 0.0      | 0.0     |
| eno3        | 0.0     | 0.0     | 0.0    | 0.0    | 0.0     | 0.0     | 0.0      | 0.0     |
| eno4        | 0.0     | 0.0     | 0.0    | 0.0    | 0.0     | 0.0     | 0.0      | 0.0     |
| docker0     | 0.0     | 0.0     | 0.0    | 0.0    | 0.0     | 0.0     | 0.0      | 0.0     |
| vethca66367 | 0.0     | 0.0     | 0.0    | 0.0    | 0.0     | 0.0     | 0.0      | 0.0     |
+-------------+---------+---------+--------+--------+---------+---------+----------+---------+
Save statistics and time series data to an Excel file. Please wait.
The report has been saved to /home/test/2026_01_22_19_52_37_report.xlsx
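After the analysis task completes, a terminal report and an Excel file ("/home/test/2026_01_22_19_52_37_report.xlsx") are generated. The Excel file contains time-series data across multiple dimensions (CPU, devices, and so on), together with the corresponding time-series charts.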
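The collect and report steps above can be chained in a small script. The following Python sketch only assembles the two command lines with the options shown on this page (`-d`, `-i`, `-o` for collect; `-i`, `-o` for report) and picks the newest JSON file by name. It assumes that it is run from the tool directory, that collect writes files named like `2026_01_22_19_51_22_report.json` (as in the sample output), and that the helper names `collect_cmd`, `report_cmd`, `latest`, and `collect_then_report` are purely illustrative.

```python
import subprocess
from pathlib import Path

KSYS = "./ksys"  # assumption: the script is started from the tool directory


def collect_cmd(duration_s, interval_s, out_dir):
    """Build the argument list for `ksys collect` (options as shown above)."""
    return [KSYS, "collect", "-d", str(duration_s), "-i", str(interval_s),
            "-o", str(out_dir)]


def report_cmd(json_path, out_dir):
    """Build the argument list for `ksys report` (options as shown above)."""
    return [KSYS, "report", "-i", str(json_path), "-o", str(out_dir)]


def latest(paths):
    """Return the path whose file name sorts last, or None if there is none.

    The sample file names start with a zero-padded timestamp
    (YYYY_MM_DD_HH_MM_SS), so sorting by name also sorts by time.
    """
    candidates = sorted((Path(p) for p in paths), key=lambda p: p.name)
    return candidates[-1] if candidates else None


def collect_then_report(duration_s=10, interval_s=1, out_dir="/home/test/"):
    """Run collect, then run report on the newest *_report.json in out_dir."""
    subprocess.run(collect_cmd(duration_s, interval_s, out_dir), check=True)
    newest = latest(Path(out_dir).glob("*_report.json"))
    if newest is None:
        raise FileNotFoundError(f"no *_report.json found in {out_dir}")
    subprocess.run(report_cmd(newest, out_dir), check=True)
    return newest
```

Calling `collect_then_report()` would run both steps with the values used in this example. `check=True` makes the script stop if either command exits with a non-zero status.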
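- Run the diff command to compare the performance of the workload between two collections and generate a comparison report.
./ksys diff -i /home/test/2026_01_22_19_51_22_report.json /home/test/2026_01_22_19_53_45_report.json -o /home/test
2026_01_22_19_51_22_report.json and 2026_01_22_19_53_45_report.json are JSON files generated by the ksys collect command. The comparison data is saved to an Excel file in the "/home/test/" directory.
A snippet of the returned output is as follows: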
============================================================================================System Info=============================================================================================

System Architecture diff:
+--------------+-------------------------------+-------------------------------+------+
| Metric       | Before                        | After                         | Diff |
+--------------+-------------------------------+-------------------------------+------+
| Cpu Type     | Kunpeng920 high-performance   | Kunpeng920 high-performance   | N/A  |
| Model Name   | HUAWEI Kunpeng 920 V200 7270Z | HUAWEI Kunpeng 920 V200 7270Z | N/A  |
| Vendor ID    | HiSilicon                     | HiSilicon                     | N/A  |
| Hyper Thread | False                         | False                         | N/A  |
| CPU-NUM      | 256                           | 256                           | N/A  |
+--------------+-------------------------------+-------------------------------+------+
============================================================================================CPU Metrics=============================================================================================

Common Microarchitecture Metrics diff:
+--------------+--------------+--------------+---------+
| Metric       | Before       | After        | Diff    |
+--------------+--------------+--------------+---------+
| IPC          | 0.51         | 0.51         | +0.00%  |
| MPKI         | 1.07         | 1.18         | +10.28% |
| BPKI         | 0.24         | 0.35         | +45.83% |
| L1D MPKI     | 1.1          | 1.19         | +8.18%  |
| L1I MPKI     | 0.42         | 0.52         | +23.81% |
| L2D MPKI     | 0.66         | 0.74         | +12.12% |
| L2I MPKI     | 0.1          | 0.11         | +10.00% |
| DTLB MPKI    | 1.24         | 1.51         | +21.77% |
| INSTRUCTIONS | 152795353756 | 151352935542 | -0.94%  |
| ITLB MPKI    | 0.1          | 0.14         | +40.00% |
+--------------+--------------+--------------+---------+

Topdown diff:
+----------------------------+--------+-------+---------+
| Metric                     | Before | After | Diff    |
+----------------------------+--------+-------+---------+
| Retiring(%)                | 8.14   | 8.12  | -0.25%  |
| Frontend Bound(%)          | 5.52   | 5.88  | +6.52%  |
| Fetch Bandwidth Bound(%)   | 2.15   | 2.14  | -0.47%  |
| Fetch Latency Bound(%)     | 3.37   | 3.74  | +10.98% |
| Bad Speculation(%)         | 1.56   | 1.57  | +0.64%  |
| Branch Mispredicts(%)      | 0.16   | 0.19  | +18.75% |
| Machine Clears(%)          | 1.4    | 1.38  | -1.43%  |
| Backend Bound(%)           | 84.78  | 84.43 | -0.41%  |
| Core Bound(%)              | 48.03  | 47.67 | -0.75%  |
| Memory Bound(%)            | 36.75  | 36.76 | +0.03%  |
+----------------------------+--------+-------+---------+

OS Metrics diff:
+------------------+--------+-------+---------+
| Metric           | Before | After | Diff    |
+------------------+--------+-------+---------+
| context-switches | 88074  | 93718 | +6.41%  |
| migrations       | 186    | 300   | +61.29% |
| page-faults      | 26284  | 42116 | +60.23% |
+------------------+--------+-------+---------+

INSTRUCTION diff:
+----------------------------------+--------+-------+---------+
| Metric                           | Before | After | Diff    |
+----------------------------------+--------+-------+---------+
| Memory(%)                        | 31.03  | 31.4  | +1.19%  |
| Load(%)                          | 31.03  | 31.4  | +1.19%  |
| Store(%)                         | 0.0    | 0.0   | +0.00%  |
| Scalar(%):                       | 42.27  | 40.46 | -4.28%  |
| Integer(%)                       | 42.26  | 40.45 | -4.28%  |
| Floating Point(%)                | 0.01   | 0.01  | +0.00%  |
| Vector(%)                        | 0.02   | 0.03  | +50.00% |
| Advanced SIMD(%)                 | 0.02   | 0.03  | +50.00% |
| SVE(+loads/stores)(%):           | 0.0    | 0.0   | +0.00%  |
| SME(retired)(%):                 | 0.0    | 0.0   | +0.00%  |
| Integer(%)                       | 0.0    | 0.0   | +0.00%  |
| Floating Point(%)                | 0.0    | 0.0   | +0.00%  |
| Crypto(%)                        | 0.0    | 0.0   | +0.00%  |
| Branches(%)                      | 13.5   | 13.97 | +3.48%  |
| Immediate(%)                     | 13.1   | 13.42 | +2.44%  |
| Return(%)                        | 0.18   | 0.24  | +33.33% |
| Indirect(%)                      | 0.22   | 0.31  | +40.91% |
| Barriers(%)                      | 0.02   | 0.02  | +0.00%  |
| Instruction Synchronization(%)   | 0.0    | 0.0   | +0.00%  |
| Data Synchronization(%)          | 0.0    | 0.0   | +0.00%  |
| Data Memory(%)                   | 0.02   | 0.02  | +0.00%  |
| Not Retired(%)                   | 13.16  | 14.12 | +7.29%  |
+----------------------------------+--------+-------+---------+

Load_avg diff:
+---------------+--------+-------+--------+
| Metric        | Before | After | Diff   |
+---------------+--------+-------+--------+
| recent 1 min  | 2.76   | 2.62  | -5.07% |
| recent 5 min  | 2.9    | 2.77  | -4.48% |
| recent 15 min | 2.82   | 2.78  | -1.42% |
+---------------+--------+-------+--------+

Softirqs diff:
+----------+--------+-------+---------+
| Metric   | Before | After | Diff    |
+----------+--------+-------+---------+
| NET_TX/s | 0      | 0     | +0.00%  |
| NET_RX/s | 0      | 0     | +0.00%  |
| BLOCK/s  | 0      | 0     | +0.00%  |
| SCHED/s  | 3      | 4     | +33.33% |
+----------+--------+-------+---------+

CPU_stat diff:
+-------------------+---------+---------+--------+
| Metric            | Before  | After   | Diff   |
+-------------------+---------+---------+--------+
| ctx_switches/s    | 7738.0  | 8353.0  | +7.95% |
| interrupts/s      | 48489.0 | 48774.0 | +0.59% |
| soft_interrupts/s | 1631.0  | 1773.0  | +8.71% |
+-------------------+---------+---------+--------+
...

Net_info Network Device vethca66367 diff:
+----------+--------+-------+--------+
| Metric   | Before | After | Diff   |
+----------+--------+-------+--------+
| rxpck/s  | 0.0    | 0.0   | +0.00% |
| txpck/s  | 0.0    | 0.0   | +0.00% |
| rxkB/s   | 0.0    | 0.0   | +0.00% |
| txkB/s   | 0.0    | 0.0   | +0.00% |
| rxcmp/s  | 0.0    | 0.0   | +0.00% |
| txcmp/s  | 0.0    | 0.0   | +0.00% |
| rxmcst/s | 0.0    | 0.0   | +0.00% |
| %ifutil  | 0.0    | 0.0   | +0.00% |
+----------+--------+-------+--------+
==============================================================================================Top diff==============================================================================================

Top diff:
----------------------------------------------------------------------------------------------------------------
Note: At most 20 Top diffs are listed, please check the generated xlsx file for the rest of report.
+-------------+------------------------------+------------------+----------+----------+----------+-------------+
| Table Group | Metric Type/Metric Device    | Metric           | Before   | After    | Diff     | Diff(value) |
+-------------+------------------------------+------------------+----------+----------+----------+-------------+
| HHA         | HHA DEVICE hisi_sccl11_hha3  | rx_ops_num       | 210977.4 | 342261.4 | +62.23%  | 131284.0    |
| HHA         | HHA DEVICE hisi_sccl11_hha0  | rx_ops_num       | 211826.6 | 340646.2 | +60.81%  | 128819.6    |
| HHA         | HHA DEVICE hisi_sccl11_hha2  | rx_ops_num       | 203417.4 | 332189.6 | +63.30%  | 128772.2    |
| HHA         | HHA DEVICE hisi_sccl11_hha1  | rx_ops_num       | 202916.8 | 329660.0 | +62.46%  | 126743.2    |
| HHA         | HHA DEVICE hisi_sccl9_hha0   | rx_ops_num       | 213699.4 | 326738.0 | +52.90%  | 113038.6    |
| OS          | OS Metrics                   | page-faults      | 26284    | 42116    | +60.23%  | 15832       |
| OS          | OS Metrics                   | migrations       | 186      | 300      | +61.29%  | 114         |
| IO_info     | IO_info Summary              | Total wkB/s      | 592.03   | 50.03    | -91.55%  | 542.0       |
| IO_info     | IO_info IO Device dm-0       | wkB/s            | 196.41   | 11.6     | -94.09%  | 184.81      |
| IO_info     | IO_info IO Device sdb        | wkB/s            | 195.21   | 10.8     | -94.47%  | 184.41      |
| IO_info     | IO_info IO Device sdb3       | wkB/s            | 195.21   | 10.8     | -94.47%  | 184.41      |
| IO_info     | IO_info Summary              | Total tps        | 20.71    | 7.5      | -63.79%  | 13.21       |
| PA          | PA DEVICE hisi_sicl2_pa0     | PA2Ring MB/s     | 79.05    | 167.61   | +112.03% | 88.56       |
| PA          | PA DEVICE hisi_sicl2_pa0     | Ring2PA MB/s     | 84.41    | 146.56   | +73.63%  | 62.15       |
| PA          | PA DEVICE hisi_sicl10_pa0    | PA2Ring MB/s     | 95.84    | 147.55   | +53.95%  | 51.71       |
| PA          | PA DEVICE hisi_sicl2_pa0     | Ring2PA_lk0 MB/s | 48.48    | 84.76    | +74.83%  | 36.28       |
| PA          | PA DEVICE hisi_sicl2_pa0     | Ring2PA_lk1 MB/s | 5.46     | 8.35     | +52.93%  | 2.89        |
| Net_info    | Net_info Summary             | Total txkB/s     | 3.61     | 5.71     | +58.17%  | 2.1         |
| Net_info    | Net_info Network Device eno1 | txkB/s           | 3.61     | 5.71     | +58.17%  | 2.1         |
| Net_info    | Net_info Summary             | Total %ifutil    | 0.0      | 0.01     | +inf%    | 0.01        |
+-------------+------------------------------+------------------+----------+----------+----------+-------------+
Data has been saved to /home/test/2026_01_22_19_54_59_diff.xlsx
The comparison data is saved in the "/home/test/2026_01_22_19_54_59_diff.xlsx" file. The comparison analysis compares the Summary data of the two collections and ends with a Top diff report that lists the metrics with the largest differences. In this comparison, the bandwidth of the PA device named hisi_sicl2_pa0 differs considerably between the two collections (for example, PA2Ring MB/s rises from 79.05 to 167.61, +112.03%).
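The Diff and Diff(value) columns in the sample output are consistent with a relative change, (After - Before) / Before expressed as a percentage, and an absolute difference, |After - Before|. The Python sketch below reproduces a few of the printed rows under that assumption. The formula is inferred from the sample numbers only, the tool may compute its values from unrounded data, and the function names `diff_percent` and `diff_value` are hypothetical.

```python
def diff_percent(before, after):
    """Relative change formatted like the sample Diff column (assumed formula)."""
    if before == after:
        return "+0.00%"  # also covers the 0 -> 0 rows in the sample
    if before == 0:
        # The sample prints "+inf%" when Before is 0.0 and After is positive.
        change = float("inf") if after > 0 else float("-inf")
    else:
        change = (after - before) / before * 100
    return f"{change:+.2f}%"


def diff_value(before, after):
    """Absolute difference, rounded to two decimals like the printed values."""
    return round(abs(after - before), 2)


# A few rows from the sample output: (metric, Before, After)
rows = [
    ("MPKI", 1.07, 1.18),
    ("Retiring(%)", 8.14, 8.12),
    ("Total wkB/s", 592.03, 50.03),
    ("PA2Ring MB/s", 79.05, 167.61),
    ("Total %ifutil", 0.0, 0.01),
]
for metric, before, after in rows:
    print(metric, diff_percent(before, after), diff_value(before, after))
```

Note that Diff(value) is unsigned in the sample (Total wkB/s drops, yet 542.0 is printed), so the direction of a change has to be read from the sign in the Diff column.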