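Roofline Instrumentation Guide

Instrumentation Overview

Roofline analysis supports manual region instrumentation. When the mode parameter is set to region, data is collected separately for each instrumented region in the application, which enables quantitative analysis at the function or loop level. To use this capability, you must manually instrument the source code under analysis and recompile it.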

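Region instrumentation and analysis:
  1. Insert the Roofline Events API into the source code.
    • Taking an RPM installation of the Kunpeng DevKit command-line tool as an example, the default installation path is /usr/local/devkit. The rest of this guide uses that directory.
    • The Roofline Events API is defined under the installation path of the Kunpeng DevKit command-line tool, in /usr/local/devkit/tuner/include/roofline_events.h or /usr/local/devkit/tuner/include/roofline_events.mod.
    • roofline_events.h is for C/C++ programs; roofline_events.mod is for Fortran programs.
  2. Recompile the application with the new compiler flags:
    • C/C++: -DROOFLINE_EVENTS -I /usr/local/devkit/tuner/include -L/usr/local/devkit/tuner/lib -lrfevents
    • Fortran: -I /usr/local/devkit/tuner/include -L/usr/local/devkit/tuner/lib -lrfevents
  3. Make sure the runtime dynamic library search path includes /usr/local/devkit/tuner/lib, for example by adding /usr/local/devkit/tuner/lib to LD_LIBRARY_PATH.
  4. Select region mode when running the Roofline analysis. Use the command-line tool as "devkit tuner roofline -m region <application application-arguments>" to collect data from the instrumented, recompiled application.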
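
The four steps above can be sketched for a single-threaded C program as follows. This is a minimal illustration, not part of the DevKit documentation: the file name app.c, the use of gcc, and the functions dot and run_instrumented_dot are assumptions made for the example. The fallback no-op macros are included only so the sketch also builds on a machine without DevKit; they mirror the behavior of roofline_events.h when ROOFLINE_EVENTS is not defined.

```c
/*
 * Build and run, following steps 2 to 4 (gcc and app.c are assumed names):
 *   gcc -DROOFLINE_EVENTS -I /usr/local/devkit/tuner/include app.c \
 *       -L/usr/local/devkit/tuner/lib -lrfevents -o app
 *   export LD_LIBRARY_PATH=/usr/local/devkit/tuner/lib:$LD_LIBRARY_PATH
 *   devkit tuner roofline -m region ./app
 */
#include <assert.h>
#include <stddef.h>

#ifdef ROOFLINE_EVENTS
#include "roofline_events.h"   /* real API from the DevKit include directory */
#else
/* Assumption: local no-op fallbacks so the sketch compiles without DevKit. */
#define ROOFLINE_EVENTS_INIT
#define ROOFLINE_EVENTS_START_REGION(region_label)
#define ROOFLINE_EVENTS_STOP_REGION(region_label)
#define ROOFLINE_EVENTS_FINALIZE
#endif

/* Hypothetical kernel: the loop is wrapped in one region named "dot". */
static double dot(const double *a, const double *b, size_t n)
{
    double acc = 0.0;

    ROOFLINE_EVENTS_START_REGION("dot");
    for (size_t i = 0; i < n; i++) {
        acc += a[i] * b[i];
    }
    ROOFLINE_EVENTS_STOP_REGION("dot");
    return acc;
}

/* Hypothetical driver: init and finalize bracket all instrumented work. */
double run_instrumented_dot(const double *a, const double *b, size_t n)
{
    double result;

    assert(a != NULL && b != NULL);

    ROOFLINE_EVENTS_INIT;        /* once, before the first region starts */
    result = dot(a, b, n);
    ROOFLINE_EVENTS_FINALIZE;    /* once, after the last region stops */
    return result;
}
```

Calling run_instrumented_dot once from the program's main function is enough for the "dot" region to be collected in region mode. Building the same file without -DROOFLINE_EVENTS leaves the computation unchanged and removes the instrumentation.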

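Introduction to the Roofline Events API

Data is collected per thread, so observe the following rules:

  • Call initialize/finalize in serial code (for example, on the main thread).
  • To analyze data from all threads, place the start/stop API calls in parallel code.
  • Multiple regions are supported, but regions cannot be nested: the start/stop calls of a region must be paired, and regions must not interleave.
  • The region name is used to match region data across threads.
  • Interfaces prefixed with ROOFLINE_EVENTS can be enabled or disabled with the ROOFLINE_EVENTS compile option. This macro mechanism applies to C/C++.
  • Interfaces ending in perf_roofline_events can be used from C/C++/Fortran and cannot be toggled with a compile option.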
#ifdef ROOFLINE_EVENTS
#define ROOFLINE_EVENTS_INIT init_perf_roofline_events()
#define ROOFLINE_EVENTS_START_REGION(region_label) start_perf_roofline_events(region_label)
#define ROOFLINE_EVENTS_STOP_REGION(region_label) stop_perf_roofline_events(region_label)
#define ROOFLINE_EVENTS_FINALIZE finalize_perf_roofline_events()
#else
#define ROOFLINE_EVENTS_INIT
#define ROOFLINE_EVENTS_START_REGION(region_label)
#define ROOFLINE_EVENTS_STOP_REGION(region_label)
#define ROOFLINE_EVENTS_FINALIZE
#endif

#ifdef __cplusplus
extern "C" {
#endif
// read system counters -> init
// should be called in serial code before start_perf_roofline_events
extern void init_perf_roofline_events(void) __attribute__((visibility("default")));
// start roofline events for current thread and provided region
// should be called in parallel code
extern void start_perf_roofline_events(const char* region) __attribute__((visibility("default")));
// stop roofline events for current thread and provided region
// should be called in parallel code
extern void stop_perf_roofline_events(const char* region) __attribute__((visibility("default")));
// summarize data for all regions
// should be called in serial code after stop_perf_roofline_events for all regions/threads
extern void finalize_perf_roofline_events(void) __attribute__((visibility("default")));
#ifdef __cplusplus
}
#endif
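
The per-thread rules above can be sketched with a multi-threaded loop. The example below assumes OpenMP as the threading model (the source does not prescribe one), and the function name scale_add_and_sum and the region names "axpy" and "sum" are hypothetical. Init and finalize sit in serial code, start/stop sit inside the parallel block so every thread records its own data, and the two regions run one after the other without nesting or interleaving. As an assumption for portability, no-op fallback macros are defined when ROOFLINE_EVENTS is not set, so the sketch builds without DevKit.

```c
#include <assert.h>
#include <stddef.h>

#ifdef ROOFLINE_EVENTS
#include "roofline_events.h"   /* real API from the DevKit include directory */
#else
/* Assumption: local no-op fallbacks so the sketch compiles without DevKit. */
#define ROOFLINE_EVENTS_INIT
#define ROOFLINE_EVENTS_START_REGION(region_label)
#define ROOFLINE_EVENTS_STOP_REGION(region_label)
#define ROOFLINE_EVENTS_FINALIZE
#endif

/* Hypothetical kernel: y = a * x + y, then return the sum of y. */
double scale_add_and_sum(double a, const double *x, double *y, long n)
{
    double sum = 0.0;

    assert(x != NULL && y != NULL);

    ROOFLINE_EVENTS_INIT;                        /* serial code */

    #pragma omp parallel
    {
        /* Each thread starts and stops the region under the same name,
           which is how its data is matched with the other threads. */
        ROOFLINE_EVENTS_START_REGION("axpy");
        #pragma omp for
        for (long i = 0; i < n; i++) {
            y[i] = a * x[i] + y[i];
        }
        ROOFLINE_EVENTS_STOP_REGION("axpy");

        /* Second region begins only after the first one has stopped. */
        ROOFLINE_EVENTS_START_REGION("sum");
        #pragma omp for reduction(+:sum)
        for (long i = 0; i < n; i++) {
            sum += y[i];
        }
        ROOFLINE_EVENTS_STOP_REGION("sum");
    }

    ROOFLINE_EVENTS_FINALIZE;                    /* serial code, all threads done */
    return sum;
}
```

With gcc, adding -fopenmp to the C/C++ flags listed earlier enables the threads; without it the pragmas are ignored and the same code runs serially on the main thread.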

Instrumentation Example