Implementing Code Instrumentation
Because compilers, math libraries, and hardware differ, results computed on the Kunpeng platform can differ from those computed on the x86 platform. The following method locates the code that introduces the differences so that the cause can be analyzed.
The following uses the code of WRF 3.4.1 as an example. Before analyzing the differences, determine the compilers, compilation options, and software stacks on the Kunpeng and x86 devices and ensure that the same WRF version and computing test cases are used.
The method rests on one premise: if a function receives the same input on both platforms and its internal calculations introduce no difference, its output is the same on both platforms. If the output differs, the code that causes the difference is inside that function.
Find the main compute path of the code, for example, solve_em.F in WRF 3.4.1. This module performs a large number of calculations on meteorological elements, which affects the final result. Then perform instrumentation analysis on all functions in solve_em.F.
- To guarantee that the function input is the same on both platforms, implement code instrumentation by writing the input to a file on one platform and reading it on the other. For example:
- On the x86 platform, insert the write_args function before calling set_tiles to write the input values of set_tiles to a file.
    ! On x86, call write_args to write the input values to the file.
    CALL write_args(grid, ids, ide, jds, jde, ips, ipe, jps, jpe)
    CALL set_tiles(grid, ids, ide, jds, jde, ips, ipe, jps, jpe)
- After writing the file on the x86 platform, copy the file to the Kunpeng platform. Insert the read_args function in the same code position on Kunpeng to read x86 arguments.
    ! On Kunpeng, call read_args to read the file values and assign the values to the function input.
    CALL read_args(grid, ids, ide, jds, jde, ips, ipe, jps, jpe)
    CALL set_tiles(grid, ids, ide, jds, jde, ips, ipe, jps, jpe)
- Compare the output of the two platforms. Given that the function output may be a large array, it is not intuitive to compare the specific values directly. You can use the output MD5 values for comparison. Call the print_md5 function after set_tiles to calculate and print the MD5 values of the function output.
- On the x86 platform:
    ! Call write_args to write arguments to the file.
    CALL write_args(grid, ids, ide, jds, jde, ips, ipe, jps, jpe)
    CALL set_tiles(grid, ids, ide, jds, jde, ips, ipe, jps, jpe)
    ! Print the MD5 values.
    CALL print_md5(grid, ids, ide, jds, jde, ips, ipe, jps, jpe)
- On the Kunpeng platform:
    ! Call read_args to read the file values and assign the values to arguments.
    CALL read_args(grid, ids, ide, jds, jde, ips, ipe, jps, jpe)
    CALL set_tiles(grid, ids, ide, jds, jde, ips, ipe, jps, jpe)
    ! Print the MD5 values.
    CALL print_md5(grid, ids, ide, jds, jde, ips, ipe, jps, jpe)
- Compare the MD5 values of the set_tiles function output. The MD5 values of the two platforms are the same. Therefore, it can be determined that the set_tiles function does not cause differences.
- MD5 values of the set_tiles function output on Kunpeng:
    ids md5sum is :4352d88a78aa39750bf70cd6f27bcaa5
    ide md5sum is :78cf7f2f5161bd0bb9fd139e10624c07
    jds md5sum is :4352d88a78aa39750bf70cd6f27bcaa5
    jde md5sum is :78cf7f2f5161bd0bb9fd139e10624c07
    ips md5sum is :4352d88a78aa39750bf70cd6f27bcaa5
    ipe md5sum is :b15762fc4c227bbdb97385765fb475f4
    jps md5sum is :4352d88a78aa39750bf70cd6f27bcaa5
    jpe md5sum is :d2f2de0f320b64a33cdefe91e8632b54
- MD5 values of the set_tiles function output on x86:
    ids md5sum is :4352d88a78aa39750bf70cd6f27bcaa5
    ide md5sum is :78cf7f2f5161bd0bb9fd139e10624c07
    jds md5sum is :4352d88a78aa39750bf70cd6f27bcaa5
    jde md5sum is :78cf7f2f5161bd0bb9fd139e10624c07
    ips md5sum is :4352d88a78aa39750bf70cd6f27bcaa5
    ipe md5sum is :b15762fc4c227bbdb97385765fb475f4
    jps md5sum is :4352d88a78aa39750bf70cd6f27bcaa5
    jpe md5sum is :d2f2de0f320b64a33cdefe91e8632b54
- solve_em.F also calls the calc_p_rho_phi function. Use the same method to analyze the calc_p_rho_phi function and generate MD5 values of the function output.
- MD5 values of the calc_p_rho_phi function output on Kunpeng:
    moist md5sum is :6b67192bd614cbcd0e46ed5d607479a0
    num_3d_m md5sum is :7303f017fe369f9ce5af630da93ba867
    config_flags%hypsometric_opt md5sum is :4352d88a78aa39750bf70cd6f27bcaa5
    grid%al md5sum is :db28d22d61ccf7aaef53151c08bad13e
    grid%alb md5sum is :7df859bac212ad24233f0e35c347b79c
    grid%mu_2 md5sum is :78da02688276d78aebd7198ffea21ef6
    grid%muts md5sum is :dbcaf66d2fb30baafbfab7f075e1325e
    grid%ph_2 md5sum is :9d5ab368e3398baa8b6eabf310e06515
    grid%phb md5sum is :357f65254210d7544455ac35ee9b87b7
    grid%p md5sum is :a9d0a0bd98710620b70c8e0b82b8cdf9
    grid%pb md5sum is :7b44de932f45be71671722478a4b0c87
    grid%t_2 md5sum is :f3d260a80c699900c6294ea356f0d64c
    p0 md5sum is :a6c5b219b5715388be8ac391f95602ab
    t0 md5sum is :2b37b3fa8052b64b82a21602e662e4ca
    grid%p_top md5sum is :96a8a83edaccefc72356a9a64a5484ac
    grid%znu md5sum is :65337863d0f5ad50c2492ba81f22d7bd
    grid%znw md5sum is :16b3ab5e2ec5e92d312a055b46ce4cc5
    grid%dnw md5sum is :04fdee143ffda95f156e872762efbb17
    grid%rdnw md5sum is :f838011b20bb121e63737fde1fd5dd3f
    grid%rdn md5sum is :a00dba2a5675a1a5ea4b85e97257e1bd
    config_flags%non_hydrostatic md5sum is :a54f0041a9e15b050f25c463f1db7449
- MD5 values of the calc_p_rho_phi function output on x86:
    moist md5sum is :6b67192bd614cbcd0e46ed5d607479a0
    num_3d_m md5sum is :7303f017fe369f9ce5af630da93ba867
    config_flags%hypsometric_opt md5sum is :4352d88a78aa39750bf70cd6f27bcaa5
    grid%al md5sum is :db28d22d61ccf7aaef53151c08bad13e
    grid%alb md5sum is :7df859bac212ad24233f0e35c347b79c
    grid%mu_2 md5sum is :78da02688276d78aebd7198ffea21ef6
    grid%muts md5sum is :dbcaf66d2fb30baafbfab7f075e1325e
    grid%ph_2 md5sum is :9d5ab368e3398baa8b6eabf310e06515
    grid%phb md5sum is :357f65254210d7544455ac35ee9b87b7
    grid%p md5sum is :5e0897900e4534f9ad7d70ce616d2954
    grid%pb md5sum is :7b44de932f45be71671722478a4b0c87
    grid%t_2 md5sum is :f3d260a80c699900c6294ea356f0d64c
    p0 md5sum is :a6c5b219b5715388be8ac391f95602ab
    t0 md5sum is :2b37b3fa8052b64b82a21602e662e4ca
    grid%p_top md5sum is :96a8a83edaccefc72356a9a64a5484ac
    grid%znu md5sum is :65337863d0f5ad50c2492ba81f22d7bd
    grid%znw md5sum is :16b3ab5e2ec5e92d312a055b46ce4cc5
    grid%dnw md5sum is :04fdee143ffda95f156e872762efbb17
    grid%rdnw md5sum is :f838011b20bb121e63737fde1fd5dd3f
    grid%rdn md5sum is :a00dba2a5675a1a5ea4b85e97257e1bd
    config_flags%non_hydrostatic md5sum is :a54f0041a9e15b050f25c463f1db7449
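The guide does not show the bodies of write_args, read_args, and print_md5. The following is a minimal sketch of how such helpers could look, not the WRF implementation. It makes several assumptions: a plain REAL array stands in for the WRF grid structure, the file names are invented for illustration, the md5sum command-line utility is available on both systems, and both compilers use the same default INTEGER and REAL sizes and byte order so that raw stream files can be copied between the platforms unchanged.

```fortran
! Hypothetical helpers: the real routines also receive the WRF grid structure.
! Here a plain REAL array stands in for one grid field.
module instr_helpers
  implicit none
  character(len=*), parameter :: args_file = 'set_tiles_args.bin'  ! assumed name
contains

  ! x86 side: dump the call arguments as raw bytes.
  subroutine write_args(field, ids, ide, jds, jde)
    real,    intent(in) :: field(:,:)
    integer, intent(in) :: ids, ide, jds, jde
    integer :: u
    open(newunit=u, file=args_file, access='stream', form='unformatted', &
         status='replace', action='write')
    write(u) ids, ide, jds, jde
    write(u) field
    close(u)
  end subroutine write_args

  ! Kunpeng side: overwrite the arguments with the values recorded on x86.
  subroutine read_args(field, ids, ide, jds, jde)
    real,    intent(out) :: field(:,:)
    integer, intent(out) :: ids, ide, jds, jde
    integer :: u
    open(newunit=u, file=args_file, access='stream', form='unformatted', &
         status='old', action='read')
    read(u) ids, ide, jds, jde
    read(u) field
    close(u)
  end subroutine read_args

  ! Print the MD5 of an array by dumping it and calling the md5sum utility.
  subroutine print_md5(name, field)
    character(len=*), intent(in) :: name
    real,             intent(in) :: field(:,:)
    character(len=*), parameter :: dump = 'md5_dump.bin', res = 'md5_dump.txt'
    character(len=32) :: digest
    integer :: u, estat, cstat
    estat = 0
    cstat = 0
    open(newunit=u, file=dump, access='stream', form='unformatted', &
         status='replace', action='write')
    write(u) field
    close(u)
    call execute_command_line('md5sum '//dump//' > '//res, &
                              exitstat=estat, cmdstat=cstat)
    if (cstat /= 0 .or. estat /= 0) then
      print '(a)', trim(name)//' md5sum could not be computed'
      return
    end if
    ! md5sum prints the 32-character digest first, then the file name.
    open(newunit=u, file=res, status='old', action='read')
    read(u, '(a32)') digest
    close(u, status='delete')
    print '(a)', trim(name)//' md5sum is :'//digest
  end subroutine print_md5
end module instr_helpers

program demo
  use instr_helpers
  implicit none
  real    :: a(3,2), b(3,2)
  integer :: ids, ide, jds, jde
  call random_number(a)
  call write_args(a, 1, 3, 1, 2)          ! would run on x86
  call read_args(b, ids, ide, jds, jde)   ! would run on Kunpeng
  call print_md5('a', a)
  call print_md5('b', b)                  ! same digest as 'a' after the round trip
end program demo
```

In an MPI run, every rank would write to the same file names in this sketch, so a real helper would need a per-rank suffix or would have to restrict the dump to one rank.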
The output shows that the grid%p values differ between the two platforms after calc_p_rho_phi runs. Apply the same instrumentation inside calc_p_rho_phi to the code that computes grid%p. This narrows the inconsistency down to a call to the vpow function.
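Spotting the one differing digest by eye becomes tedious as the argument lists grow. As an illustration, the comparison could be automated with a small program like the one below. It is a sketch that assumes each platform's print_md5 output has been saved to a text file with one "name md5sum is :digest" entry per line and with the entries in the same order; the program name and the command-line interface are hypothetical.

```fortran
! Hypothetical tool: compare two MD5 logs line by line and report the
! variables whose digests differ.
program compare_md5_logs
  implicit none
  character(len=256) :: file_a, file_b, line_a, line_b
  integer :: ua, ub, ios_a, ios_b, ndiff, k

  call get_command_argument(1, file_a)
  call get_command_argument(2, file_b)
  if (len_trim(file_a) == 0 .or. len_trim(file_b) == 0) then
    print '(a)', 'usage: compare_md5_logs <x86 log> <Kunpeng log>'
    stop 1
  end if

  open(newunit=ua, file=trim(file_a), status='old', action='read')
  open(newunit=ub, file=trim(file_b), status='old', action='read')

  ndiff = 0
  do
    read(ua, '(a)', iostat=ios_a) line_a
    read(ub, '(a)', iostat=ios_b) line_b
    if (ios_a /= 0 .or. ios_b /= 0) exit
    if (line_a /= line_b) then
      ndiff = ndiff + 1
      ! Keep only the variable name that precedes " md5sum is".
      k = index(line_a, ' md5sum is')
      if (k > 0) then
        print '(a)', 'differs: '//line_a(1:k-1)
      else
        print '(a)', 'differs: '//trim(line_a)
      end if
    end if
  end do
  ! One log ended before the other if the final read statuses disagree.
  if (ios_a /= ios_b) print '(a)', 'warning: the logs have different lengths'
  print '(a, i0)', 'number of differing entries: ', ndiff

  close(ua)
  close(ub)
end program compare_md5_logs
```

Run against the two calc_p_rho_phi listings above, it should flag only grid%p.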
The vpow function only performs pow calculations, so its result depends on the math library that the program is linked against. After the math library on the Kunpeng platform is replaced with KML (Kunpeng Math Library), the output of the calc_p_rho_phi function is the same as that on the x86 platform.
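An MD5 digest changes even when a single result differs in its last bit, which is the typical size of a discrepancy between two pow implementations. One way to make such a discrepancy visible is to print the exact bit pattern of one power result on each platform and compare the hexadecimal values. The program below is a sketch with arbitrary input values. Whether the `**` operator reaches the same library routine that vpow uses depends on the compiler and link options, so treat that as an assumption.

```fortran
! Print one power result together with its exact 32-bit pattern so that the
! value can be compared across platforms or math libraries.
program pow_bits
  use iso_fortran_env, only: real32, int32
  implicit none
  ! VOLATILE keeps the compiler from folding the power at compile time.
  real(real32), volatile :: base, expo
  real(real32) :: r

  base = 0.987654_real32      ! arbitrary illustrative inputs
  expo = 0.2857143_real32
  r = base ** expo

  ! The value and its bit pattern in hexadecimal.
  print '(a, es16.8, a, z8.8)', 'pow = ', r, '   bits = ', transfer(r, 0_int32)
  ! Bit pattern of the neighbouring representable value, for scale.
  print '(a, z8.8)', 'next value bits = ', &
        transfer(nearest(r, 1.0_real32), 0_int32)
end program pow_bits
```

If the two platforms print different bit patterns for the same inputs, relinking against a different math library and rerunning shows whether the library is the source.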
Repeat the instrumentation for every target function to complete the WRF result difference analysis.