Tuning Process Flow

When you run a Spark machine learning algorithm with a test tool, use a system monitoring tool (for example, nmon) to monitor the run. Determine whether the workload is CPU-intensive, I/O-intensive, or a mix of both, and identify the bottlenecks to tune.
Calculate empirical values for the executor parameters so that CPU and memory resources are fully utilized while tasks are running. Previous test results indicate that, with the NUMA feature of Kunpeng processors and the NUMA awareness feature of Yarn, performance is optimal when containers are evenly allocated to all NUMA nodes. Calculate the number of executors from the number of compute nodes and the requirement that containers be evenly distributed across NUMA nodes.
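The executor-count calculation can be sketched as follows. This is a minimal illustration, not a formula from this guide: the function name plan_executors, its parameters, and the sample hardware figures (3 compute nodes, 4 NUMA nodes and 96 cores per node, 8 cores per executor) are all assumptions chosen for the example. It also assumes that every core is available to executors; in practice some cores are usually reserved for the OS and Yarn daemons.

```python
def plan_executors(compute_nodes, numa_nodes_per_node, cores_per_node, executor_cores):
    """Pick an executor count that places the same number of containers on every NUMA node.

    All parameter names are hypothetical; they are not Spark or Yarn settings.
    """
    if min(compute_nodes, numa_nodes_per_node, cores_per_node, executor_cores) <= 0:
        raise ValueError("all inputs must be positive")

    # Cores owned by a single NUMA node.
    cores_per_numa = cores_per_node // numa_nodes_per_node

    # Whole executors that fit in one NUMA node without crossing its boundary.
    executors_per_numa = cores_per_numa // executor_cores
    if executors_per_numa == 0:
        raise ValueError("executor_cores exceeds the cores of one NUMA node")

    # The per-node count is a multiple of the NUMA node count, so the split is even.
    executors_per_node = executors_per_numa * numa_nodes_per_node
    return {
        "executors_per_numa": executors_per_numa,
        "executors_per_node": executors_per_node,
        "num_executors": executors_per_node * compute_nodes,
    }


# Hypothetical cluster: 3 compute nodes, each with 4 NUMA nodes and 96 cores.
plan = plan_executors(compute_nodes=3, numa_nodes_per_node=4,
                      cores_per_node=96, executor_cores=8)
print(plan["num_executors"])  # 36
```

With these assumed figures, each NUMA node hosts 3 executors, each compute node hosts 12, and the cluster total of 36 is the value that would be passed as the number of executors.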
If CPU and memory resources cannot both be fully utilized, fully utilize CPU resources first, and then check the GC logs to decide whether to increase the memory. In memory-intensive scenarios, fully utilize memory resources instead and reserve some CPU headroom. Generally, the ratio of executor memory to executor cores is the same as the ratio of total memory to total cores.
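The memory-to-core ratio rule can be expressed as a short calculation. The sketch below is illustrative only: the function name executor_memory_gb and the sample figures (384 GB and 96 cores available to Yarn on a node, 8 cores per executor) are assumptions, not values from this guide. The result is a starting point; the GC logs still decide whether the memory needs to be increased.

```python
def executor_memory_gb(total_memory_gb, total_cores, executor_cores):
    """Size executor memory so that memory per core matches the node-wide ratio.

    executor_memory / executor_cores == total_memory / total_cores
    All parameter names are hypothetical.
    """
    if total_cores <= 0 or executor_cores <= 0 or total_memory_gb <= 0:
        raise ValueError("all inputs must be positive")
    memory_per_core_gb = total_memory_gb / total_cores
    return memory_per_core_gb * executor_cores


# Hypothetical node: 384 GB of memory and 96 cores available for executors.
memory_gb = executor_memory_gb(total_memory_gb=384, total_cores=96, executor_cores=8)

# Round down to whole gigabytes for the spark-submit options.
submit_options = f"--executor-cores 8 --executor-memory {int(memory_gb)}g"
print(submit_options)  # --executor-cores 8 --executor-memory 32g
```

Note that Yarn sizes each container as the executor memory plus the executor memory overhead, so the value derived from the ratio may need to be lowered slightly for all containers to fit on the node.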

Figure 1 General tuning procedure
