Using the Background Spark Task Tuning Function

You can run the python tuning.pyc command to enable background task tuning. After a tuning command is executed, the server tunes Spark tasks in the background.

Command Function

On the management node, you can run this command to start a tuning task and specify the tuning target, retest mode, and tuning method.

Syntax

python tuning.pyc [-h] -l LOAD_ID -r {hijacking,backend} [-t {iterative,expert,transfer,native}]

Parameter Description

Item	Description
-h or --help	Optional. Displays help information about the command, including its usage, parameter definitions, and additional description.
-l or --load-id	Mandatory. Indicates the load ID queried in the loads table.
-r or --retest-way	Mandatory. Indicates the retest mode. Retests verify that the tuning result is reliable. Options: hijacking (foreground retests; the configuration is delivered when a user submits a task); backend (background retests; the retest is triggered automatically by OmniAdvisor).
-t or --tuning-method	Mandatory when -r or --retest-way is set to hijacking; optional when it is set to backend. Indicates the tuning method. Options: iterative (AI-driven iterative tuning, which uses AI algorithms to search the parameter space for the optimal application parameters); expert (expert rule-based tuning, which optimizes application parameters by diagnosing load performance bottlenecks and matching them against rules); transfer (transfer generalization tuning, which searches for similar historical loads and reuses their tuning experience to obtain better-performing application parameters); native (replaces native Spark operators with C++ Native operators to accelerate performance). NOTICE: To use the native tuning method, you must deploy the OmniOperator component on each node in the cluster in advance. Currently, HDFS deployment is not supported.
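
The parameter rules above can be checked before a command is submitted. The following is a minimal sketch, assuming only the documented syntax; it is not the implementation of tuning.pyc, and the names build_parser and parse_tuning_args are hypothetical. The load ID is kept as a string because its type is not stated here.

```python
import argparse

# Tuning methods listed in the parameter description.
TUNING_METHODS = ("iterative", "expert", "transfer", "native")


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser that mirrors the documented syntax of tuning.pyc."""
    parser = argparse.ArgumentParser(prog="tuning.pyc")
    parser.add_argument("-l", "--load-id", required=True,
                        help="load ID queried in the loads table")
    parser.add_argument("-r", "--retest-way", required=True,
                        choices=("hijacking", "backend"),
                        help="retest mode")
    parser.add_argument("-t", "--tuning-method", choices=TUNING_METHODS,
                        help="tuning method")
    return parser


def parse_tuning_args(argv):
    """Parse argv and enforce the documented dependency between -r and -t."""
    parser = build_parser()
    args = parser.parse_args(argv)
    # Documented rule: -t is mandatory for hijacking and optional for backend.
    if args.retest_way == "hijacking" and args.tuning_method is None:
        parser.error("-t/--tuning-method is required when -r is hijacking")
    return args
```

For example, parse_tuning_args(["-l", "1", "-r", "backend"]) is accepted, whereas the same call with "hijacking" and no -t exits with a usage error.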

Example of Usage

Display the command usage, parameter definition, and additional description.
```
python tuning.pyc --help
```
Optimize the load whose load-id is 1 based on expert rules and check the optimization effect through background retests.
```
python tuning.pyc -l 1 -r backend -t expert
```

If the -t or --tuning-method option is omitted and the retest mode is backend, the tuning method is taken from the tuning.strategy configuration item in the common_config.ini configuration file of OmniAdvisor 2.0.
```
python tuning.pyc -l 1 -r backend
```
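
To see which default applies before omitting -t, the configuration item can be looked up with Python's standard configparser module. This is a sketch under assumptions: the section name and the values in SAMPLE are invented for illustration, and the real layout of common_config.ini may differ. The helper find_option is hypothetical and searches every section so that it does not depend on a particular section name.

```python
import configparser


def find_option(ini_text, key):
    """Return the first value stored under `key` in any section, else None."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    for section in parser.sections():
        if parser.has_option(section, key):
            return parser.get(section, key)
    return None


# Hypothetical file content: the section name and both values are assumed.
SAMPLE = """
[common]
tuning.strategy = expert
tuning.retest.times = 3
"""

print(find_option(SAMPLE, "tuning.strategy"))
```

To inspect a real file, read its text first (for example with pathlib.Path(path).read_text()) and pass it to find_option.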

Example of Tuning

Construct test data.

spark-sql --master yarn --deploy-mode client --driver-cores 8 --driver-memory 20G --num-executors 36 --executor-cores 8 --executor-memory 29g -e "CREATE DATABASE IF NOT EXISTS omnitest; CREATE TABLE IF NOT EXISTS omnitest.employee (id INT, name STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;"

Intercept the load.
Intercept the user load to obtain the load and related information for subsequent tuning. For details, see Using the Foreground Spark Task Interception Function.
```
export enable_omniadvisor=true
spark-sql --master yarn --deploy-mode client --driver-cores 8 --driver-memory 20G --num-executors 36 --executor-cores 8 --executor-memory 29g -e "SELECT * FROM omnitest.employee;"
```
After a load is intercepted and recorded in the database, every subsequent command for the same load is also intercepted, and the user-defined task configuration is replaced with the optimal configuration recommended by the system. In the initial state of OmniAdvisor 2.0, the user-defined task configuration is used. If the recommended configuration fails to execute, the system rolls back to the default configuration.
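
The replace-then-roll-back behavior described above can be pictured with a short sketch. It is an illustration only, not OmniAdvisor code: the function choose_and_run and the run_task callback are hypothetical, and the sketch assumes that the configuration used for rollback is the one the user submitted.

```python
from typing import Callable, Dict, Optional

Config = Dict[str, str]


def choose_and_run(run_task: Callable[[Config], bool],
                   user_config: Config,
                   recommended: Optional[Config]) -> Config:
    """Run the load and return the configuration that was finally used."""
    # Initial state: nothing has been recommended yet, so use the user's settings.
    if recommended is None:
        run_task(user_config)
        return user_config
    # A recommendation exists: it replaces the user-defined configuration.
    if run_task(recommended):
        return recommended
    # The recommended configuration failed, so roll back (assumed fallback).
    run_task(user_config)
    return user_config
```

Here run_task stands for whatever submits the Spark task and reports success as a boolean.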

Query the load and related information.

After a load is executed and intercepted, you can query its tuning information in the database. For the table structure, see OmniAdvisor Database Tables.

    select id, rounds, load_id, method from omniadvisor_tuning_record;

In this example, load_id is 7. This load_id is used in the "Start tuning" step. Replace it with the actual load_id returned by your query.

Start tuning.
- Example 1
  Use the expert rule-based tuning method to tune the load whose load_id is 7. During tuning, retests run in the background. The number of retests equals the value of tuning.retest.times in the $OMNIADVISOR_HOME/omniruntime-omniadvisor-2.0.0/config/common_config.ini file.
```
python tuning.pyc -l 7 -r backend -t expert
```
- Example 2
  Omit the tuning method so that OmniAdvisor 2.0 determines it from the default tuning policy and the historical tuning records.
```
python tuning.pyc -l 7 -r backend
```
After the tuning is complete, the optimal load configuration is updated so that it is applied the next time the load is intercepted (see the "Intercept the load" step).
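
When tuning is started from a script rather than typed by hand, the command line can be assembled programmatically. The sketch below assumes only the documented options; build_tuning_command and run_tuning are hypothetical helper names, and the working directory that contains tuning.pyc is left to the caller.

```python
import subprocess
from typing import List, Optional

RETEST_WAYS = ("hijacking", "backend")
TUNING_METHODS = ("iterative", "expert", "transfer", "native")


def build_tuning_command(load_id, retest_way: str,
                         tuning_method: Optional[str] = None) -> List[str]:
    """Assemble the argument list for `python tuning.pyc`."""
    if retest_way not in RETEST_WAYS:
        raise ValueError("unknown retest mode: %s" % retest_way)
    if tuning_method is not None and tuning_method not in TUNING_METHODS:
        raise ValueError("unknown tuning method: %s" % tuning_method)
    # Documented rule: -t is mandatory for hijacking and optional for backend.
    if retest_way == "hijacking" and tuning_method is None:
        raise ValueError("a tuning method is required for hijacking")
    cmd = ["python", "tuning.pyc", "-l", str(load_id), "-r", retest_way]
    if tuning_method is not None:
        cmd += ["-t", tuning_method]
    return cmd


def run_tuning(load_id, retest_way: str,
               tuning_method: Optional[str] = None,
               cwd: Optional[str] = None) -> int:
    """Run the command in `cwd` and return its exit code."""
    cmd = build_tuning_command(load_id, retest_way, tuning_method)
    return subprocess.run(cmd, cwd=cwd, check=False).returncode
```

build_tuning_command(7, "backend", "expert") reproduces Example 1, and build_tuning_command(7, "backend") reproduces Example 2.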
