Big Data Tuning

  1. Click next to System Profiler.

    Choose AI Tuning. The page for creating a task is displayed.

  2. Set task parameters. See Figure 1. Table 1 describes the parameters.

    Currently, AI tuning analysis is available only on CentOS 7.6 and openEuler 22.03 LTS.

    Figure 1 Creating an AI tuning analysis task (big data)
    Table 1 Parameters for creating an AI tuning analysis task (big data)

    Parameter

    Description

    Task Name

    Name of the task. The name must meet the following requirements:

    1. Contain only letters, digits, and underscores (_).
    2. Contain 1 to 64 characters.
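The two naming rules above can be checked before a task is submitted. The following is a minimal sketch, not part of the tool itself: the helper name `is_valid_task_name` is hypothetical, and it assumes that "letters" means ASCII letters.

```python
import re

# Hypothetical pre-check for the Task Name rules:
# only letters, digits, and underscores, and 1 to 64 characters in total.
# Assumption: "letters" is read as ASCII letters (A-Z, a-z).
TASK_NAME_PATTERN = re.compile(r"[A-Za-z0-9_]{1,64}")


def is_valid_task_name(name: str) -> bool:
    """Return True if the whole name satisfies both naming rules."""
    return TASK_NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("spark_tpcds_run1", "spark-tpcds", ""):
        print(repr(candidate), is_valid_task_name(candidate))
```

`fullmatch` is used rather than `match` so that a trailing invalid character cannot slip through.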

    Application Type

    Type of the application to be tuned. Select Big data.

    Application Name

    Application to be tuned. The options are Hive, Spark, and Flink.

    Application Version

    Version of the application to be tuned.

    Supported versions: Flink 1.12 to 1.15, Hive 3.1, and Spark 3.1.
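A version string can be compared against these ranges numerically rather than as text, so that, for example, 1.9 is not mistaken for a release later than 1.12. The sketch below is illustrative only: `SUPPORTED_VERSIONS` and `is_supported` are hypothetical names, and it assumes that any patch release within a listed minor version (such as 3.1.2) is acceptable, which the table does not state.

```python
# Supported (major, minor) ranges taken from the Application Version entry.
# Assumption: patch releases inside a listed minor version are accepted.
SUPPORTED_VERSIONS = {
    "Flink": ((1, 12), (1, 15)),
    "Hive": ((3, 1), (3, 1)),
    "Spark": ((3, 1), (3, 1)),
}


def is_supported(application: str, version: str) -> bool:
    """Compare the major.minor part of version against the supported range."""
    low, high = SUPPORTED_VERSIONS[application]
    major_minor = tuple(int(part) for part in version.split(".")[:2])
    return low <= major_minor <= high


if __name__ == "__main__":
    print(is_supported("Flink", "1.13.2"))
    print(is_supported("Flink", "1.9"))
    print(is_supported("Spark", "3.1.1"))
```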

    root Password

    Password of the root user for the DevKit node. Ensure that you have root permissions for AI tuning. This parameter is mandatory when Hive is selected for Application Name.

    Master & Benchmark Node

    Node where the pressure test tool resides. You can click Add Node to add an agent node. This parameter is mandatory when Flink is selected for Application Name.

    Master Node

    Master node of the cluster. This parameter is mandatory when Hive or Spark is selected for Application Name.

    Application Executable File Path

    Path to the executable file of the to-be-tuned application, for example, /application/hive/bin.

    OmniOperator Directory

    OmniOperator directory. This parameter is available when Spark is selected for Application Name.

    Deployment Mode

    Application deployment mode. The options are YARN (default) and Standalone. This parameter is mandatory when Spark is selected for Application Name.

    Flink Master IP Address

    IP address of the master node in the Flink cluster. This parameter is mandatory when Flink is selected for Application Name.

    Application Port on Flink Master Node

    Port of the Flink application on the master node. This parameter is mandatory when Flink is selected for Application Name.

    Pressure Test Tool

    Tool used for the pressure test. Flink supports only HiBench; Hive and Spark support only TPC-DS.

    Pressure Test Tool Version

    Version of the pressure test tool: 7.0 for HiBench and 3.0 for TPC-DS.

    Test Case

    Test case used by the pressure test tool.

    If you select Flink for Application Name, the options are identity (default), repartition, and wordcount.

    If you select Spark for Application Name, query1.sql is selected by default. You can select any test case from query1.sql to query99.sql.

    If you select Hive for Application Name, query1.sql is selected by default. You can select any test case from query1.sql to query99.sql.

    Tuning Metric

    Metric for application tuning.

    If you select Flink for Application Name, the options are throughput (default) and latency.

    If you select Spark for Application Name, the tuning metric is latency by default.

    If you select Hive for Application Name, the tuning metric is latency by default.

    JAVA_HOME

    JDK installation directory.

    Database

    Name of the database used for the pressure test. This parameter is mandatory when Hive or Spark is selected for Application Name.

    Pressure Test Tool Path

    Path to the pressure test tool, for example, /opt/Hibench-7.0.

    NOTE:

    Set the application path to a location under a directory such as /home or /opt. Do not set it to a system directory such as /, /dev, /sys, or /boot; otherwise, system exceptions may occur.
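The path restriction in the note can be approximated with a simple pre-check. This is a sketch under stated assumptions: `is_safe_path` is a hypothetical helper, and the blocked list contains only the directories named in the note, which is introduced with "such as" and is therefore not exhaustive.

```python
from pathlib import PurePosixPath

# System directories named in the note. Assumption: this list is
# illustrative only; the tool may reject other directories as well.
SYSTEM_DIRS = tuple(PurePosixPath(d) for d in ("/dev", "/sys", "/boot"))


def is_safe_path(path: str) -> bool:
    """Return True if path is absolute and not the root or a listed system directory."""
    candidate = PurePosixPath(path)
    if not candidate.is_absolute():
        return False
    # PurePosixPath does not resolve "..", so reject it outright.
    if ".." in candidate.parts:
        return False
    if candidate == PurePosixPath("/"):
        return False
    # Reject a listed system directory itself and anything beneath it.
    return not any(
        candidate == blocked or blocked in candidate.parents
        for blocked in SYSTEM_DIRS
    )


if __name__ == "__main__":
    for example in ("/opt/Hibench-7.0", "/", "/sys/kernel", "/opt/../dev"):
        print(example, is_safe_path(example))
```

The check is purely lexical: it does not touch the file system and does not follow symbolic links.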

    Throughput

    Throughput of the pressure test case. This parameter is mandatory when Flink is selected for Application Name. The options are 20K (default), 40K, 60K, 80K, 100K, 200K, 300K, 400K, 500K, 600K, 700K, 800K, 900K, 1000K, 2000K, 4000K, 6000K, 8000K, and 10000K.

    Tuning Iterations

    Number of iterations for application tuning. The options are 20, 50, 100, 150 (default), and 200.
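Several parameters in Table 1 are mandatory only for certain applications. The sketch below collects those rules in one place so that a parameter set can be checked before the task is created. It is an illustration, not the tool's own validation: the `MANDATORY` mapping and the `missing_parameters` helper are hypothetical, and parameters that the table does not explicitly mark as mandatory (for example, OmniOperator Directory, which is only described as available for Spark) are left out.

```python
# Parameters that Table 1 explicitly marks as mandatory per application.
MANDATORY = {
    "Hive": {"root Password", "Master Node", "Database"},
    "Spark": {"Master Node", "Deployment Mode", "Database"},
    "Flink": {
        "Master & Benchmark Node",
        "Flink Master IP Address",
        "Application Port on Flink Master Node",
        "Throughput",
    },
}


def missing_parameters(application: str, provided: dict) -> list:
    """Return the sorted names of mandatory parameters that are absent or empty."""
    required = MANDATORY[application]
    return sorted(name for name in required if not provided.get(name))


if __name__ == "__main__":
    flink_task = {"Throughput": "20K", "Flink Master IP Address": "192.0.2.10"}
    print(missing_parameters("Flink", flink_task))
```

The IP address in the usage example is a placeholder from the documentation address range, not a value from the source.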

  3. Click Verify and Create.
  4. Click the task name to view the tuning statistics.

    Each row indicates one iteration of tuning. You can click Stop to stop the tuning.

    Figure 2 AI-based tuning analysis for a big data application
  5. Click Download Tuned Parameter Set to obtain the tuned parameter set for the application.