Importing Test Data to Test Performance

This section describes how to use the TPC-H test set to measure Doris performance after the instruction optimization.

  1. Download and install the TPC-H tool package.
    1. Copy the tpch-tools folder from the downloaded Doris source code to the /opt/tools/installed directory.
      cp -r /opt/tools/installed/doris-2.1.2-rc04/tools/tpch-tools /opt/tools/installed
      
    2. Go to the tpch-tools folder.
      cd /opt/tools/installed/tpch-tools
      
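    3. Manually download the TPC-H dependency tool package, rename it to TPC-H_Tools_v3.0.0new.zip, and move it to the bin directory of tpch-tools. In the following commands, Downloaded_package stands for the name of the file you downloaded.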
      mv Downloaded_package TPC-H_Tools_v3.0.0new.zip
      mv TPC-H_Tools_v3.0.0new.zip /opt/tools/installed/tpch-tools/bin
    4. Modify the build-tpch-dbgen.sh file and comment out the wget download line so that the script uses the package you saved locally.
      1. Open the file.
        vi bin/build-tpch-dbgen.sh
      2. Press i to enter the insert mode and modify the file as follows:
        #wget "https://doris-build-1308700295.cos.ap-beijing.myqcloud.com/tools/TPC-H_Tools_v3.0.0new.zip"
      3. Press Esc, type :wq!, and press Enter to save the file and exit.
    5. Generate the dbgen binary file in the TPC-H_Tools_v3.0.0/ directory.
      sh bin/build-tpch-dbgen.sh
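
As an alternative to editing the file in vi in step 4, the wget line can be commented out non-interactively. The following is a minimal sketch, assuming the download line begins with the word wget (possibly after leading whitespace), as the commented-out line shown in step 4 suggests. The function name comment_out_wget is hypothetical and is not part of tpch-tools.

```shell
# Hypothetical helper: comment out every line whose first word is "wget".
# Assumption: the download command sits on its own line in the script.
comment_out_wget() {
    script=$1
    # -i.bak edits in place and keeps the original as "$script.bak".
    # Lines that already start with "#wget" do not match, so rerunning is safe.
    sed -i.bak 's/^\([[:space:]]*\)\(wget[[:space:]]\)/\1#\2/' "$script"
}
```

Run from the tpch-tools folder before building dbgen, for example: comment_out_wget bin/build-tpch-dbgen.sh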
      
  2. Modify the configuration file conf/doris-cluster.conf of the test tool.
    1. Open the configuration file.
      vi conf/doris-cluster.conf
      
    2. Press i to enter the insert mode and modify the following content of the file:
      # Any of FE host
      export FE_HOST='xx.xx.xx.xx'
      # http_port in fe.conf
      export FE_HTTP_PORT=8030
      # query_port in fe.conf
      export FE_QUERY_PORT=9030
      # Doris username
      export USER='root'
      # Doris password
      export PASSWORD=''
      # The database where TPC-H tables located
      export DB='tpch100G'
      
      • FE_HOST indicates the IP address of an FE node, which is usually the IP address of the local physical machine, for example, 172.18.0.11. Do not append a subnet mask suffix.
      • FE_HTTP_PORT indicates the value of the http_port parameter of the FE. The value must be the same as that in fe.conf.
      • FE_QUERY_PORT indicates the value of the query_port parameter of the FE. The value must be the same as that in fe.conf.
      • USER indicates the username.
      • PASSWORD indicates the password of the user. If no password is set, leave it blank.
      • DB indicates the name of the database corresponding to TPC-H.
    3. Press Esc, type :wq!, and press Enter to save the file and exit.
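
Before generating data, it can help to confirm that the edited configuration file defines every value described above. The following is a rough sketch, not part of tpch-tools: the function name check_doris_conf is hypothetical, and the check only verifies that the values exist and that both ports are numeric. It does not contact the FE.

```shell
# Hypothetical helper: source the config in a subshell and verify its values.
check_doris_conf() {
    conf_file=$1
    (
        # Subshell: the exported values (notably USER) do not leak to the caller.
        unset FE_HOST FE_HTTP_PORT FE_QUERY_PORT USER PASSWORD DB
        . "$conf_file" || exit 1
        status=0
        # PASSWORD is intentionally not checked, because it may be blank.
        for pair in "FE_HOST=${FE_HOST-}" "FE_HTTP_PORT=${FE_HTTP_PORT-}" \
                    "FE_QUERY_PORT=${FE_QUERY_PORT-}" "USER=${USER-}" "DB=${DB-}"; do
            if [ -z "${pair#*=}" ]; then
                echo "not set: ${pair%%=*}" >&2
                status=1
            fi
        done
        # Both ports must consist of digits only.
        for port in "${FE_HTTP_PORT-}" "${FE_QUERY_PORT-}"; do
            case $port in
                ''|*[!0-9]*) echo "invalid port: '$port'" >&2; status=1 ;;
            esac
        done
        exit "$status"
    )
}
```

For example, run check_doris_conf conf/doris-cluster.conf from the tpch-tools folder. A non-zero exit status means at least one value is missing or malformed.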
  3. Generate a TPC-H dataset.
    sh bin/gen-tpch-data.sh -s 100 -c 40
    
    • -s indicates the TPC-H scale factor, which is approximately the size of the dataset in GB, for example, 10, 100, 500, or 1000. This example uses 100.
    • -c specifies the number of threads used to generate data in parallel.
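
Because the dataset size grows with -s, it is worth checking free disk space before generating data. The following is a minimal sketch under the assumption that the generated files need roughly as many GB as the scale factor; treat it as a rough lower bound. The function name has_space_for_scale is hypothetical, and the directory argument is whichever directory the data will be written to.

```shell
# Hypothetical helper: succeed if the file system holding target_dir has at
# least scale_gb GB available. Assumes about 1 GB of data per scale unit.
has_space_for_scale() {
    target_dir=$1
    scale_gb=$2
    needed_kb=$((scale_gb * 1024 * 1024))
    # df -P prints one line per file system; column 4 is available 1 KB blocks.
    avail_kb=$(df -Pk "$target_dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge "$needed_kb" ]
}
```

For example: has_space_for_scale . 100 || echo "less than 100 GB free in the current directory"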
  4. Create the TPC-H tables.
    sh bin/create-tpch-tables.sh
    
  5. Import data.
    sh bin/load-tpch-data.sh -c 40
    
  6. Run the TPC-H test queries to compare the performance of the open source doris_be and the newly compiled doris_be.
    sh bin/run-tpch-queries.sh -s 100
    
    • Figure 1 shows the performance result of the open source doris_be.
      Figure 1 Performance result of the open source doris_be
    • Figure 2 shows the performance result of the optimized doris_be.
      Figure 2 Performance result of the optimized doris_be
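
To compare the two runs query by query, the timings can be placed side by side. The following is a rough sketch that assumes you have saved each run's results to a plain text file with two whitespace-separated columns, the query name and the elapsed time in milliseconds. That file format and the function name compare_timings are assumptions made for illustration; they are not the native output format of run-tpch-queries.sh.

```shell
# Hypothetical helper: print "query baseline_ms optimized_ms speedup" for each
# query that appears in both files. A speedup above 1.00 means the optimized
# doris_be was faster for that query.
compare_timings() {
    baseline=$1
    optimized=$2
    awk '
        NR == FNR { base[$1] = $2; next }    # first file: remember baseline times
        ($1 in base) && $2 > 0 {             # second file: emit the comparison
            printf "%s %d %d %.2f\n", $1, base[$1], $2, base[$1] / $2
        }
    ' "$baseline" "$optimized"
}
```

For example, compare_timings open_source.txt optimized.txt, where both file names are placeholders for the files you saved.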