
Executing Spark Services

This section describes how to use the spark-sql command to run Spark 3.1.1 services with operator pushdown. A non-partitioned TPC-H table containing 1 TB of data is used as the test table, and the test SQL statement is tpch-sql6.

Table 1 lists the related table information.

Table 1 Table information

Table Name | Format | Total Number of Rows | Occupied Space
---------- | ------ | -------------------- | --------------
lineitem   | orc    | 5999989709           | 169.6 GB

Operations on Spark 3.1.1

INFO-level logs are not printed by Spark 3.1.1 (on YARN) by default. Therefore, log redirection is required for Spark 3.1.1.
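One practical note before configuring the redirection: log4j 1.x's FileAppender opens the log file itself but does not create missing parent directories, so the directory that log4j.appender.FILE.file points to should exist before the first run. A minimal sketch, assuming the default path used in the steps below (the LOG_DIR variable is an illustrative assumption you can override):

```shell
#!/bin/sh
# Directory that log4j.appender.FILE.file writes into; must exist before
# Spark starts, because the log4j 1.x FileAppender does not create it.
LOG_DIR="${LOG_DIR:-/usr/local/spark/logs}"

if mkdir -p "$LOG_DIR" 2>/dev/null; then
    echo "log directory ready: $LOG_DIR"
else
    echo "could not create $LOG_DIR (insufficient permissions?)" >&2
fi
```

If the directory is missing, log4j reports an error at startup and the file log is silently not written.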

  1. Define the log configuration file log4j.properties in /usr/local/spark/conf.
    1. Go to the configuration directory and create the log4j.properties file:
      cd /usr/local/spark/conf
      vi log4j.properties
      
    2. Press i to enter the insert mode and add the following content to the file:
      log4j.rootCategory=INFO, FILE
      log4j.appender.console=org.apache.log4j.ConsoleAppender
      log4j.appender.console.target=System.err
      log4j.appender.console.layout=org.apache.log4j.PatternLayout
      log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
       
      log4j.logger.org.apache.spark.sql.execution=DEBUG
      log4j.logger.org.apache.spark.repl.Main=INFO
       
      log4j.appender.FILE=org.apache.log4j.FileAppender
      log4j.appender.FILE.file=/usr/local/spark/logs/file.log
      log4j.appender.FILE.layout=org.apache.log4j.PatternLayout
       
      log4j.appender.FILE.layout.ConversionPattern=%m%n
      
    3. Press Esc, type :wq!, and press Enter to save the file and exit.
  2. (Optional) Change the value of log4j.appender.FILE.file in the log4j.properties file to a custom directory and file name.
  3. Add --driver-java-options -Dlog4j.configuration=file:/usr/local/spark/conf/log4j.properties when running the spark-sql command, for example:
    /usr/local/spark/bin/spark-sql \
      --driver-class-path '/opt/boostkit/*' \
      --jars '/opt/boostkit/*' \
      --conf 'spark.executor.extraClassPath=./*' \
      --name tpch_query6.sql \
      --driver-memory 50G \
      --driver-java-options -Dlog4j.configuration=file:/usr/local/spark/conf/log4j.properties \
      --executor-memory 32G \
      --num-executors 30 \
      --executor-cores 18
    
  4. Run the tpch-sql6 statement.
    select
    sum(l_extendedprice * l_discount) as revenue
    from
    tpch_flat_orc_1000.lineitem
    where
    l_shipdate >= '1993-01-01'
    and l_shipdate < '1994-01-01'
    and l_discount between 0.06 - 0.01 and 0.06 + 0.01
    and l_quantity < 25;
    

    When the task is executed, the pushdown information is printed as follows:

    Selectivity: 0.014160436451808448
    Push down with [PushDownInfo(ListBuffer(FilterExeInfo((((((((isnotnull(l_shipdate#18) AND isnotnull(l_discount#14)) AND isnotnull(l_quantity#12)) AND (l_shipdate#18 >= 8401)) AND (l_shipdate#18 < 8766)) AND (l_discount#14 >= 0.05)) AND (l_discount#14 <= 0.07)) AND (l_quantity#12 < 25.0)),List(l_quantity#12, l_extendedprice#13, l_discount#14, l_shipdate#18))),ListBuffer(AggExeInfo(List(sum((l_extendedprice#13 * l_discount#14))),List(),List(sum#43))),None,Map(server1 -> server1, agent2 -> agent2, agent1 -> agent1))]
    

    The output contains the pushdown selectivity and operator information.

    Table 2 lists the parameters in the spark-sql command.

    Table 2 OmniData parameters

    Parameter | Recommended Value | Description
    --------- | ----------------- | -----------
    --driver-memory | 50G | Available memory of the driver.
    --executor-memory | 32G | Available memory of each executor.
    --num-executors | 30 | Number of executors to start. The default value is 2.
    --executor-cores | 18 | Number of CPU cores used by each executor. The default value is 1.
    --driver-class-path | "/opt/boostkit/*" | Extra JAR packages to be added to the driver classpath.
    --jars | "/opt/boostkit/*" | JAR packages to be included in the driver and executor classpaths.
    --conf | "spark.executor.extraClassPath=./*" | Sets a Spark configuration property.
    --driver-java-options -Dlog4j.configuration | file:/usr/local/spark/conf/log4j.properties | Path of the log4j configuration file.