Executing Spark Engine Services

Run the spark-sql command to enable Spark operator pushdown.

In this example, a non-partitioned TPC-H table containing 1 TB of data is used as the test table. The test SQL statement is TPC-H Q6 (tpch-sql6).

Table 1 lists the related table information.

Table 1 Table information

| Table Name | Format | Total Rows | Occupied Space |
|------------|--------|------------|----------------|
| lineitem   | orc    | 5999989709 | 169.6 GB       |
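As a quick sanity check on Table 1, dividing the occupied space by the row count gives the average compressed row size. A rough back-of-the-envelope calculation (assuming the "169.6 GB" figure means gibibytes):

```python
# Rough average ORC-compressed row size for lineitem, from the Table 1 figures.
total_bytes = 169.6 * 1024**3   # 169.6 GB occupied space, assumed to mean GiB
row_count = 5_999_989_709       # total rows in lineitem

bytes_per_row = total_bytes / row_count
print(f"~{bytes_per_row:.0f} bytes per row")  # roughly 30 bytes per row
```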

The procedure is as follows:

  1. Run the spark-sql command.
    spark-sql --num-executors 10 --executor-cores 6 --driver-class-path "/usr/local/spark-plugin-jar/*" --jars "/usr/local/spark-plugin-jar/*" --conf "spark.executor.extraClassPath=./*"
    

  2. Run the sql6 statement.
    select
        sum(l_extendedprice * l_discount) as revenue
    from
        tpch_flat_orc_1000.lineitem
    where
        l_shipdate >= '1993-01-01'
        and l_shipdate < '1994-01-01'
        and l_discount between 0.06 - 0.01 and 0.06 + 0.01
        and l_quantity < 25;
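The WHERE clause above is exactly the filter that gets pushed down to the storage side. As a plain-Python sketch (illustrative only, not the plugin's actual code; the sample rows are made-up values, not TPC-H data), the predicate evaluated per row is:

```python
from datetime import date

# Illustrative re-implementation of the Q6 filter predicate.
def q6_predicate(l_shipdate, l_discount, l_quantity):
    return (
        date(1993, 1, 1) <= l_shipdate < date(1994, 1, 1)
        and 0.06 - 0.01 <= l_discount <= 0.06 + 0.01
        and l_quantity < 25
    )

rows = [
    # (l_shipdate, l_extendedprice, l_discount, l_quantity) -- made-up sample data
    (date(1993, 6, 15), 1000.0, 0.06, 10),  # qualifies
    (date(1995, 6, 15), 1000.0, 0.06, 10),  # shipped too late
    (date(1993, 6, 15), 1000.0, 0.02, 10),  # discount out of range
]

# Q6 sums l_extendedprice * l_discount over the qualifying rows.
revenue = sum(price * disc
              for ship, price, disc, qty in rows
              if q6_predicate(ship, disc, qty))
print(revenue)  # revenue from the single qualifying row
```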

    When the task is executed, the pushdown information is printed as follows:

    ndp.NdpPushDown: Selectivity: 0.09795918367346941
    ndp.NdpPushDown: Push down with [PushDownInfo(ListBuffer(FilterExeInfo((((((((isnotnull(l_quantity#5) AND isnotnull(l_discount#7)) AND isnotnull(l_shipdate#11)) AND (cast(l_shipdate#11 as date) >= 8401)) AND (cast(l_shipdate#11 as date) < 8766)) AND (l_discount#7 >= 0.05)) AND (l_discount#7 <= 0.07)) AND (l_quantity#5 < 25.0)),List(l_quantity#5, l_extendedprice#6, l_discount#7, l_shipdate#11))),ListBuffer(AggExeInfo(List(sum((l_extendedprice#6 * l_discount#7))),List(),List(sum#36))),None,Map(ceph1 -> ceph1, ceph2 -> ceph2, ceph3 -> ceph3))]

    The preceding log output contains the pushdown selectivity and the pushed-down operator information.
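In the logged filter, the l_shipdate literals appear as day counts since the Unix epoch: 8401 corresponds to 1993-01-01 and 8766 to 1994-01-01. This is easy to verify:

```python
from datetime import date

# The pushdown log encodes date literals as days since 1970-01-01.
epoch = date(1970, 1, 1)

days_1993 = (date(1993, 1, 1) - epoch).days
days_1994 = (date(1994, 1, 1) - epoch).days
print(days_1993)  # 8401, matches (cast(l_shipdate as date) >= 8401)
print(days_1994)  # 8766, matches (cast(l_shipdate as date) < 8766)
```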

    Table 2 lists the parameters in the spark-sql command.

    Table 2 Operator pushdown parameters

    | Parameter           | Recommended Value                   | Description |
    |---------------------|-------------------------------------|-------------|
    | --num-executors     | 10                                  | Number of executors to start. The default value is 2. |
    | --executor-cores    | 6                                   | Number of CPU cores used by each executor. The default value is 1. |
    | --driver-class-path | "/usr/local/spark-plugin-jar/*"     | Extra JAR packages to be added to the driver class path. |
    | --jars              | "/usr/local/spark-plugin-jar/*"     | JAR packages to be included in the driver and executor class paths. |
    | --conf              | "spark.executor.extraClassPath=./*" | Sets a Spark configuration property. Here, it adds the plugin JARs to the executor class path. |
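For scripting, the parameters in Table 2 can be assembled into the full command line programmatically. The snippet below is a small illustrative helper (not part of the plugin); shlex.join quotes the wildcard arguments so the shell does not expand them prematurely, matching the quoting used in the documented command:

```python
import shlex

# Assemble the spark-sql invocation from the Table 2 parameters.
params = [
    ("--num-executors", "10"),
    ("--executor-cores", "6"),
    ("--driver-class-path", "/usr/local/spark-plugin-jar/*"),
    ("--jars", "/usr/local/spark-plugin-jar/*"),
    ("--conf", "spark.executor.extraClassPath=./*"),
]

argv = ["spark-sql"]
for flag, value in params:
    argv += [flag, value]

# shlex.join (Python 3.8+) quotes arguments containing shell metacharacters
# such as '*', so they reach spark-sql unexpanded.
command = shlex.join(argv)
print(command)
```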