Configuring Client Parameters
Purpose
Spark exposes several client-side parameters that affect the execution efficiency and resource usage of Spark jobs. This section describes how to configure them.
Procedure
For details about the names, recommended values, and descriptions of the client parameters, see Table 1.
| Parameter | Recommended Value | Description |
|---|---|---|
| spark.shuffle.compress | true | Compresses shuffle data during the shuffle process, reducing network transfer overhead and improving overall task execution efficiency. |
| spark.rdd.compress | true | Compresses cached RDD partitions, reducing memory consumption and enhancing data caching efficiency. |
| spark.io.compression.codec | snappy | Codec used to compress internal data such as RDD partitions and shuffle output. Snappy is fast with low memory and CPU overhead. |
| spark.shuffle.spill.compress | true | Compresses intermediate results that are spilled to local disks during the shuffle process, accelerating disk I/O and reducing disk space usage. |
| spark.locality.wait | 10s | Sets the locality wait time, so Spark prefers to schedule tasks on nodes where the data resides, minimizing cross-node network transfers. |
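As an illustration only (not part of the official procedure), the recommended values in Table 1 can be kept in a small script and rendered in spark-defaults.conf format; the dictionary keys mirror the table:

```python
# Recommended client parameters from Table 1.
RECOMMENDED = {
    "spark.shuffle.compress": "true",
    "spark.rdd.compress": "true",
    "spark.io.compression.codec": "snappy",
    "spark.shuffle.spill.compress": "true",
    "spark.locality.wait": "10s",
}

def to_defaults_conf(params: dict) -> str:
    """Render parameters as spark-defaults.conf lines: one 'key value' pair per line."""
    return "\n".join(f"{key} {value}" for key, value in params.items())

print(to_defaults_conf(RECOMMENDED))
```

The rendered text can be appended to $SPARK_HOME/conf/spark-defaults.conf.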
- Method 1: Set the parameters in $SPARK_HOME/conf/spark-defaults.conf. Spark reads this file automatically, so the configured values apply to every job submitted from the client.

- Open the file.
vi $SPARK_HOME/conf/spark-defaults.conf
- Press i to enter the insert mode and modify the parameter values.
# Enable data compression during the shuffle process.
spark.shuffle.compress true
# Enable data compression when RDDs are persisted to disks.
spark.rdd.compress true
# Set Snappy as the codec for I/O compression.
spark.io.compression.codec snappy
# Enable compression for data spilled to disks during the shuffle process.
spark.shuffle.spill.compress true
# Wait up to 10s for data locality during task scheduling.
spark.locality.wait 10s
- Press Esc, type :wq!, and press Enter to save the file and exit.
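To sanity-check the edit, the file can be parsed back and the expected keys confirmed. The helper below is a hypothetical illustration, not a Spark tool; it reads spark-defaults.conf content, skipping comments and blank lines:

```python
def parse_defaults_conf(text: str) -> dict:
    """Parse spark-defaults.conf content into a dict, ignoring comments and blanks."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Key and value are separated by whitespace.
        parts = line.split(None, 1)
        if len(parts) == 2:
            params[parts[0]] = parts[1].strip()
    return params

sample = "# comment\nspark.shuffle.compress true\nspark.locality.wait 10s\n"
conf = parse_defaults_conf(sample)
```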
- Method 2: Use spark-submit on the CLI. The command is as follows:
spark-submit \
  --conf "spark.shuffle.compress=true" \
  --conf "spark.rdd.compress=true" \
  --conf "spark.io.compression.codec=snappy" \
  --conf "spark.shuffle.spill.compress=true" \
  --conf "spark.locality.wait=10s" \
  your_application.py
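When many jobs need the same settings, the command above can also be assembled programmatically. This is a sketch (the helper name and structure are assumptions, not a Spark API) that builds one --conf flag per parameter:

```python
def build_spark_submit(app: str, conf: dict) -> list:
    """Build a spark-submit argument list with one --conf flag per setting."""
    args = ["spark-submit"]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    args.append(app)
    return args

cmd = build_spark_submit("your_application.py", {"spark.shuffle.compress": "true"})
# On a host where Spark is installed, cmd could be passed to subprocess.run(cmd).
```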