Constraints
Before using OmniAdvisor 2.0, understand its constraints.
- End-to-end tuning is available only for Spark tasks submitted with the spark-sql or spark-submit command. It is not available for invocations that use the --version, --kill, or --status option.
- Spark tasks are looked up in the backend database by their name attributes, file attributes (SQL statements, SQL file paths, or executable file paths, depending on the submission command), and application parameter attributes. You are advised to specify the task configuration explicitly when submitting a task. Resource parameters (such as spark.executor.cores) are used as resource constraints for the tuning algorithm; if they are not specified, the default Spark parameter values are used.
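As a minimal sketch of the advice above, the following spark-submit invocation specifies the resource parameters explicitly so they can serve as resource constraints for the tuning algorithm. The task name, file path, and parameter values are hypothetical; the command string is built as a variable here only so the flags can be inspected without a Spark installation.

```shell
# Hypothetical spark-submit invocation with explicit resource parameters.
# Unspecified resource parameters fall back to the default Spark values.
submit_cmd="spark-submit --name nightly-etl --conf spark.executor.cores=4 --conf spark.executor.memory=8g --conf spark.executor.instances=10 /opt/jobs/nightly_etl.py"
printf '%s\n' "$submit_cmd"
```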
- After the front-end interception function takes effect, OmniAdvisor redirects the stderr stream of the spark-sql command to stdout. All output is therefore sent through stdout and the stderr stream is left empty, which differs slightly from native Spark SQL behavior.
- Because of its built-in expert rules, the expert tuning algorithm allows a maximum of nine tuning rounds. Not every workload goes through all nine rounds, as the number depends on the workload's bottleneck. If the tuning target is still not reached after the maximum number of rounds, the algorithm stops and returns the result. You are then advised to try other tuning methods.
- The transfer tuning algorithm depends on a large amount of historical tuning data. When the algorithm is used for the first time, it is normal for no configuration to be pushed.
- OmniAdvisor 2.0 calculates the execution duration (in seconds) of a Spark task as follows: Execution duration = Time when the latest job ends - Time when the earliest job is submitted.
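The duration formula can be illustrated with a short arithmetic sketch. The epoch timestamps below are hypothetical stand-ins for the earliest job submission time and the latest job end time reported for a task:

```shell
# Hypothetical timestamps (epoch seconds) for a Spark task:
earliest_submit=1700000000   # time when the earliest job is submitted
latest_end=1700000125        # time when the latest job ends
duration=$((latest_end - earliest_submit))
echo "Execution duration: ${duration}s"   # prints: Execution duration: 125s
```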
- OmniAdvisor 2.0 requires the permission to use big data components such as Spark and HDFS.
- During tuning, the software accesses the Spark history server to obtain execution information after a task is complete. When many tasks run concurrently, this access can be frequent, and under the default configuration the native Spark history server may exit because of high heap memory usage, garbage collection jitter, or out-of-memory (OOM) errors. Adjust the Spark history server parameters based on the task concurrency. For example, in a test where the system processed 1,500 concurrent tasks per hour, each taking 10 seconds, the following settings allowed the system to pass the 24/7 stability test:
export SPARK_DAEMON_MEMORY=12g
export SPARK_HISTORY_OPTS="-Dspark.history.retainedApplications=20 -XX:+UseG1GC"
- When the Spark history server periodically cleans up task event logs, trace collection may be blocked. To avoid this issue, add the following parameters to the Spark configuration file (spark-defaults.conf by default) as needed to reduce the event log cleanup frequency:
spark.history.fs.cleaner.enabled false
spark.history.fs.cleaner.maxAge 7d