OmniShuffle Configuration File

spark.conf

Table 1 Default configurations

Parameter

Value Range and Default Value

Description

spark.executor.extraClassPath

$OCK_HOME/jars/*:.

Path of the OmniShuffle JAR package. Change $OCK_HOME to the actual OmniShuffle installation path.

spark.driver.extraClassPath

$OCK_HOME/jars/*:.

Path of the OmniShuffle JAR package. Change $OCK_HOME to the actual OmniShuffle installation path.

spark.driver.extraJavaOptions

-Djava.library.path=$OCK_HOME/ucache/23.0.0/linux-aarch64/lib/common/openssl:$OCK_HOME/ucache/23.0.0/linux-aarch64/lib/common:$OCK_HOME/ucache/23.0.0/linux-aarch64/lib/common/ucx:$OCK_HOME/ucache/23.0.0/linux-aarch64/lib/common/ucx/ucx:$OCK_HOME/ucache/23.0.0/linux-aarch64/lib/datakit:$OCK_HOME/ucache/23.0.0/linux-aarch64/lib/mf -Dlog4j.configuration=/usr/local/spark/conf/log4j.properties -Xms8g -XX:+UseParallelGC

JVM option string transferred to the driver. Change $OCK_HOME to the actual OmniShuffle installation path.

spark.executor.extraJavaOptions

-Djava.library.path=$OCK_HOME/ucache/23.0.0/linux-aarch64/lib/common/openssl:$OCK_HOME/ucache/23.0.0/linux-aarch64/lib/common:$OCK_HOME/ucache/23.0.0/linux-aarch64/lib/common/ucx:$OCK_HOME/ucache/23.0.0/linux-aarch64/lib/common/ucx/ucx:$OCK_HOME/ucache/23.0.0/linux-aarch64/lib/datakit:$OCK_HOME/ucache/23.0.0/linux-aarch64/lib/mf -Xms8g -XX:+UseParallelGC -XX:ParallelGCThreads=6 -XX:ErrorFile=/tmp/hs_err_pid%p.log

JVM option string transferred to the Executor. Change $OCK_HOME to the actual OmniShuffle installation path.

spark.driver.extraLibraryPath

$OCK_HOME/ucache/23.0.0/linux-aarch64/lib/common/openssl:$OCK_HOME/ucache/23.0.0/linux-aarch64/lib/common:$OCK_HOME/ucache/23.0.0/linux-aarch64/lib/common/ucx:$OCK_HOME/ucache/23.0.0/linux-aarch64/lib/common/ucx/ucx:$OCK_HOME/ucache/23.0.0/linux-aarch64/lib/datakit:$OCK_HOME/ucache/23.0.0/linux-aarch64/lib/mf:.

Path of the library used when the JVM of the driver is started. Change $OCK_HOME to the actual OmniShuffle installation path.

spark.executor.extraLibraryPath

$OCK_HOME/ucache/23.0.0/linux-aarch64/lib/common/openssl:$OCK_HOME/ucache/23.0.0/linux-aarch64/lib/common:$OCK_HOME/ucache/23.0.0/linux-aarch64/lib/common/ucx:$OCK_HOME/ucache/23.0.0/linux-aarch64/lib/common/ucx/ucx:$OCK_HOME/ucache/23.0.0/linux-aarch64/lib/datakit:$OCK_HOME/ucache/23.0.0/linux-aarch64/lib/mf:.

Path of the library used when the JVM of the Executor is started. Change $OCK_HOME to the actual OmniShuffle installation path.
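
The library path and JVM option values above all repeat the same list of native library directories. The following is a minimal Python sketch of how that string could be generated from the installation path rather than edited by hand. The helper name build_library_path is hypothetical, and the version (23.0.0) and architecture (linux-aarch64) segments are taken from the reference values above and may differ in your installation.

```python
# Subdirectories under <OCK_HOME>/ucache/<version>/<arch>/lib, in the order
# used by the reference values in the table above.
LIB_SUBDIRS = [
    "common/openssl",
    "common",
    "common/ucx",
    "common/ucx/ucx",
    "datakit",
    "mf",
]


def build_library_path(ock_home, version="23.0.0", arch="linux-aarch64"):
    """Return the colon-separated native library path (hypothetical helper)."""
    base = f"{ock_home.rstrip('/')}/ucache/{version}/{arch}/lib"
    return ":".join(f"{base}/{sub}" for sub in LIB_SUBDIRS)


if __name__ == "__main__":
    # /opt/ock is only an example installation path, not a default.
    path = build_library_path("/opt/ock")
    print(f"spark.driver.extraLibraryPath {path}:.")
    print(f"spark.executor.extraLibraryPath {path}:.")
    print(f"-Djava.library.path={path}")
```

The printed lines can be compared against the values in spark.conf to spot a directory that was missed or mistyped.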

spark.shuffle.manager

  • Options: org.apache.spark.shuffle.ock.OCKShuffleManager or OckColumnarShuffleManager
  • Default value: org.apache.spark.shuffle.ock.OCKShuffleManager

Class path of OCK Shuffle Manager.

spark.blacklist.enabled

  • Value: true or false
  • Default value: false

This parameter is provided by Spark. Set this parameter to true at the job level to enable the blocklist mechanism for fault recovery.

spark.blacklist.application.fetchFailure.enabled

  • Value: true or false
  • Default value: false

This parameter is provided by Spark. Set this parameter to true at the job level so that Spark will blocklist the executor immediately when a fetch failure occurs.

spark.files.fetchFailure.unRegisterOutputOnHost

  • Value: true or false
  • Default value: false

This parameter is provided by Spark. Set this parameter to true at the job level so that Spark unregisters outputs of existing map tasks when a fetch failure occurs.

spark.yarn.blacklist.executor.launch.blacklisting.enabled

  • Value: true or false
  • Default value: false

This parameter is provided by Spark for YARN. Set this parameter to true at the job level to enable blocklisting of nodes that have YARN resource allocation problems.

spark.shuffle.service.enabled

  • Value: true or false
  • Default value: false

This parameter is provided by Spark. Set this parameter to false at the job level to disable the Spark external shuffle service.

spark.shuffle.ock.home

$OCK_HOME

Location of the OmniShuffle home folder.

spark.shuffle.ock.isIsolated

  • Value: true or false
  • Default: true

Indicates whether to enable the app resource isolation function of OmniShuffle. This parameter must be used together with the OmniShuffle server parameters.

spark.shuffle.ock.scheduler.excludeUnavailableNodes

  • Value: true or false
  • Default: true

Indicates whether to enable blocklisting of invalid nodes for Shuffle Manager.

spark.shuffle.ock.removeShuffleDataAfterJobFinished

  • Value: true or false
  • Default value: false

(Tuning item) Indicates whether to release the shuffle file after a job is complete. In most scenarios, set this parameter to false. Set this parameter to true only when you confirm that shuffle data is not reused across jobs.

spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version

2

This is a native Hadoop configuration that is used to optimize performance and reduce the time required for shuffle file output.

spark.shuffle.ock.aggregateFlags

  • Value: true or false
  • Default: true

Indicates whether to perform aggregation.

spark.broadcast.ock.manager

  • Value: true or false
  • Default value: false

Indicates whether to enable OmniShuffle broadcasting.

spark.broadcast.ock.robustness

  • Value: true or false
  • Default value: false

Indicates whether to enable OmniShuffle broadcasting reliability.

  • true: Broadcast variables are remotely written to two nodes during initialization.
  • false: Broadcast variables are written to only one node.

spark.broadcast.ock.ockThresholdInMb

100

Threshold for the broadcast variable type, in MB. When the broadcast variable exceeds this threshold and spark.broadcast.ock.manager is set to true, OmniShuffle broadcast variables are used. Otherwise, native broadcast variables are used.
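
The selection rule described above can be sketched as a small predicate. This is an illustration of the documented condition only, not OmniShuffle code; the function name uses_ock_broadcast is hypothetical, and "exceeds" is read here as strictly greater than the threshold.

```python
def uses_ock_broadcast(size_mb, manager_enabled, threshold_mb=100):
    """Return True when an OmniShuffle broadcast variable would be used.

    size_mb         -- size of the broadcast variable, in MB
    manager_enabled -- value of spark.broadcast.ock.manager
    threshold_mb    -- value of spark.broadcast.ock.ockThresholdInMb
    """
    # Both conditions must hold; otherwise native broadcast variables are used.
    return manager_enabled and size_mb > threshold_mb


if __name__ == "__main__":
    for size in (50, 100, 150):
        print(size, uses_ock_broadcast(size, manager_enabled=True))
```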

spark.sql.ock.autoConfig.enable

  • Value: true or false
  • Default value: false

Enables OmniShuffle BoostTuning for parallelism degree adjustment.

spark.sql.adaptive.enabled

  • Value: true or false
  • Default value: false

Enables the native AQE function of Spark. Currently, OmniShuffle BoostTuning works only on Spark SQL jobs for which AQE is enabled. Set this parameter to true.

spark.sql.ock.autoConfig.history

  • Value: true or false
  • Default: true

Indicates whether to make OmniShuffle BoostTuning adjust the parallelism degree based on historical data. Retain the default value in most scenarios.

spark.sql.ock.autoConfig.sample

  • Value: true or false
  • Default: true

Indicates whether to make OmniShuffle BoostTuning adjust the parallelism degree based on samples. If neither sample nor history is chosen, the parallelism degree will not change. Retain the default value in most scenarios.

spark.sql.ock.autoConfig.historyRelation

  • Value: true or false
  • Default: true

(Tuning item) Indicates whether OmniShuffle BoostTuning uses the relational historical record algorithm. Retain the default value in most scenarios.

spark.sql.ock.autoConfig.globalRuntimePartition

  • Value: true or false
  • Default value: false

(Tuning item) Indicates whether OmniShuffle BoostTuning infers the parallelism degree based on cluster running resources when AQE is disabled and there is no historical data. Retain the default value in most scenarios.

spark.sql.ock.autoConfig.skipSample

  • Value: true or false
  • Default: true

(Tuning item) Indicates whether OmniShuffle BoostTuning skips the sampling process when the upstream data volume is small. Retain the default value in most scenarios.

spark.sql.ock.autoConfig.skipSampleThreshold

  • Value: In the format of the standard Spark data volume, for example, 100M and 1K.
  • Default: 10G

(Tuning item) Threshold for OmniShuffle BoostTuning to skip the sampling process when the upstream data volume is small. Retain the default value in most scenarios.

spark.sql.ock.autoConfig.historyStrategy

  • Value: default, override, readOnly, or isolate
  • Default: default

Policy for OmniShuffle BoostTuning to adjust the parallelism degree based on historical data. Retain the default value in most scenarios.

  • default: If no historical data exists, online inference is used. If historical data exists, historical data is used.
  • override: New jobs will read and overwrite historical data and generate the unique historical data.
  • readOnly: Data of new jobs will not be persisted, that is, will not be written to the historical service.
  • isolate: New jobs neither read nor generate historical data.

spark.sql.ock.autoConfig.samplePartitionFraction

  • Value: A decimal number less than 1 and greater than 0
  • Default: 0.01

(Tuning item) Data volume sampling ratio in the OmniShuffle BoostTuning sampling process. Retain the default value in most scenarios.

spark.sql.ock.autoConfig.sampleRDDFraction

  • Value: A decimal number less than 1 and greater than 0
  • Default: 0.01

(Tuning item) Data partition sampling ratio in the OmniShuffle BoostTuning sampling process. Retain the default value in most scenarios.

spark.sql.ock.autoConfig.partitionRatio

  • Value: Greater than 0
  • Default: 3.0

(Tuning item) Coefficient used by OmniShuffle BoostTuning to calculate shuffle partitions. Retain the default value in most scenarios.

spark.ock.decimal.optimize

  • Value: true or false
  • Default value: false

(Tuning item) Optimization on the calculation of Decimal data. Keep the default for most scenarios. This tuning item applies only to Spark 3.1.1. If you want to enable this function, perform the following operations:

  1. For Java 9 or later, add -Djdk.attach.allowAttachSelf=true to the Java startup option.
  2. Add spark.executor.extraClassPath=${JAVA_HOME}/lib/* to the spark.conf file.

mf.conf

Parameter

Reference Value

Description

ock.mf.ip_mask

172.17.0.0-172.17.0.125

Set it within the service IP address range of the MF node in the cluster.
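
To check whether a node's service IP address falls inside an ock.mf.ip_mask value, a sketch such as the following can help. It assumes the value is a "start-end" range with both ends inclusive, which is an interpretation of the reference value rather than documented behavior; the helper name in_ip_mask is hypothetical.

```python
import ipaddress


def in_ip_mask(ip, ip_mask):
    """Return True if ip lies within the 'start-end' range (ends inclusive)."""
    start_text, end_text = ip_mask.split("-")
    start = ipaddress.ip_address(start_text.strip())
    end = ipaddress.ip_address(end_text.strip())
    return start <= ipaddress.ip_address(ip) <= end


if __name__ == "__main__":
    mask = "172.17.0.0-172.17.0.125"
    for node_ip in ("172.17.0.10", "172.17.0.126"):
        print(node_ip, in_ip_mask(node_ip, mask))
```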

ock.mf.ip

127.0.0.1

Loopback IP address of the local node. Retain the default value.

ock.mf.port

9999

  • MF port number. Retain the default value.
  • Ensure that this port number and the port number plus 1 are not occupied.

ock.mf.protocol

rc

  • MF protocol.
  • If IB NICs (RDMA) are available, use rc. Otherwise, change the value to tcp.

ock.mf.mem_size

53687091200

  • The MF memory must be at least 50 GB. The default value is 50 GB (53687091200 bytes).
  • If the shared memory quota is limited, the memory used for RC communication plus the MF memory cannot exceed the shared memory quota. Reserve 10 GB memory for RC communication. Unit: byte
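
The two sizing rules above can be checked with simple arithmetic. The sketch below assumes that the reference value is expressed in bytes (53687091200 = 50 × 1024³) and that "GB" means 1024³ bytes; the helper name check_mf_mem_size and the shm_quota argument are hypothetical.

```python
GIB = 1024 ** 3
MIN_MF_MEM = 50 * GIB   # minimum MF memory stated above
RC_RESERVE = 10 * GIB   # memory reserved for RC communication


def check_mf_mem_size(mem_size, shm_quota=None):
    """Return a list of problems; an empty list means the value passes."""
    problems = []
    if mem_size < MIN_MF_MEM:
        problems.append("ock.mf.mem_size is below 50 GB")
    # The quota check applies only when a shared memory quota is known.
    if shm_quota is not None and mem_size + RC_RESERVE > shm_quota:
        problems.append("MF memory plus 10 GB for RC exceeds the shared memory quota")
    return problems


if __name__ == "__main__":
    print(check_mf_mem_size(53687091200))
    print(check_mf_mem_size(53687091200, shm_quota=55 * GIB))
```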

ock.mf.water_mark_timer

50

Interval for scanning the memory watermark in the convergent scenario, in milliseconds. Retain the default value.

ock.mf.rpc.timeout

600000

  • Timeout interval for messages between MFs, in milliseconds.
  • Default value: 600000 (10 minutes).

ock.ucache.rpc.enableAuthentication

false

Indicates whether to enable the security feature.

  • true: yes
  • false: no

If ock.ucache.rpc.enableAuthentication, ock.ucache.rpc.enableTLS, and ock.ucache.rpc.enableAuthorization are all set to false, you do not need to set the security-related parameters that follow.

ock.ucache.rpc.enableTLS

false

ock.ucache.rpc.enableAuthorization

false

ock.ucache.rpc.tls.ca.cert.path

$OCK_HOME/security/tls/server/ca.cert.pem

Path of the ca.cert.pem file (used by OmniShuffle) that is generated on the nodes listed in agent_node_list. Change $OCK_HOME to the actual OmniShuffle installation path.

ock.ucache.rpc.tls.cert.path

$OCK_HOME/security/tls/server/server.cert.pem

Path of the server.cert.pem file (used by OmniShuffle) that is generated on the nodes listed in agent_node_list. Change $OCK_HOME to the actual OmniShuffle installation path.

ock.ucache.rpc.tls.key.path

$OCK_HOME/security/tls/server/server.private.key.pem

Path of the server.private.key.pem file (used by OmniShuffle) that is generated on the nodes listed in agent_node_list. Change $OCK_HOME to the actual OmniShuffle installation path.

ock.ucache.rpc.tls.key.pass.path

$OCK_HOME/security/tls/server/server.keypass.key

Path of the server.keypass.key file (used by OmniShuffle) that is generated on the nodes listed in agent_node_list. Change $OCK_HOME to the actual OmniShuffle installation path.

ock.ucache.rpc.tls.crl.path

-

Path of the certificate revocation list (CRL) of the OmniShuffle user. If there is no CRL, delete this parameter.

ock.ucache.rpc.tls.key.encrypted

true

Indicates whether the key is encrypted.

  • true: yes
  • false: no

ock.ucache.rpc.auth.type

kerberos

Identity authentication protocol. Currently, the Kerberos protocol is used.

ock.ucache.rpc.auth.kerb.client.keytab

/home/Sparkadmin/huawei/ock/security/kdc/krb5-client.keytab

  • Path of the krb5-client.keytab file (for the user who submits Spark tasks) distributed by the KDC server to each node. Change /home/Sparkadmin to the actual installation path.
  • If ock.ucache.rpc.tls.key.encrypted is set to true, change krb5-client.keytab to krb5-client_en.keytab.

ock.ucache.rpc.auth.kerb.server.keytab

$OCK_HOME/security/kdc/krb5-server.keytab

  • Path of the krb5-server.keytab file (used by OmniShuffle) distributed by the KDC server to each node.
  • Change $OCK_HOME to the actual OmniShuffle installation path.
  • If ock.ucache.rpc.tls.key.encrypted is set to true, change krb5-server.keytab to krb5-server_en.keytab.

ock.ucache.rpc.auth.kerb.keytab.encrypted

true

Set this parameter to true if the keytab file is encrypted and to false if the keytab file is not encrypted.

ock.ucache.rpc.auth.domain

EXAMPLE.COM

Change the value to the domain name specified by the KDC server.

ock.ucache.rpc.auth.server.principle.name

ock_server

Principal name of the OmniShuffle server. Currently, this parameter is set to ock_server.

ock.ucache.rpc.auth.client.principle.name

ock_client

Principal name of the OmniShuffle client. Currently, this parameter is set to ock_client.

ock.ucache.rpc.author.type

whitelist

The default value whitelist is used.

ock.ucache.rpc.author.file.path

$OCK_HOME/security/authorization/whitelist

  • Path of whitelist generated during KDC configuration.
  • Change $OCK_HOME to the actual OmniShuffle installation path.
  • When ock.ucache.rpc.author.file.encrypted is set to true, change whitelist to whitelist_en.

ock.ucache.rpc.author.file.encrypted

true

Set this parameter to true if the whitelist file is encrypted and to false if the whitelist file is not encrypted.

ock.ucache.kmc.ksf.primary.path

$OCK_HOME/security/pmt/master/ksfa

Path of the kmc.primary.ks file generated by using kmc_tool (for the OmniShuffle user). Change $OCK_HOME to the actual OmniShuffle installation path.

ock.ucache.kmc.ksf.standby.path

$OCK_HOME/security/pmt/standby/ksfb

Path of the kmc.standby.ks file generated by using kmc_tool (for the OCK user). Change $OCK_HOME to the actual OmniShuffle installation path.

ock.ucache.kmc.ksf.backup.path

$OCK_HOME/security/pmt/kmcbakup

Path of backups of the kmc.primary.ks and kmc.standby.ks files (for the OCK user). Change $OCK_HOME to the actual OmniShuffle installation path. You can back up the files to a customized path.

ock.ucache.sdk.kmc.ksf.primary.path

/home/Sparkadmin/huawei/ock/security/pmt/master/ksfa

Path of the kmc.primary.ks file generated by using kmc_tool (for the user who submits Spark tasks). Change /home/Sparkadmin to the actual installation path.

ock.ucache.sdk.kmc.ksf.standby.path

/home/Sparkadmin/huawei/ock/security/pmt/standby/ksfb

Path of the kmc.standby.ks file generated by using kmc_tool (for the user who submits Spark tasks). Change /home/Sparkadmin to the actual installation path.

ock.ucache.sdk.kmc.ksf.backup.path

/home/Sparkadmin/huawei/ock/security/pmt/kmcbakup

Path of backups of the kmc.primary.ks and kmc.standby.ks files (for the user who submits Spark tasks). Change /home/Sparkadmin to the actual installation path. You can back up the files to a customized path.

ock.mf.capacity.report.period

Value range: [100, 180000]

Interval for the MF to update the latest capacity information recorded in ZooKeeper. Unit: ms

ock.ucache.server.isIsolated

true

Indicates whether to enable multi-tenant check. This feature is enabled by default. Retain the default value.

  • true: yes
  • false: no

ock.ucache.worker.thread.groups

1,1

If this parameter is set to 1,1, multiple links can be established between MF servers in TCP scenarios to improve performance. This function is disabled by default.

ock.ucache.sdk.thread.groups

1

If this parameter is set to ≥ 1, multiple links can be established between OCK clients and MF servers to improve performance. This function is disabled by default.

ock.ucache.rpc.client.auth.timeout

[15000, 180000]

RPC link setup timeout period, in milliseconds.

ock.ucache.rpc.tls.sdk.crl.path

-

  • Path of the CRL of the user who submits Spark tasks.
  • If there is no CRL, delete this parameter.

ock.ucache.rpc.tls.sdk.ca.cert.path

/home/Sparkadmin/huawei/ock/security/tls/ca.cert.pem

Path of the ca.cert.pem file (for the user who submits Spark tasks) that is generated on the nodes listed in agent_node_list. Change /home/Sparkadmin to the actual installation path.

ock.hswap.path

${OCK_HOME}/hswappath

Swap path.

ock.hswap.queue.cap.per.path

65535

Capacity of the swap queue in each path.

ock.hswap.task.pool.size

65535

Thread pool size.

ock.hswap.max.aio.count.per.thread

65535

Maximum number of AIO events that can be concurrently processed by each thread.

ock.hswap.media.type

0

Drive type. Only one drive type is supported, that is, 0 (meaning NVMe).

ock.conf

Unless otherwise stated, parameters whose names contain "timeout" are in milliseconds. You can increase the values of these parameters if the network condition is poor. Port numbers must be in the range 3000 to 65535.

Parameter

Reference Value

Description

ock.log.dir

${OCK_HOME}/logs/

OmniShuffle run log directory. Change $OCK_HOME to the actual OmniShuffle installation path.

ock.workers.dir

${OCK_HOME}/conf/workers

Host name directory of the running worker node. Generally, the directory is the same as that of the Hadoop worker node. Change $OCK_HOME to the actual OmniShuffle installation path.

ock.log.level

INFO

Run log level. Retain the default value.

ock.log.fileSize

20

Size of a single run log file, in MB. The value ranges from 1 to 20.

ock.log.rotation.file.num

10

Maximum number of run log files kept by log rotation. If the number of run log files exceeds this value, the excess files are deleted. The value ranges from 1 to 20.

ock.kv.levelDB.path

/home/ockadmin/opt/ock/leveldb

Directory for storing levelDB. Retain the default value.

ock.kv.levelDB.write_buffer_size

67108864

Maximum size of a memtable in levelDB, which is used for performance optimization.

ock.kv.levelDB.max_file_size

2097152

Maximum size of a file in levelDB, which is used for performance optimization.

ock.ucache.enabled

true

Indicates whether the Shuffle service is available.

ock.ucache.rpc.shuffle_server.timeout

150000

RPC timeout interval of the shuffle server, in milliseconds.

ock.ucache.rpc.shuffle_meta.timeout

120000

RPC timeout interval of the metadata service, in milliseconds.

ock.ucache.rpc.client.auth.timeout

60000

Timeout interval for RPC connection setup of a node, in milliseconds.

ock.ucache.rpc.local_blob.get.timeout

150000

RPC timeout interval (in milliseconds) of the get LocalBlob operation, which has a higher priority than the default timeout interval.

ock.ucache.rpc.local_blob.commit.timeout

150000

RPC timeout interval (in milliseconds) of the commit LocalBlob operation, which has a higher priority than the default timeout interval.

ock.ucache.rpc.transport.tcp.port.range

60000-61000

Range of extra ports that need to be occupied by the TCP network protocol.

ock.ucache.rpc.transport.protocol

rc

If IB NICs (RDMA) are available, use rc. Otherwise, change the value to tcp.

ock.ucache.rpc.transport.devices

None

Name of the NIC to be used. If multiple NICs are running in the environment, you need to specify the NIC to be used. Otherwise, the communication between nodes may fail.

ock.ucache.shuffle.profile.level

0

Performance statistics collection level.

ock.ucache.rpc.shuffle_meta.worker.thread.group

1,3,3

Number of threads in the thread pool of the metadata node.

ock.ucache.rpc.shuffle_meta.worker.thread.cpuset

None

Used for binding the communication working thread to a CPU core on the metadata node.

ock.ucache.rpc.shuffle_meta.eventMgr.thread.cpuset

None

Used for binding the communication event management thread to a CPU core on the metadata node.

ock.ucache.shuffle.smartGatherData

true

Indicates whether to enable the Spark gather mode.

ock.ucx.tcp.keepintvl

60s

TCP keepalive interval, in seconds. You can increase the parameter value when the network condition is extremely poor.

ock.ucache.rpc.shuffle_server.port

3891

Service port of the shuffle server. You can specify a port within the port configuration range.

ock.ucache.rpc.shuffle_meta.port

3892

Service port of the shuffle meta. You can specify a port within the port configuration range.

ock.ucache.rpc.manager.port

3899

Service port of the shuffle client.

ock.ucache.meta.node_lists

127.0.0.1

Set meta ip to 127.0.0.X, which is the service IP address of the management node in the cluster. Use commas (,) to separate multiple IP addresses. Example: 127.0.0.1,127.0.0.2

ock.ucache.server.max_local_blob_capacity

25769803776

Maximum local_blob capacity, in bytes. You can set this parameter to a value greater than or equal to 24 GB and less than half of the MF memory capacity.

Currently, the reference value is 24 GB (25769803776 bytes).
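
The allowed range can be expressed as a single comparison. The sketch below assumes that the values are in bytes (25769803776 = 24 × 1024³) and that the MF memory capacity is the ock.mf.mem_size value from mf.conf; the helper name valid_local_blob_capacity is hypothetical.

```python
GIB = 1024 ** 3


def valid_local_blob_capacity(capacity, mf_mem_size):
    """Return True if 24 GB <= capacity < half of the MF memory capacity."""
    return 24 * GIB <= capacity < mf_mem_size / 2


if __name__ == "__main__":
    # Reference values: 24 GB local blob capacity, 50 GB MF memory.
    print(valid_local_blob_capacity(25769803776, 53687091200))
```

With the 50 GB reference MF memory, the upper bound is 25 GB, so the 24 GB reference value leaves little room for increase unless the MF memory is also raised.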

ock.ucache.server.data.isolation

true

Indicates whether to enable the app resource isolation function of OmniShuffle. This parameter must be used together with the client parameters.

ock.zookeeper.server.url

127.0.0.1:2181

IP address and port number of the ZooKeeper server.

  • If only Kerberos is enabled, set the port number to 2181.
  • If TLS+Kerberos is enabled, set the port number to 2281.

ock.zookeeper.session.timeout

30000

Timeout interval of the ZooKeeper session, in milliseconds.

ock.zookeeper.connect.timeout

30

Timeout interval for ZooKeeper connection attempts, in seconds. If there are a large number of nodes and the connection delay is long, you can increase the value of this parameter.

ock.ucache.server.swap.threshold.higher_watermark

60

Memory watermark for swapping read-only ShuffleBlobs. The value ranges from 0 to 100. You are not advised to change the value.

ock.ucache.server.swap.threshold.lower_watermark

20

Memory watermark for swapping ShuffleBlobs with only external storage to the memory pool. The value ranges from 0 to 100. You are not advised to change the value.

ock.ucache.server.swap.threshold.free_water_mater

80

When the MF memory usage exceeds the preset value, the system prepares to release the swapped memory occupied by ShuffleBlob. The value is greater than 0 and smaller than 100. You are not advised to change the value.

ock.ucache.server.swap.path

-

File directory to be swapped to the external storage. Use commas (,) to separate multiple directories. This field is mandatory. If not specified, the task cannot be started. You are advised to set the permission to 750.

ock.ucache.rpc.enableAuthentication

true

Indicates whether to enable the security feature.

  • true: yes
  • false: no

ock.ucache.rpc.enableTLS

true

Indicates whether to enable transmission encryption.

ock.ucache.rpc.enableAuthorization

true

Indicates whether to enable login authentication.

ock.ucache.rpc.tls.ca.cert.path

${OCK_HOME}/security/tls/server/ca.cert.pem

Path of the ca.cert.pem file (for the OCK user) that is generated on the nodes listed in agent_node_list during certificate distribution. Change $OCK_HOME to the actual OmniShuffle installation path.

ock.ucache.rpc.tls.key.encrypted

true

Indicates whether the key is encrypted. This parameter is valid only when the security feature is enabled.

  • true: yes
  • false: no

ock.ucache.rpc.tls.cert.path

${OCK_HOME}/security/tls/server/server.cert.pem

Path of the server.cert.pem file (for the OCK user) that is generated on the nodes listed in agent_node_list during certificate distribution. Change $OCK_HOME to the actual OmniShuffle installation path.

ock.ucache.rpc.tls.key.path

${OCK_HOME}/security/tls/server/server.private.key.pem

Path of the server.private.key.pem file (for the OCK user) that is generated on the nodes listed in agent_node_list during certificate distribution. Change $OCK_HOME to the actual OmniShuffle installation path.

ock.ucache.rpc.tls.key.pass.path

${OCK_HOME}/security/tls/server/server.keypass.key

Path of the server.keypass.key file (for the OCK user) that is generated on the nodes listed in agent_node_list during certificate distribution. Change $OCK_HOME to the actual OmniShuffle installation path.

ock.ucache.rpc.tls.crl.path

-

  • CRL used by the OCK user.
  • If there is no CRL, leave it blank.

ock.ucache.rpc.auth.type

kerberos

Identity authentication protocol. Currently, the Kerberos protocol is used.

ock.ucache.rpc.auth.kerb.client.keytab

${OCK_HOME}/security/authentication/krb5-client.keytab

  • Path of the krb5-client.keytab file (for the user who submits Spark tasks) distributed by the KDC server to each node. Change $OCK_HOME to the actual OmniShuffle installation path.
  • If ock.ucache.rpc.tls.key.encrypted is set to true, change krb5-client.keytab to krb5-client_en.keytab.

ock.ucache.rpc.auth.kerb.server.keytab

${OCK_HOME}/security/kdc/krb5-server.keytab

  • Path of the krb5-server.keytab file (for the OCK user) distributed by the KDC server to each node. Change $OCK_HOME to the actual OmniShuffle installation path.
  • If ock.ucache.rpc.tls.key.encrypted is set to true, change krb5-server.keytab to krb5-server_en.keytab.

ock.ucache.rpc.auth.kerb.keytab.encrypted

true

Indicates whether the keytab file is encrypted. This parameter is valid only when the security feature is enabled.

  • true: yes
  • false: no

ock.ucache.rpc.auth.domain

EXAMPLE.COM

Domain name specified by the KDC server.

ock.ucache.rpc.auth.server.principle.name

ock_server

Principal name of the OmniShuffle server. Currently, this parameter is set to ock_server.

ock.ucache.rpc.auth.client.principle.name

ock_client

Principal name of the OmniShuffle client. Currently, this parameter is set to ock_client.

ock.ucache.rpc.auth.meta.principle.mapping

127.0.0.1:hostname

Mapping between the IP addresses and host names of the meta nodes. The IP addresses must be the same as those in ock.ucache.meta.node_lists. Use commas (,) to separate multiple entries, for example, 127.0.0.1:hostname1,127.0.0.2:hostname2.
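
Because this value must stay in step with ock.ucache.meta.node_lists, it can be convenient to generate both from one source. The following is a minimal sketch under that assumption; the helper names and the host names hostname1 and hostname2 are hypothetical.

```python
def build_principal_mapping(hosts_by_ip):
    """Build the 'ip:hostname,ip:hostname' string from a dict of IP -> host name."""
    return ",".join(f"{ip}:{host}" for ip, host in hosts_by_ip.items())


def mapping_covers_node_list(mapping, node_lists):
    """Return True if the mapping names exactly the IPs in node_lists."""
    mapped_ips = {entry.split(":", 1)[0] for entry in mapping.split(",")}
    return mapped_ips == set(node_lists.split(","))


if __name__ == "__main__":
    hosts = {"127.0.0.1": "hostname1", "127.0.0.2": "hostname2"}
    mapping = build_principal_mapping(hosts)
    print("ock.ucache.meta.node_lists", ",".join(hosts))
    print("ock.ucache.rpc.auth.meta.principle.mapping", mapping)
```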

ock.ucache.rpc.author.type

whitelist

The default value whitelist is used.

ock.ucache.rpc.author.file.path

${OCK_HOME}/security/authorization/whitelist

  • Path of whitelist generated during KDC configuration. Change $OCK_HOME to the actual OmniShuffle installation path.
  • When ock.ucache.rpc.author.file.encrypted is set to true, change whitelist to whitelist_en.

ock.ucache.rpc.author.file.encrypted

true

Indicates whether the whitelist file is encrypted.

  • true: yes
  • false: no

ock.daemon.expireChecker.period

86400

Security certificate check interval, in seconds.

ock.ucache.kmc.ksf.primary.path

${OCK_HOME}/security/pmt/master/ksfa

Path of the kmc.primary.ks file generated by using kmc_tool (for the OCK user). Change $OCK_HOME to the actual OmniShuffle installation path.

ock.ucache.kmc.ksf.standby.path

${OCK_HOME}/security/pmt/standby/ksfb

Path of the kmc.standby.ks file generated by using kmc_tool (for the OCK user). Change $OCK_HOME to the actual OmniShuffle installation path.

ock.ucache.kmc.ksf.backup.path

${OCK_HOME}/security/pmt/kmcbakup

Path of backups of the kmc.primary.ks and kmc.standby.ks files (for the OCK user). Change $OCK_HOME to the actual OmniShuffle installation path. You can back up the files to a customized path.

ock.zookeeper.security.principle.name

zookeeper

Principal name of the Kerberos authentication server, indicating the first part of the principal.

ock.zookeeper.security.principle.hostname

master

Principal name of the ZooKeeper server for Kerberos authentication, indicating the second part of the principal.

ock.zookeeper.security.strategy

GSSAPI

Kerberos authentication mechanism supported by SASL. Retain the default value GSSAPI.

ock.zookeeper.security.enable

true

Indicates whether to enable ZooKeeper encryption.

  • true: yes. In this case, all ZooKeeper security-related parameters need to be set.
  • false: no

ock.zookeeper.security.isKeytabEncrypt

true

Indicates whether client.keytab is encrypted.

  • true: yes
  • false: no

ock.zookeeper.security.certs

/home/ockadmin/opt/ock/security/tls/server.crt.pem,/home/ockadmin/opt/ock/security/tls/client.crt.pem,***

  • When TLS+Kerberos is enabled, set this parameter to the certificates required by TLS (for the OCK user), including server.crt.pem, client.crt.pem, client.pem, and the PEM certificate password encrypted using KMC.
  • When only TLS is enabled, set this parameter to false.

ock.zookeeper.security.client.principle

zkcli/master@HUAWEI.COM

Principal for Kerberos authentication on the ZooKeeper client (for the OCK user). master indicates the host name of the node, and HUAWEI.COM indicates the domain name of KDC.

ock.zookeeper.security.client.keytab

${OCK_HOME}/security/kdc/krb5-server_en.keytab

Path of the keytab file for Kerberos authentication on the ZooKeeper client (for the OCK user). Change $OCK_HOME to the actual OmniShuffle installation path.

ock.ucache.broadcast.variable.create.timeout

600000

Timeout interval for creating a broadcast variable, in milliseconds. The value -1 indicates that there is no timeout limit.

ock.ucache.broadcast.variable.fetch.timeout

600000

Timeout interval for fetching a broadcast variable, in milliseconds. The value -1 indicates that there is no timeout limit.

ock.ucache.broadcast.bt.percent

10

Percentage of the number of BT servers to the number of nodes in the cluster during the process of fetching broadcast variables. The value ranges from 1 to 100.

ock.ucache.rpc.transport.ipfilter

-

Select a communication device name based on the network segment to which the node belongs, for example, 192.168.100.194/24<,192.168.200.194/24>. Separate multiple network segments with commas (,). You can run the ip a command to view the network segment information. It is recommended that the nodes be configured in a unified manner.
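
To see which network segment an entry such as 192.168.100.194/24 denotes, and whether a given node address falls inside it, the Python standard library ipaddress module can be used. This is only an aid for preparing the value and says nothing about how OmniShuffle parses it; the helper names are hypothetical.

```python
import ipaddress


def segment_of(ipfilter_entry):
    """Return the network segment that an 'address/prefix' entry belongs to."""
    return ipaddress.ip_interface(ipfilter_entry).network


def node_matches_filter(node_ip, ipfilter):
    """Return True if node_ip is inside any comma-separated segment."""
    addr = ipaddress.ip_address(node_ip)
    entries = [e.strip() for e in ipfilter.split(",") if e.strip()]
    return any(addr in segment_of(entry) for entry in entries)


if __name__ == "__main__":
    value = "192.168.100.194/24,192.168.200.194/24"
    print(segment_of("192.168.100.194/24"))
    print(node_matches_filter("192.168.200.7", value))
```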

ock.ucache.rpc.transport.devices.path

/sys/class/infiniband/

Directory for storing RC NIC information. Generally, the default value is used.

ock.ucache.rpc.openssl.path

${OCK_HOME}/ucache/23.0.0/linux-aarch64/lib/common/openssl/libssl.so

Path for loading the OpenSSL SO file on which OmniShuffle depends. Change $OCK_HOME to the actual OmniShuffle installation path.

ock.ucache.rpc.crypto.path

${OCK_HOME}/ucache/23.0.0/linux-aarch64/lib/common/openssl/libcrypto.so

Path for loading the crypto SO file on which OmniShuffle depends. Change $OCK_HOME to the actual OmniShuffle installation path.

ock.ucache.rpc.tls.sdk.ca.cert.path

/home/Sparkadmin/huawei/ock/security/tls/ca.cert.pem

Path of the ca.cert.pem file (for the user who submits Spark tasks) that is generated on the nodes listed in agent_node_list during certificate distribution. Change /home/Sparkadmin to the actual installation path.

ock.ucache.rpc.tls.sdk.crl.path

-

  • CRL used by the user who submits Spark tasks.
  • If there is no CRL, leave it blank.

ock.ucache.sdk.kmc.ksf.primary.path

/home/Sparkadmin/huawei/ock/security/pmt/master/ksfa

Path of the kmc.primary.ks file generated by using kmc_tool (for the user who submits Spark tasks). Change /home/Sparkadmin to the actual installation path.

ock.ucache.sdk.kmc.ksf.standby.path

/home/Sparkadmin/huawei/ock/security/pmt/standby/ksfb

Path of the kmc.standby.ks file generated by using kmc_tool (for the user who submits Spark tasks). Change /home/Sparkadmin to the actual installation path.

ock.ucache.sdk.kmc.ksf.backup.path

/home/Sparkadmin/huawei/ock/security/pmt/kmcbakup

Path of backups of the kmc.primary.ks and kmc.standby.ks files (for the user who submits Spark tasks). Change /home/Sparkadmin to the actual installation path. You can back up the files to a customized path.

ock.zookeeper.sdk.security.certs

/home/Sparkadmin/huawei/ock/security/tls/server.crt.pem,/home/Sparkadmin/huawei/ock/security/tls/client.crt.pem,/home/Sparkadmin/huawei/ock/security/tls/client.pem,***

  • When TLS+Kerberos is enabled, set this parameter to the certificates required by TLS (for the user who submits Spark tasks), including server.crt.pem, client.crt.pem, client.pem, and the PEM certificate password encrypted using KMC.
  • When only TLS is enabled, set this parameter to false.

ock.zookeeper.sdk.security.client.principle

zkcli/master@HUAWEI.COM

Principal for Kerberos authentication on the ZooKeeper client (for the user who submits Spark tasks). master indicates the host name of the node, and HUAWEI.COM indicates the domain name of the KDC.

ock.zookeeper.sdk.security.client.keytab

/home/Sparkadmin/huawei/ock/security/kdc/krb5-client_en.keytab

Path of the keytab file for Kerberos authentication on the ZooKeeper client (for the user who submits Spark tasks).

ock.daemon.expireChecker.lead

-

Number of days before certificate expiration at which the expiration notification is triggered. If this parameter is not set, the notification is triggered seven days before the certificate expires. The value ranges from 7 to 180.

ock.ucache.server.aggregator.core.thread.num

4

Number of aggregation core threads. The value ranges from 1 to the maximum number of cores on the device.

ock.tuning.enabled

true

Indicates whether the BoostTuning service is enabled.

ock.tuning.history.persist.type

local

Persistence mode of historical data.

  • local: local CSV file
  • kv: levelDB storage

ock.tuning.history.persist.path

${OCK_HOME}/history

Path for storing historical data.

ock.tuning.history.file.max.number

5

Maximum number of files that store historical data. This parameter is valid only in local mode. The value ranges from 1 to 5.

ock.tuning.history.file.read.number

2

Number of files whose historical data is read. This parameter is valid only in local mode. The latest file is read each time. The value is 1 or 2.

ock.tuning.rpc.server.port

3893

RPC service port.

ock.tuning.rpc.timeout

240000

Timeout interval of the RPC service, in milliseconds.
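
As a rough aid for filling in ock.ucache.rpc.transport.ipfilter from the table above, the following sketch turns the one-line output of ip -o -4 addr show into a comma-separated list of address/prefix entries. The helper name list_segments is hypothetical and is not part of OmniShuffle. You still need to keep only the entries that belong to the network used by OmniShuffle.

```shell
# list_segments: hypothetical helper, not shipped with OmniShuffle.
# Reads `ip -o -4 addr show` output on stdin and prints the IPv4
# address/prefix of every non-loopback interface, joined by commas.
list_segments() {
    awk '$3 == "inet" && $2 != "lo" { print $4 }' | paste -s -d , -
}

# Typical use on a node:
#   ip -o -4 addr show | list_segments
```

The printed list has the same shape as the example value in the parameter description, so it can be trimmed and pasted into the configuration file.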

ock-start-ockd-by-yarn.sh

Parameter

Reference Value

Description

retry_times

5

Number of times that Yarn attempts to start the OCKD process.

interval_time

150

Interval at which Yarn attempts to start the OCKD process, in seconds.

forever_interval_time

600

Interval, in seconds, at which Yarn keeps attempting to start the OCKD process after the first retry_times attempts have failed.
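
The three parameters above describe a two-phase retry schedule. The sketch below shows one way to read it, assuming that attempts are spaced interval_time seconds apart until retry_times attempts have failed and forever_interval_time seconds apart afterwards. The function name wait_after_failures is hypothetical, and this is not the actual code of ock-start-ockd-by-yarn.sh.

```shell
# Reference values from the table above.
retry_times=5
interval_time=150
forever_interval_time=600

# wait_after_failures N: hypothetical helper. Prints the number of seconds
# to wait before the next start attempt once N attempts have failed.
wait_after_failures() {
    if [ "$1" -lt "$retry_times" ]; then
        echo "$interval_time"
    else
        echo "$forever_interval_time"
    fi
}

wait_after_failures 2    # still in the first phase
wait_after_failures 5    # retry_times failures reached, slower phase
```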

agent_node_list

The file content format is as follows:

IP_address O&M account

If there are multiple nodes, enter one IP address and one O&M account in each line. Note that all nodes must be covered.

Example file content:
1.1.1.1 root
1.1.1.3 root
1.1.1.5 root
1.1.1.7 root
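
Because each line of the node list must carry exactly one IP address and one O&M account, a quick format check before deployment can catch incomplete lines. The following is a minimal sketch. The helper name check_node_list is hypothetical and not part of OmniShuffle, and the IPv4 pattern only checks the dotted shape of the address, not its numeric range.

```shell
# check_node_list FILE: hypothetical helper, not shipped with OmniShuffle.
# Succeeds only if every non-empty line has exactly two whitespace-separated
# fields and the first field looks like a dotted IPv4 address.
check_node_list() {
    awk '
        NF == 0 { next }
        NF != 2 || $1 !~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ {
            printf "line %d: expected \"IP_address account\"\n", NR
            bad = 1
        }
        END { exit bad + 0 }
    ' "$1"
}

# Typical use:
#   check_node_list agent_node_list && echo "format OK"
```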

CA_node_list

The file content format is as follows:

IP_address O&M account

If there are multiple nodes, enter one IP address and one O&M account in each line. Only information about the management node is required.

Example file content:
1.1.1.9 BigDataAdmin

ock-launch-cluster.sh

Parameter

Reference Value

Description

ock_vcore

15

Number of CPUs used by OmniShuffle.

ock_memory

61440

Memory size occupied by OmniShuffle, in MB. Set it to the larger of the following two values: 110% of the MF memory, or the MF memory plus 10 GB. The value includes the memory for running OCK.

master_vcore

5

Number of CPUs occupied by the launched primary node.

master_memory

10240

Memory size occupied by the launched primary node, in MB.

queue

-

Yarn queue where OmniShuffle resides.

ock_master_partition_label

-

Label of the Yarn partition where the launched primary node is located.

need_kerberos

-

Indicates whether Kerberos authentication is required before a job is submitted.

kerberos_conf

-

Path of the krb5.conf configuration file for Kerberos authentication. This parameter is valid only when need_kerberos is set to true.

kerberos_user

-

User name for Kerberos authentication. This parameter is valid only when need_kerberos is set to true.

kerberos_key_table

-

Path of the keytab file corresponding to the user name for Kerberos authentication. This parameter is valid only when need_kerberos is set to true.

local_dir

$(cd "$(dirname $0)"||exit 0; pwd)

Directory where the script is located.

ock_home

$(cd "$(dirname $0)"/../../../..||exit 0; pwd)

OmniShuffle deployment path.

ock_version_dir

$(cd "$(dirname $0)"/../..||exit 0; pwd)

Path for storing OmniShuffle version information.

ock_version

"${ock_version_dir##*/}"

OmniShuffle version.

ock_run_shell_path

"${local_dir}/ock-start-ockd-by-yarn.sh"

Path of the script for Yarn to start OmniShuffle.

ock_nodes_list_path

"${ock_home}/conf/ock_node_list"

Path of the OmniShuffle node list configuration file.

client_jar_path

"${ock_home}/jars/ock-launch-cluster-${ock_version}.jar"

Path of the JAR file used by Yarn to start OmniShuffle.

log_path

"${ock_home}/logs/ock-launch-cluster.log"

Path of the log file used by Yarn to start OmniShuffle.

appid_path

"${ock_home}/work/yarn-appids/yarn-ock.appid"

Path of the .appid file used by Yarn to start OmniShuffle.
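
The sizing rule given for ock_memory in the table above can be checked with integer arithmetic. The sketch below assumes that the MF memory is given in MB and takes 10 GB as 10240 MB. The function name ock_memory_mb is hypothetical.

```shell
# ock_memory_mb MF_MB: hypothetical helper.
# Prints the larger of 110% of the MF memory and MF memory + 10 GB, in MB.
ock_memory_mb() {
    mf_mb=$1
    scaled=$(( mf_mb * 110 / 100 ))    # 110% of the MF memory, rounded down
    padded=$(( mf_mb + 10240 ))        # MF memory plus 10 GB
    if [ "$scaled" -gt "$padded" ]; then
        echo "$scaled"
    else
        echo "$padded"
    fi
}

ock_memory_mb 51200     # prints 61440
ock_memory_mb 204800    # prints 225280
```

With an MF memory of 51200 MB the result equals the reference value 61440. The 110% term becomes the larger one only when the MF memory exceeds 102400 MB (100 GB).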

ock-stop-cluster.sh

Parameter

Reference Value

Description

ock_home

"$(cd "$(dirname $0)"/../../../..||exit ${EXT}; pwd)"

OmniShuffle deployment path.

appid_path

"${ock_home}/work/yarn-appids/yarn-ock.appid"

Path of the .appid file that identifies the OmniShuffle application to be stopped in Yarn.

log_path

"${OCK_HOME}/logs/ock-stop-cluster.log"

Path of the log file used by Yarn to stop OmniShuffle. Change $OCK_HOME to the actual OmniShuffle installation path.

ock_id

$(cat ${appid_path}|grep -Eo "application_[0-9]+_[0-9]+")

Application ID of OCK in Yarn.
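
The ock_id expression extracts the Yarn application ID from the .appid file with an extended regular expression. The sketch below reproduces that step on a temporary file. The file content and the ID application_1700000000000_0042 are made up for illustration, because the real layout of the .appid file is not described here.

```shell
# Create a throwaway .appid file with a made-up application ID.
appid_path=$(mktemp)
echo "Submitted application application_1700000000000_0042" > "$appid_path"

# Same pattern as the ock_id reference value: -E enables extended regular
# expressions, -o prints only the part of the line that matches.
ock_id=$(grep -Eo "application_[0-9]+_[0-9]+" "$appid_path")
echo "$ock_id"

rm -f "$appid_path"
```

An ID of this form is what the yarn application -kill command expects. Whether ock-stop-cluster.sh calls it in exactly this way is not stated in this document.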