OmniShuffle Configuration File
spark.conf
Parameter |
Value Range and Default Value |
Description |
|---|---|---|
spark.executor.extraClassPath |
$OCK_HOME/jars/*:. |
Path of the OmniShuffle JAR package. Change $OCK_HOME to the actual OmniShuffle installation path. |
spark.driver.extraJavaOptions |
-Djava.library.path=$OCK_HOME/ucache/24.0.0/linux-aarch64/lib/common/openssl:$OCK_HOME/ucache/24.0.0/linux-aarch64/lib/common:$OCK_HOME/ucache/24.0.0/linux-aarch64/lib/datakit:$OCK_HOME/ucache/24.0.0/linux-aarch64/lib/mf -Dlog4j.configuration=/usr/local/spark/conf/log4j.properties -XX:+UseParallelGC |
JVM options passed to the driver. Change $OCK_HOME to the actual OmniShuffle installation path. |
spark.executor.extraJavaOptions |
-Djava.library.path=$OCK_HOME/ucache/24.0.0/linux-aarch64/lib/common/openssl:$OCK_HOME/ucache/24.0.0/linux-aarch64/lib/common:$OCK_HOME/ucache/24.0.0/linux-aarch64/lib/datakit:$OCK_HOME/ucache/24.0.0/linux-aarch64/lib/mf -Xms8g -XX:+UseParallelGC -XX:ParallelGCThreads=6 -XX:ErrorFile=/tmp/hs_err_pid%p.log |
JVM options passed to the executor. Change $OCK_HOME to the actual OmniShuffle installation path. |
spark.driver.extraLibraryPath |
$OCK_HOME/ucache/24.0.0/linux-aarch64/lib/common/openssl:$OCK_HOME/ucache/24.0.0/linux-aarch64/lib/common:$OCK_HOME/ucache/24.0.0/linux-aarch64/lib/datakit:$OCK_HOME/ucache/24.0.0/linux-aarch64/lib/mf:. |
Path of the library used when the JVM of the driver is started. Change $OCK_HOME to the actual OmniShuffle installation path. |
spark.executor.extraLibraryPath |
$OCK_HOME/ucache/24.0.0/linux-aarch64/lib/common/openssl:$OCK_HOME/ucache/24.0.0/linux-aarch64/lib/common:$OCK_HOME/ucache/24.0.0/linux-aarch64/lib/datakit:$OCK_HOME/ucache/24.0.0/linux-aarch64/lib/mf:. |
Path of the library used when the JVM of the executor is started. Change $OCK_HOME to the actual OmniShuffle installation path. |
spark.shuffle.manager |
|
Class path of OCK Shuffle Manager. |
spark.blacklist.enabled |
|
This parameter is provided by Spark. Set this parameter to true at the job level to enable the blocklist mechanism for fault recovery. |
spark.blacklist.application.fetchFailure.enabled |
|
This parameter is provided by Spark. Set this parameter to true at the job level so that Spark will blocklist the executor immediately when a fetch failure occurs. |
spark.files.fetchFailure.unRegisterOutputOnHost |
|
This parameter is provided by Spark. Set this parameter to true at the job level so that Spark unregisters outputs of existing map tasks when a fetch failure occurs. |
spark.yarn.blacklist.executor.launch.blacklisting.enabled |
|
This parameter is provided by Spark for YARN. Set this parameter to true at the job level to enable blocklisting of nodes that have YARN resource allocation problems. |
spark.shuffle.service.enabled |
|
This parameter is provided by Spark. Set this parameter to false at the job level to disable the Spark external shuffle service. |
spark.shuffle.isMapSideCombineExt |
|
Indicates whether to use the aggregator to aggregate data. |
spark.shuffle.ock.home |
|
Location of the home folder for OmniShuffle. |
spark.shuffle.ock.version |
|
OmniShuffle version. |
spark.shuffle.ock.binaryType |
|
OmniShuffle software architecture type. |
spark.shuffle.ock.deploy.isStandalone |
|
Indicates whether the Spark cluster uses the standalone architecture. |
spark.shuffle.ock.mapTaskOutput.minCapacityTotal |
41943040 |
Minimum size of the mapTask output buffer, in bytes. |
spark.shuffle.ock.mapTaskOutput.maxCapacityTotal |
134217728 |
Maximum size of the mapTask output buffer, in bytes. |
spark.shuffle.ock.mfLocalMemCap |
1073741824 |
Startup memory size of the memory fabric (MF) client on the SDK side, in bytes. |
spark.shuffle.ock.isIsolated |
|
Indicates whether to enable the app resource isolation function of OmniShuffle. This parameter must be used together with the OmniShuffle server parameters. |
spark.shuffle.ock.scheduler.excludeUnavailableNodes |
|
Indicates whether to enable blocklisting of invalid nodes for Shuffle Manager. |
spark.shuffle.ock.removeShuffleDataAfterJobFinished |
|
(Tuning item) Indicates whether to release the shuffle file after a job is complete. In most scenarios, set this parameter to false. Set this parameter to true only when you confirm that shuffle data is not reused across jobs. |
spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version |
2 |
This is a native Hadoop configuration that is used to optimize performance and reduce the time required for shuffle file output. |
spark.shuffle.ock.aggregateFlags |
|
Indicates whether to perform aggregation. |
spark.broadcast.ock.manager |
|
Indicates whether to enable OmniShuffle broadcast. |
spark.broadcast.ock.robustness |
|
Indicates whether to enable OmniShuffle broadcast reliability. |
spark.broadcast.ock.ockThresholdInMb |
Default value: 100 |
Size threshold for selecting the broadcast variable type, in MB. When a broadcast variable is larger than this threshold and spark.broadcast.ock.manager is set to true, OmniShuffle broadcast variables are used. Otherwise, native Spark broadcast variables are used. |
spark.sql.adaptive.enabled |
|
Indicates whether to enable the native AQE function of Spark. Currently, OmniShuffle BoostTuning works only on Spark SQL jobs for which AQE is enabled. Set this parameter to true. |
spark.ock.decimal.optimize |
|
(Tuning item) Optimization on the calculation of Decimal data. Keep the default for most scenarios. This tuning item applies only to Spark 3.1.1. If you want to enable this function, perform the following operations:
|
spark.shuffle.ock.rss.stopRepStageComplete |
|
Indicates whether to stop the backup of a stage after the stage ends. |
spark.shuffle.ock.rss.write.sendBuffer |
24 |
Number of SendBuffers cached locally. |
spark.shuffle.ock.rss.lb.strategy |
|
Load balancing policy. |
spark.shuffle.ock.rss.lb.initRSSNum |
|
This parameter is valid only when the load balancing policy is BalancedByExtendStrategy. |
spark.shuffle.ock.rss.enableReplication |
|
Indicates whether to enable the replica mode. This mode cannot be enabled together with the performance mode. |
spark.shuffle.ock.rss.syncRep |
|
Indicates whether replicas are written synchronously or asynchronously. This parameter takes effect only when the replica mode (spark.shuffle.ock.rss.enableReplication) is enabled. |
spark.shuffle.ock.isPrefetchMode |
|
Indicates whether to enable the performance mode. This mode cannot be enabled together with the replica mode. |
spark.shuffle.ock.mode |
|
OmniShuffle deployment mode. |
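The mode constraints stated in the table above (the replica mode and the performance mode are mutually exclusive, and syncRep depends on the replica mode) can be checked before a job is submitted. The following is a minimal sketch, not part of OmniShuffle: the function name check_shuffle_modes is hypothetical, the configuration is assumed to be available as a Python dict of strings, and a missing key is assumed to mean false because the table does not state the defaults.

```python
def check_shuffle_modes(conf):
    """Return a list of problems found in a dict of spark.conf settings."""

    def enabled(key):
        # Assumption: a missing key counts as "false"; the table does not
        # state the default values of these parameters.
        return str(conf.get(key, "false")).strip().lower() == "true"

    problems = []
    replica = enabled("spark.shuffle.ock.rss.enableReplication")
    if replica and enabled("spark.shuffle.ock.isPrefetchMode"):
        problems.append("replica mode and performance mode are both enabled")
    if enabled("spark.shuffle.ock.rss.syncRep") and not replica:
        problems.append("syncRep is set but the replica mode is not enabled")
    return problems


# Example: both modes enabled at once is reported as a problem.
print(check_shuffle_modes({
    "spark.shuffle.ock.rss.enableReplication": "true",
    "spark.shuffle.ock.isPrefetchMode": "true",
}))
```

An empty list means that none of the two rules is violated. It does not mean that the configuration is otherwise valid.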
mf.conf
Parameter |
Reference Value |
Description |
|---|---|---|
ock.mf.ip_mask |
172.17.0.0–172.17.0.125 |
Service IP address range of the MF nodes in the cluster. Do not include the management node IP address. |
ock.mf.port |
9999 |
|
ock.mf.protocol |
rc |
|
ock.mf.mem_size |
53687091200 |
|
ock.mf.mempoolsize |
268435456 |
Size of the local memory pool on the client, in bytes. |
ock.mf.rpc.thread.num |
128 |
Number of threads in the thread pool for processing cross-node messages between MFs. The value ranges from 1 to 128. |
ock.mf.water_mark_timer |
50 |
Interval for scanning the memory watermark in the convergent scenario, in milliseconds. Retain the default value. |
ock.mf.rpc.timeout |
600000 |
|
ock.mf.rpc.rndv_rtr_timeout |
30000 |
Timeout interval between the RTS state and the RTR state of the sender during UCX RNDV communication, in milliseconds. |
ock.ucache.rpc.enableAuthentication |
true |
Indicates whether to enable the security feature. This security feature is enabled by default. Disabling it may cause security risks. If ock.ucache.rpc.enableAuthentication, ock.ucache.rpc.enableTLS, and ock.ucache.rpc.enableAuthorization are all set to false, you do not need to set the security parameters that follow. |
ock.ucache.rpc.enableTLS |
true |
|
ock.ucache.rpc.enableAuthorization |
true |
|
ock.ucache.rpc.tls.ca.cert.path |
$OCK_HOME/security/tls/server/ca.cert.pem |
Path of the ca.cert.pem file (used by OmniShuffle) that is generated on the nodes listed in agent_node_list. Change $OCK_HOME to the actual OmniShuffle installation path. |
ock.ucache.rpc.tls.cert.path |
$OCK_HOME/security/tls/server/server.cert.pem |
Path of the server.cert.pem file (used by OmniShuffle) that is generated on the nodes listed in agent_node_list. Change $OCK_HOME to the actual OmniShuffle installation path. |
ock.ucache.rpc.tls.key.path |
$OCK_HOME/security/tls/server/server.private.key.pem |
Path of the server.private.key.pem file (used by OmniShuffle) that is generated on the nodes listed in agent_node_list. Change $OCK_HOME to the actual OmniShuffle installation path. |
ock.ucache.rpc.tls.key.pass.path |
$OCK_HOME/security/tls/server/server.keypass.key |
Path of the server.keypass.key file (used by OmniShuffle) that is generated on the nodes listed in agent_node_list. Change $OCK_HOME to the actual OmniShuffle installation path. |
ock.ucache.rpc.tls.crl.path |
- |
OmniShuffle user certificate revocation list (CRL). If there is no user CRL path, delete this parameter. |
ock.ucache.rpc.auth.type |
kerberos |
Identity authentication protocol. Currently, the Kerberos protocol is used. |
ock.ucache.rpc.auth.kerb.client.keytab |
/home/Sparkadmin/huawei/ock/security/kdc/krb5-client_en.keytab |
|
ock.ucache.rpc.auth.kerb.server.keytab |
$OCK_HOME/security/kdc/krb5-server_en.keytab |
|
ock.ucache.rpc.auth.domain |
EXAMPLE.COM |
Change the value to the domain name specified by the KDC server. |
ock.ucache.rpc.auth.server.principle.name |
ock_server |
Principal name of the OmniShuffle server. Currently, this parameter is set to ock_server. |
ock.ucache.rpc.auth.client.principle.name |
ock_client |
Principal name of the OmniShuffle client. Currently, this parameter is set to ock_client. |
ock.ucache.rpc.author.type |
whitelist |
The default value whitelist is used. |
ock.ucache.rpc.author.file.path |
$OCK_HOME/security/authorization/whitelist |
|
ock.ucache.kmc.ksf.primary.path |
$OCK_HOME/security/pmt/master/ksfa |
Path of the kmc.primary.ks file generated by using kmc_tool (for the OmniShuffle user). Change $OCK_HOME to the actual OmniShuffle installation path. |
ock.ucache.kmc.ksf.standby.path |
$OCK_HOME/security/pmt/standby/ksfb |
Path of the kmc.standby.ks file generated by using kmc_tool (for the OCK user). Change $OCK_HOME to the actual OmniShuffle installation path. |
ock.ucache.kmc.ksf.backup.path |
$OCK_HOME/security/pmt/kmcbakup |
Path of backups of the kmc.primary.ks and kmc.standby.ks files (for the OCK user). Change $OCK_HOME to the actual OmniShuffle installation path. You can back up the files to a customized path. |
ock.ucache.sdk.kmc.ksf.primary.path |
/home/Sparkadmin/huawei/ock/security/pmt/master/ksfa |
Path of the kmc.primary.ks file generated by using kmc_tool (for the user who submits Spark tasks). Change /home/Sparkadmin to the actual installation path. |
ock.ucache.sdk.kmc.ksf.standby.path |
/home/Sparkadmin/huawei/ock/security/pmt/standby/ksfb |
Path of the kmc.standby.ks file generated by using kmc_tool (for the user who submits Spark tasks). Change /home/Sparkadmin to the actual installation path. |
ock.ucache.sdk.kmc.ksf.backup.path |
/home/Sparkadmin/huawei/ock/security/pmt/kmcbakup |
Path of backups of the kmc.primary.ks and kmc.standby.ks files (for the user who submits Spark tasks). Change /home/Sparkadmin to the actual installation path. You can back up the files to a customized path. |
ock.ucache.rpc.tls.sdk.cert.path |
${SPARKADMIN_HOME}/security/certs/server.cert.pem |
Path of the server.cert.pem file (for the user who submits Spark tasks) that is generated on the nodes listed in agent_node_list. The $SPARKADMIN_HOME path indicates the path for storing the Spark task user's security files. Replace it with the actual one. |
ock.ucache.rpc.tls.sdk.key.path |
${SPARKADMIN_HOME}/security/certs/server.private.key.pem |
Path of the server.private.key.pem file (for the user who submits Spark tasks) that is generated on the nodes listed in agent_node_list. The $SPARKADMIN_HOME path indicates the path for storing the Spark task user's security files. Replace it with the actual one. |
ock.ucache.rpc.tls.sdk.key.pass.path |
${SPARKADMIN_HOME}/security/certs/server.keypass.key |
Path of the server.keypass.key file (for the user who submits Spark tasks) that is generated on the nodes listed in agent_node_list. The $SPARKADMIN_HOME path indicates the path for storing the Spark task user's security files. Replace it with the actual one. |
ock.mf.capacity.report.period |
Value range: [100, 180000] |
Interval for the MF to update the latest capacity information recorded in ZooKeeper. Unit: ms |
ock.ucache.server.isIsolated |
true |
Indicates whether to enable the multi-tenant check. This feature is enabled by default. Retain the default value. |
ock.ucache.worker.thread.groups |
1,1 |
If this parameter is set to 1,1, multiple links can be established between MF servers in TCP scenarios to improve performance. This function is disabled by default. |
ock.ucache.sdk.thread.groups |
1 |
If this parameter is set to a value greater than or equal to 1, multiple links can be established between OCK clients and MF servers to improve performance. This function is disabled by default. |
ock.ucache.rpc.client.auth.timeout |
[15000, 180000] |
RPC link setup timeout duration, in milliseconds. |
ock.ucache.rpc.tls.sdk.crl.path |
- |
CRL used by the user who submits Spark tasks. If there is no user CRL path, delete this parameter. |
ock.ucache.rpc.tls.sdk.ca.cert.path |
/home/Sparkadmin/huawei/ock/security/tls/ca.cert.pem |
Path of the ca.cert.pem file (for the user who submits Spark tasks) that is generated on the nodes listed in agent_node_list. Change /home/Sparkadmin to the actual installation path. |
ock.hswap.path |
${OCK_HOME}/hswappath |
Swap path. |
ock.hswap.queue.cap.per.path |
65535 |
Capacity of the swap queue in each path. |
ock.hswap.task.pool.size |
65535 |
Thread pool size. |
ock.hswap.max.aio.count.per.thread |
65535 |
Maximum number of AIO events that can be concurrently processed by each thread. |
ock.hswap.media.type |
0 |
Drive type. Only one drive type is supported, that is, 0 (meaning NVMe). |
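Several mf.conf parameters above state a value range. The following minimal sketch collects those ranges and reports settings that fall outside them. The names MF_RANGES and out_of_range are hypothetical, the configuration is assumed to be a dict of strings, and the bracketed value of ock.ucache.rpc.client.auth.timeout is assumed to be its value range.

```python
# Value ranges taken from the mf.conf table (inclusive bounds).
MF_RANGES = {
    "ock.mf.rpc.thread.num": (1, 128),
    "ock.mf.capacity.report.period": (100, 180000),        # ms
    # Assumption: the bracketed entry in the table is the allowed range.
    "ock.ucache.rpc.client.auth.timeout": (15000, 180000),  # ms
    "ock.hswap.media.type": (0, 0),                         # 0 = NVMe only
}


def out_of_range(conf):
    """Map each out-of-range key in conf to its allowed (low, high) range."""
    bad = {}
    for key, (low, high) in MF_RANGES.items():
        if key in conf and not low <= int(conf[key]) <= high:
            bad[key] = (low, high)
    return bad


print(out_of_range({"ock.mf.rpc.thread.num": "256"}))
```

Keys that are absent from the dict are skipped, so the check can run against a partial configuration.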
ock.conf
You can increase the values of the following parameters containing "timeout", except those whose unit is explicitly ms, if the network condition is poor. Port numbers range from 3000 to 65535.
Parameter |
Reference Value |
Description |
|---|---|---|
ock.log.dir |
${OCK_HOME}/logs/ |
OmniShuffle run log directory. Change $OCK_HOME to the actual OmniShuffle installation path. |
ock.workers.dir |
${OCK_HOME}/conf/workers |
Host name directory of the running worker node. Generally, the directory is the same as that of the Hadoop worker node. Change $OCK_HOME to the actual OmniShuffle installation path. |
ock.log.level |
INFO |
Run log level. Retain the default value. |
ock.log.fileSize |
20 |
Size of a single run log file, in MB. The value ranges from 2 to 20. |
ock.log.rotation.file.num |
20 |
Maximum number of rotated run log files that are retained. If the number of run log files exceeds this value, the excess files are deleted. The value ranges from 1 to 20. |
ock.ucache.enabled |
true |
Indicates whether the Shuffle service is available. |
ock.ucache.replication.service.thread.num |
10 |
Number of threads for sending replica tasks. |
ock.ucache.replication.thread.num |
16 |
Number of threads for executing replica tasks. |
ock.ucache.rpc.shuffle_driver.worker.thread.cpuset |
- |
CPU cores bound to the driver communication threads. |
ock.ucache.rpc.shuffle_driver.worker.thread.group |
8 |
Number of RPC processing threads of the driver. |
ock.ucache.rpc.shuffle_server.worker.thread.cpuset |
- |
CPU cores bound to the RSS communication threads. |
ock.ucache.rpc.shuffle_server.timeout |
60000 |
RPC timeout duration of the shuffle server, in milliseconds. |
ock.ucache.rpc.shuffle_meta.timeout |
60000 |
RPC timeout duration of the metadata service, in milliseconds. |
ock.ucache.rpc.client.auth.timeout |
60000 |
Timeout duration of RPC connection setup of a node, in milliseconds. |
ock.ucache.rpc.local_blob.get.timeout |
60000 |
RPC timeout duration (in milliseconds) of the get LocalBlob operation, which has a higher priority than the default timeout interval. |
ock.ucache.rpc.local_blob.commit.timeout |
60000 |
RPC timeout duration (in milliseconds) of the commit LocalBlob operation, which has a higher priority than the default timeout interval. |
ock.ucache.rpc.transport.tcp.port.range |
60000~61000 |
Range of extra ports that need to be occupied by the TCP network protocol. |
ock.ucache.rpc.transport.protocol |
rc |
If IB NICs (RDMA) are available, use rc. Otherwise, change the value to tcp. |
ock.ucache.rpc.transport.devices |
None |
Name of the NIC to be used. If multiple NICs are running in the environment, you need to specify the NIC to be used. Otherwise, the communication between nodes may fail. |
ock.ucache.shuffle.profile.level |
0 |
Performance statistics collection level. |
ock.ucx.tcp.keepintvl |
120s |
Duration of a TCP connection, in seconds. You can increase the parameter value when the network condition is extremely poor. |
ock.ucache.rpc.shuffle_server.port |
3891 |
Service port of the shuffle server. You can specify a port within the port configuration range. |
ock.ucache.server.max_local_blob_capacity |
25769803776 |
Maximum LocalBlob capacity, in bytes. Set this parameter to a value greater than or equal to 24 GB and less than half of the MF memory capacity. The reference value is 24 GB. |
ock.ucache.server.data.isolation |
true |
Indicates whether to enable the app resource isolation function of OmniShuffle. This parameter must be used together with the client parameters. |
ock.zookeeper.server.url |
127.0.0.1:2181 |
IP address and port number of the ZooKeeper server. |
ock.zookeeper.session.timeout |
30000 |
Timeout duration of connecting to the ZooKeeper session. |
ock.zookeeper.connect.timeout |
30 |
Timeout duration of ZooKeeper connection attempts, in seconds. If there are a large number of nodes and the connection delay is long, you can increase the value of this parameter. |
ock.ucache.server.swap.threshold.higher_watermark |
60 |
Memory watermark for swapping read-only ShuffleBlobs. The value ranges from 0 to 100. You are not advised to change the value. |
ock.ucache.server.swap.threshold.lower_watermark |
20 |
Memory watermark for swapping ShuffleBlobs with only external storage to the memory pool. The value ranges from 0 to 100. You are not advised to change the value. |
ock.ucache.server.swap.threshold.free_water_mater |
75 |
When the MF memory usage exceeds the preset value, the system prepares to release the swapped memory occupied by ShuffleBlob. The value ranges from 0 to 100. You are not advised to change the value. |
ock.ucache.server.shuffle.max.receive.buffer.pool.size |
0 |
Maximum number of receive buffers. The maximum value is 20000. If this parameter is set to 0, the value is automatically calculated based on the remaining MF memory capacity. |
ock.ucache.server.swap.path |
- |
File directory to be swapped to the external storage. Use commas (,) to separate multiple directories. This field is mandatory. If not specified, the task cannot be started. You are advised to set the permission to 750. |
ock.ucache.rpc.enableAuthentication |
true |
Indicates whether to enable the security feature. |
ock.ucache.rpc.enableTLS |
true |
Indicates whether to enable transmission encryption. |
ock.ucache.rpc.enableAuthorization |
true |
Indicates whether to enable login authentication. |
ock.ucache.rpc.tls.ca.cert.path |
${OCK_HOME}/security/tls/server/ca.cert.pem |
Path of the ca.cert.pem file (for the OCK user) that is generated on the nodes listed in agent_node_list during certificate distribution. Change $OCK_HOME to the actual OmniShuffle installation path. |
ock.ucache.rpc.tls.cert.path |
${OCK_HOME}/security/tls/server/server.cert.pem |
Path of the server.cert.pem file (for the OCK user) that is generated on the nodes listed in agent_node_list during certificate distribution. Change $OCK_HOME to the actual OmniShuffle installation path. |
ock.ucache.rpc.tls.key.path |
${OCK_HOME}/security/tls/server/server.private.key.pem |
Path of the server.private.key.pem file (for the OCK user) that is generated on the nodes listed in agent_node_list during certificate distribution. Change $OCK_HOME to the actual OmniShuffle installation path. |
ock.ucache.rpc.tls.key.pass.path |
${OCK_HOME}/security/tls/server/server.keypass.key |
Path of the server.keypass.key file (for the OCK user) that is generated on the nodes listed in agent_node_list during certificate distribution. Change $OCK_HOME to the actual OmniShuffle installation path. |
ock.ucache.rpc.tls.crl.path |
- |
CRL used by the OCK user. If there is no user CRL path, this parameter can be left blank. |
ock.ucache.rpc.tls.driver.key.path |
${SPARKADMIN_HOME}/security/certs/server.private.key.pem |
Path of the server.private.key.pem file (for the user who submits Spark tasks) that is generated on the nodes listed in agent_node_list. The $SPARKADMIN_HOME path indicates the path for storing the Spark task user's security files. Replace it with the actual one. |
ock.ucache.rpc.tls.driver.cert.path |
${SPARKADMIN_HOME}/security/certs/server.cert.pem |
Path of the server.cert.pem file (for the user who submits Spark tasks) that is generated on the nodes listed in agent_node_list. The $SPARKADMIN_HOME path indicates the path for storing the Spark task user's security files. Replace it with the actual one. |
ock.ucache.rpc.tls.driver.key.pass.path |
${SPARKADMIN_HOME}/security/certs/server.keypass.key |
Path of the server.keypass.key file (for the user who submits Spark tasks) that is generated on the nodes listed in agent_node_list. The $SPARKADMIN_HOME path indicates the path for storing the Spark task user's security files. Replace it with the actual one. |
ock.ucache.rpc.auth.type |
kerberos |
Identity authentication protocol. Currently, the Kerberos protocol is used. |
ock.ucache.rpc.auth.kerb.client.keytab |
${SPARKADMIN_HOME}/security/kdc/krb5-client_en.keytab |
Path of the krb5-client.keytab file (for the user who submits Spark tasks) distributed by the KDC server to each node. The $SPARKADMIN_HOME path indicates the path for storing the Spark task user's security files. Replace it with the actual one. |
ock.ucache.rpc.auth.kerb.server.keytab |
${OCK_HOME}/security/kdc/krb5-server_en.keytab |
|
ock.ucache.rpc.auth.driver.kerb.server.keytab |
${SPARKADMIN_HOME}/security/kdc/krb5-server_en.keytab |
Path of the krb5-server.keytab file (for the user who submits Spark tasks) distributed by the KDC server to each node. The $SPARKADMIN_HOME path indicates the path for storing the Spark task user's security files. Replace it with the actual one. |
ock.ucache.rpc.auth.domain |
EXAMPLE.COM |
Domain name specified by the KDC server. |
ock.ucache.rpc.auth.server.principle.name |
ock_server |
Principal name of the OmniShuffle server. Currently, this parameter is set to ock_server. |
ock.ucache.rpc.auth.client.principle.name |
ock_client |
Principal name of the OmniShuffle client. Currently, this parameter is set to ock_client. |
ock.ucache.rpc.auth.meta.principle.mapping |
127.0.0.1:hostname |
Mapping between IP addresses and host names. The IP addresses are the same as those in ock.ucache.meta.node_lists. Use commas (,) to separate multiple entries, for example, 127.0.0.1:hostname1,127.0.0.2:hostname2. |
ock.ucache.rpc.auth.driver.principle.mapping |
127.0.0.1:hostname |
IP address and host name of the node where the driver is located. Generally this node is the management node. |
ock.ucache.rpc.author.type |
whitelist |
The default value whitelist is used. |
ock.ucache.rpc.author.file.path |
${OCK_HOME}/security/authorization/whitelist |
Path of whitelist generated during KDC configuration. Change $OCK_HOME to the actual OmniShuffle installation path. |
ock.ucache.rpc.author.driver.file.path |
${SPARKADMIN_HOME}/security/authorization/whitelist |
Path of whitelist generated during KDC configuration. The $SPARKADMIN_HOME path indicates the path for storing the Spark task user's security files. Replace it with the actual one. |
ock.daemon.expireChecker.period |
86400 |
Security certificate check interval, in seconds. |
ock.ucache.kmc.ksf.primary.path |
${OCK_HOME}/security/pmt/master/ksfa |
Path of the kmc.primary.ks file generated by using kmc_tool (for the OCK user). Change ${OCK_HOME} to the actual OmniShuffle installation path. |
ock.ucache.kmc.ksf.standby.path |
${OCK_HOME}/security/pmt/standby/ksfb |
Path of the kmc.standby.ks file generated by using kmc_tool (for the OCK user). Change $OCK_HOME to the actual OmniShuffle installation path. |
ock.ucache.kmc.ksf.backup.path |
${OCK_HOME}/security/pmt/kmcbakup |
Path of backups of the kmc.primary.ks and kmc.standby.ks files (for the OCK user). Change ${OCK_HOME} to the actual OmniShuffle installation path. You can back up the files to a customized path. |
ock.zookeeper.security.principle.name |
zookeeper |
Principal name of the Kerberos authentication server, indicating the first part of the principal. |
ock.zookeeper.security.principle.hostname |
server |
Principal name of the ZooKeeper server for Kerberos authentication, indicating the second part of the principal. |
ock.zookeeper.security.strategy |
GSSAPI |
Kerberos authentication mechanism supported by SASL. Retain the default value GSSAPI. |
ock.zookeeper.security.enable |
true |
Indicates whether to enable ZooKeeper encryption. |
ock.zookeeper.security.certs |
/home/ockadmin/opt/ock/security/tls/server.crt.pem,/home/ockadmin/opt/ock/security/tls/client.crt.pem,*** |
When TLS+Kerberos is enabled, set this parameter to the certificates required by TLS (for the OCK user), including server.crt.pem, client.crt.pem, client.pem, and the PEM certificate password encrypted using KMC. When only TLS is enabled, set this parameter to false. |
ock.zookeeper.security.client.principle |
zkcli/server@EXAMPLE.COM |
Principal for Kerberos authentication on the ZooKeeper client (for the OCK user). server indicates the node host name and EXAMPLE.COM indicates the KDC domain name. |
ock.zookeeper.security.client.keytab |
${OCK_HOME}/security/kdc/krb5-server_en.keytab |
Path of the keytab file for Kerberos authentication on the ZooKeeper client (for the OCK user). Change ${OCK_HOME} to the actual OmniShuffle installation path. |
ock.ucache.broadcast.variable.create.timeout |
600000 |
Timeout duration of creating a broadcast variable, in milliseconds. The value -1 indicates that there is no timeout limit. |
ock.ucache.broadcast.variable.fetch.timeout |
600000 |
Timeout duration of fetching a broadcast variable, in milliseconds. The value -1 indicates that there is no timeout limit. |
ock.ucache.broadcast.bt.percent |
10 |
Percentage of the number of BT servers to the number of nodes in the cluster during the process of fetching broadcast variables. The value ranges from 1 to 100. |
ock.ucache.rpc.transport.ipfilter |
- |
Select a communication device name based on the network segment to which the node belongs, for example, 192.168.100.194/24<,192.168.200.194/24>. Separate multiple network segments with commas (,). You can run the ip a command to view the network segment information. It is recommended that the nodes be configured in a unified manner. |
ock.ucache.rpc.transport.devices.path |
/sys/class/infiniband/ |
Directory for storing RC NIC information. Generally, the default value is used. |
ock.ucache.rpc.openssl.path |
${OCK_HOME}/ucache/24.0.0/linux-aarch64/lib/common/openssl/libssl.so |
Path for loading the OpenSSL SO file on which OmniShuffle depends. Change $OCK_HOME to the actual OmniShuffle installation path. |
ock.ucache.rpc.crypto.path |
${OCK_HOME}/ucache/24.0.0/linux-aarch64/lib/common/openssl/libcrypto.so |
Path for loading the crypto SO file on which OmniShuffle depends. Change $OCK_HOME to the actual OmniShuffle installation path. |
ock.ucache.rpc.tls.sdk.ca.cert.path |
/home/Sparkadmin/huawei/ock/security/tls/ca.cert.pem |
Path of the ca.cert.pem file (for the user who submits Spark tasks) that is generated on the nodes listed in agent_node_list during certificate distribution. Change /home/Sparkadmin to the actual installation path. |
ock.ucache.rpc.tls.sdk.crl.path |
- |
CRL used by the user who submits Spark tasks. If there is no user CRL path, this parameter can be left blank. |
ock.ucache.rss.bm.throttling.percent |
98 |
Upper memory usage percentage for triggering traffic limiting. |
ock.ucache.rss.queue.size |
100000 |
Length of the RSS processing queue. |
ock.ucache.rss.queue.throttling.size |
3000 |
Upper limit of the aggregated queue length for triggering traffic limiting. |
ock.ucache.sdk.kmc.ksf.primary.path |
/home/Sparkadmin/huawei/ock/security/pmt/master/ksfa |
Path of the kmc.primary.ks file generated by using kmc_tool (for the user who submits Spark tasks). Change /home/Sparkadmin to the actual installation path. |
ock.ucache.sdk.kmc.ksf.standby.path |
/home/Sparkadmin/huawei/ock/security/pmt/standby/ksfb |
Path of the kmc.standby.ks file generated by using kmc_tool (for the user who submits Spark tasks). Change /home/Sparkadmin to the actual installation path. |
ock.ucache.sdk.kmc.ksf.backup.path |
/home/Sparkadmin/huawei/ock/security/pmt/kmcbakup |
Path of backups of the kmc.primary.ks and kmc.standby.ks files (for the user who submits Spark tasks). Change /home/Sparkadmin to the actual installation path. You can back up the files to a customized path. |
ock.zookeeper.sdk.security.certs |
/home/Sparkadmin/huawei/ock/security/tls/server.crt.pem,/home/Sparkadmin/huawei/ock/security/tls/client.crt.pem,/home/Sparkadmin/huawei/ock/security/tls/client.pem,*** |
When TLS+Kerberos is enabled, set this parameter to the certificates required by TLS (for the user who submits Spark tasks), including server.crt.pem, client.crt.pem, client.pem, and the PEM certificate password encrypted using KMC. When only TLS is enabled, set this parameter to false. |
ock.zookeeper.sdk.security.client.principle |
zkcli/server@EXAMPLE.COM |
Principal for Kerberos authentication on the ZooKeeper client (for the user who submits Spark tasks). server indicates the node host name and EXAMPLE.COM indicates the KDC domain name. |
ock.zookeeper.sdk.security.client.keytab |
/home/Sparkadmin/huawei/ock/security/kdc/krb5-client_en.keytab |
Path of the keytab file for Kerberos authentication on the ZooKeeper client (for the user who submits Spark tasks). |
ock.daemon.expireChecker.lead |
- |
Number of days before certificate expiration at which the expiration notification is triggered. If this parameter is not set, the notification is triggered 7 days before the certificate expires. The value ranges from 7 to 180. |
ock.ucache.server.aggregator.core.thread.num |
4 |
Number of aggregation core threads. The value ranges from 1 to the maximum number of cores on the device. |
ock.ucache.rpc.shuffle_server.worker.thread.group |
3,1 |
You are advised to set the two comma-separated values to the number of compute nodes and the number of RSS nodes, respectively. |
ock.ucache.master.ip |
IP_ADDRESS |
IP address of the driver node. |
ock.ucache.rpc.check_task_finish.timeout |
120000 |
Timeout interval for completing an aggregation task, in milliseconds. Default value: 120000. |
ock.ucache.rpc.conn.wait.timeout |
400 |
Timeout interval for connecting to an RSS node, in milliseconds. Default value: 400. |
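The sizing rule for ock.ucache.server.max_local_blob_capacity (at least 24 GB and less than half of the MF memory capacity) can be verified with simple arithmetic. The following is a minimal sketch under two assumptions: both values are byte counts (the reference value 25769803776 equals 24 × 1024³), and the MF memory capacity is the value of ock.mf.mem_size in mf.conf. The function name is hypothetical.

```python
GIB = 1024 ** 3  # the table's "GB" is read as 1024**3 bytes


def local_blob_capacity_ok(max_local_blob_capacity, mf_mem_size):
    """Check the rule: capacity >= 24 GB and capacity < half of MF memory."""
    at_least_24_gb = max_local_blob_capacity >= 24 * GIB
    below_half_mf = 2 * max_local_blob_capacity < mf_mem_size
    return at_least_24_gb and below_half_mf


# Reference values from ock.conf and mf.conf: 24 GB against 50 GB of MF memory.
print(local_blob_capacity_ok(25769803776, 53687091200))
```

With the reference values, half of the MF memory is 25 GB, so the 24 GB capacity satisfies both bounds. With a smaller MF memory, for example 40 GB, no value can satisfy both bounds at once.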
ock-start-ockd-by-yarn.sh
Parameter |
Reference Value |
Description |
|---|---|---|
retry_times |
5 |
Number of times that Yarn attempts to start the OCKD process. |
interval_time |
150 |
Interval at which Yarn attempts to start the OCKD process, in seconds. |
forever_interval_time |
600 |
Interval at which Yarn attempts to start the OCKD process after retry_times start failures of the OCKD process, in seconds. |
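One way to read the three parameters above is as a two-phase retry schedule: the first retry_times attempts are spaced by interval_time, and later attempts are spaced by forever_interval_time. The following sketch only illustrates that reading. The script itself is not shown in this document, so the exact behavior is an assumption, and the function name start_delays is hypothetical.

```python
from itertools import islice


def start_delays(retry_times=5, interval_time=150, forever_interval_time=600):
    """Yield the wait time in seconds before each further start attempt."""
    # Phase 1: a fixed number of retries at the short interval.
    for _ in range(retry_times):
        yield interval_time
    # Phase 2 (assumed): keep retrying indefinitely at the long interval.
    while True:
        yield forever_interval_time


# First seven waits with the reference values.
print(list(islice(start_delays(), 7)))
```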
agent_node_list
The file content format is as follows:
IP_address O&M account
If there are multiple nodes, enter one IP address and one O&M account in each line. Note that all nodes must be covered.
1.1.1.1 O&M user
1.1.1.3 O&M user
1.1.1.5 O&M user
1.1.1.7 O&M user
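The file format above (one IP address and one O&M account per line) can be checked before the file is used. The following is a minimal sketch, assuming that the two fields are separated by whitespace and that blank lines can be ignored. Neither assumption is confirmed by this document, and the function name parse_node_list is hypothetical.

```python
import ipaddress


def parse_node_list(text):
    """Parse 'IP_address account' lines into a list of (ip, account) tuples."""
    nodes = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        parts = line.split(None, 1)  # split on the first run of whitespace
        if len(parts) != 2:
            raise ValueError(f"line {lineno}: expected 'IP_address account'")
        ip, account = parts
        ipaddress.ip_address(ip)  # raises ValueError for a malformed address
        nodes.append((ip, account))
    return nodes


print(parse_node_list("1.1.1.1 BigDataAdmin\n1.1.1.3 BigDataAdmin\n"))
```

The same function applies to CA_node_list, which uses the same line format.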
CA_node_list
The file content format is as follows:
IP_address O&M account
If there are multiple nodes, enter one IP address and one O&M account in each line. Only information about the management node is required.
1.1.1.9 BigDataAdmin
ock-launch-cluster.sh
Parameter |
Reference Value |
Description |
|---|---|---|
ock_vcore |
15 |
Number of CPUs occupied by OmniShuffle. |
ock_memory |
61440 |
Memory size occupied by OmniShuffle, in MB. Use the larger of the following two values: 110% of the MF memory, or the MF memory plus 10 GB. The value includes the memory for running OCK. |
master_vcore |
5 |
Number of CPUs occupied by the launch server. |
master_memory |
10240 |
Memory size occupied by the launch server, in MB. |
queue |
- |
Yarn queue where OmniShuffle resides. |
ock_master_partition_label |
RSS |
Label of the Yarn partition where the launch server is located. |
need_kerberos |
- |
Indicates whether Kerberos authentication is required before a job is submitted. |
kerberos_conf |
- |
Path of the krb5.conf configuration file for Kerberos authentication. This parameter is valid only when need_kerberos is set to true. |
kerberos_user |
- |
User name for Kerberos authentication. This parameter is valid only when need_kerberos is set to true. |
kerberos_key_table |
- |
Path of the keytab file corresponding to the user name for Kerberos authentication. This parameter is valid only when need_kerberos is set to true. |
local_dir |
$(cd "$(dirname $0)"||exit 0; pwd) |
Directory where the script is located. |
ock_home |
$(cd "$(dirname $0)"/../../../..||exit 0; pwd) |
OmniShuffle deployment directory. |
ock_version_dir |
$(cd "$(dirname $0)"/../..||exit 0; pwd) |
OmniShuffle version directory. |
ock_version |
"${ock_version_dir##*/}" |
OmniShuffle version. |
ock_run_shell_path |
"${local_dir}/ock-start-ockd-by-yarn.sh" |
Path of the script for Yarn to start OmniShuffle. |
ock_nodes_list_path |
"${OCK_HOME}/conf/ock_node_list" |
Path of the OmniShuffle node list configuration file. |
client_jar_path |
"${OCK_HOME}/jars/ock-launch-cluster-${ock_version}.jar" |
Path of the JAR file used by Yarn to start OmniShuffle. |
log_path |
"${OCK_HOME}/logs/ock-launch-cluster.log" |
Path of the log file used by Yarn to start OmniShuffle. |
appid_path |
"${OCK_HOME}/work/yarn-appids/yarn-ock.appid" |
Path of the .appid file used by Yarn to start OmniShuffle. |
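The sizing rule for ock_memory (the larger of 110% of the MF memory and the MF memory plus 10 GB) can be worked through as follows. This is a minimal sketch under two assumptions: the MF memory is the value of ock.mf.mem_size in mf.conf, in bytes, and 1 GB equals 1024 MB. The function name is hypothetical.

```python
def recommended_ock_memory_mb(mf_mem_bytes):
    """Return max(110% of MF memory, MF memory + 10 GB), in MB."""
    mf_mb = mf_mem_bytes // (1024 * 1024)
    with_ten_percent = (mf_mb * 110 + 99) // 100  # 110%, rounded up
    with_ten_gb = mf_mb + 10 * 1024               # plus 10 GB
    return max(with_ten_percent, with_ten_gb)


# ock.mf.mem_size reference value: 53687091200 bytes (51200 MB).
print(recommended_ock_memory_mb(53687091200))
```

For the reference MF memory of 51200 MB, the two candidates are 56320 MB and 61440 MB, so the result matches the reference value 61440. The 110% term becomes the larger one only when the MF memory exceeds 100 GB.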
ock-stop-cluster.sh
Parameter |
Reference Value |
Description |
|---|---|---|
ock_home |
"$(cd "$(dirname $0)"/../../../..||exit ${EXT}; pwd)" |
OmniShuffle deployment directory. |
appid_path |
"${OCK_HOME}/work/yarn-appids/yarn-ock.appid" |
Path of the .appid file used by Yarn to stop OmniShuffle. |
log_path |
"${OCK_HOME}/logs/ock-stop-cluster.log" |
Path of the log file used by Yarn to stop OmniShuffle. Change $OCK_HOME to the actual OmniShuffle installation path. |
ock_id |
$(cat ${appid_path}|grep -Eo "application_[0-9]+_[0-9]+") |
Application ID of OCK in Yarn. |
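The ock_id value extracts the Yarn application ID from the .appid file with the extended regular expression application_[0-9]+_[0-9]+. The same pattern can be tried out in isolation, as in the following minimal sketch. The function name find_app_ids and the sample ID are hypothetical.

```python
import re

# Same pattern as the grep -Eo expression used for ock_id.
APP_ID = re.compile(r"application_[0-9]+_[0-9]+")


def find_app_ids(text):
    """Return every application ID found in text, in order of appearance."""
    return APP_ID.findall(text)


# Hypothetical file content; only the matching part is returned.
print(find_app_ids("Submitted application application_1700000000000_0042\n"))
```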