Troubleshooting
TPC-H SQL6 Error
Symptom:
TPC-H query SQL6 reports an error during execution.

Solution:
This error occurs because the default value of the OmniData configuration item max.task.queue.size is four times the number of CPUs on the storage node, which can be too small under heavy load. Increase the value to deepen the task queue.
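The file that holds this item depends on the OmniData deployment, so the fragment below is only a sketch: the properties-style syntax and the value 1024 are assumptions, not recommendations.

```properties
# Deepen the OmniData task queue (default: 4 x the CPU count of the
# storage node). 1024 is an illustrative value; size it to your workload.
max.task.queue.size=1024
```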
TPC-H SQL17 Error
Symptom:
TPC-H query SQL17 reports an error during execution.

Solution:
This error occurs because the default value of the OmniData configuration item task.timeout.period is 120,000 ms (2 minutes), which long-running tasks can exceed. Increase the value to extend the task processing timeout.
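As an illustrative properties-style fragment (the file location and the 300,000 ms value are assumptions, not recommendations):

```properties
# Extend the task processing timeout from the 120000 ms default to 5 min.
task.timeout.period=300000
```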
TPC-DS SQL3 Error
Symptom:
The execution of TPC-DS SQL3 causes the engine to dump core.

Solution:
This exception is a known issue in the Ceph community version and must be fixed by the community. As a workaround, add the rgw_nfs_lru_lane_hiwat configuration item to the [global] section of the Ceph configuration file /etc/ceph/ceph.conf and set it to 65535.
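After the change, the [global] section of /etc/ceph/ceph.conf would contain (other existing [global] entries stay as they are):

```ini
[global]
rgw_nfs_lru_lane_hiwat = 65535
```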
OmniData Limitations
- Operators on transactional tables cannot be pushed down.
- Operators on bucketed tables cannot be pushed down.
An Error Is Reported in the haf daemon Log
Symptom:
The following error information is recorded in the haf daemon log:
[ERROR] [ProcessID:1512097] [daemon_recv] TlsAccept:428] [LINK]target TlsAcceptDeal failed. channelID=0

Solution:
Generate the certificates on both the offload nodes and the host nodes.
An Error Is Reported During Spark Execution
Symptom:
An error is reported during Spark execution, but nothing is recorded in the haf daemon log. The command output is as follows:
Failed to create task.

Solution:
Add spark.executorEnv.HAF_CONFIG_PATH <path> to the Spark configuration file, where <path> is the installation path of HAF on the host node.
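For example, if HAF were installed under /opt/haf on the host node (an illustrative path, not a default), the entry in spark-defaults.conf would read:

```properties
# Point executors at the HAF installation directory on the host node.
spark.executorEnv.HAF_CONFIG_PATH /opt/haf
```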

Certificate Generation with haf-tool Occasionally Fails
Symptom:
When haf-tool is used to generate a certificate, an error occurs occasionally. The error information is as follows:
/home/omm/haf-install/haf-host/tools/scripts/csr_gen_host.sh generate host csr failed

Solution:
Generally, this error occurs because HAF host scripts were sourced in the current shell when OmniData was started; the OpenSSL version they introduce then conflicts with the local OpenSSL version. Start a new shell before rerunning the certificate generation command.
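A quick way to check whether the current shell is still carrying the sourced environment before retrying (a minimal sketch; matching on the substring "haf" is an assumption about the install path):

```shell
# If LD_LIBRARY_PATH points into a haf directory, the sourced host scripts
# are still active in this shell; start a fresh shell before regenerating
# certificates so the system OpenSSL is used.
case ":${LD_LIBRARY_PATH:-}:" in
  *haf*) echo "haf libraries in LD_LIBRARY_PATH: start a new shell" ;;
  *)     echo "LD_LIBRARY_PATH is clean" ;;
esac
```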

core File Cannot Be Found After an HAF Process Exception
Symptom:
An exception occurs while HAF is running; the process exits, but no core dump file can be found.

Solution:
The core file is generated by the operating system, and HAF cannot control where it is written. Check the kernel configuration to determine where core dump files are placed:
cat /proc/sys/kernel/core_pattern
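The destination can be decoded from the pattern's first character; a self-contained sketch (handler names such as systemd-coredump vary by distribution):

```shell
# Read the kernel's core dump destination. A leading '|' means cores are
# piped to a handler (e.g. systemd-coredump; inspect with coredumpctl),
# a leading '/' is an absolute path, and anything else is written relative
# to the crashed process's working directory.
pattern=$(cat /proc/sys/kernel/core_pattern)
echo "core_pattern: $pattern"
case "$pattern" in
  \|*) echo "cores are piped to a handler" ;;
  /*)  echo "cores are written to an absolute path" ;;
  *)   echo "cores are written relative to the process working directory" ;;
esac
```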

Failed to Install HAF in the x86 Environment
Symptom:
The error message "cannot execute binary file: Exec format error" is displayed when HAF is installed in the x86 environment.

Solution:
HAF cannot be installed in an x86 environment. Use a Kunpeng environment instead.
Insufficient crontab Permission During HAF Installation
Symptom:
An error message "You (haf) are not allowed to access to (crontab) because of pam configuration" is displayed during HAF installation on the offload node.

Solution:
- Locate the crontab file.
which crontab

- Add the s bit to crontab and change the owner of crontab to root.
chown root:root /usr/bin/crontab
chmod u+s /usr/bin/crontab
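To see the effect of u+s without touching the real binary, the sequence can be rehearsed on a scratch file (illustrative only; run the actual commands on /usr/bin/crontab as root):

```shell
# The setuid bit appears as 's' (or 'S' when the file is not executable)
# in the user-execute position of the mode string shown by ls -l.
f=$(mktemp)
chmod u+s "$f"
ls -l "$f"
rm -f "$f"
```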
System Variable Problem During HAF Installation
Symptom:
An error message "failed to generate csr" is displayed during HAF installation on offload nodes.

Possible Cause:
The system's built-in OpenSSL library is overridden by HAF's libssl.so.1.1, so OpenSSL reports an error at run time.
Solution:
Check whether the environment variables in /etc/profile add the .so files in the haf directory to the system library path. If so, delete that entry so that the system OpenSSL libraries are used.
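A quick check for such entries (a sketch; adjust the pattern to your actual HAF install directory):

```shell
# List any lines in /etc/profile that put HAF's bundled libraries on the
# loader path; such entries should be removed so the system libssl wins.
grep -nE 'LD_LIBRARY_PATH.*haf|haf.*libssl' /etc/profile 2>/dev/null \
  || echo "no haf library entries found in /etc/profile"
```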
An Error Is Reported When the Divisor in SQL Statements Is 0
Symptom:
With pushdown enabled, the SQL statement select max(o_bigint % o_int) from tpch_flat_orc_date_100.data_type_test_orc reports the error io.prestosql.spi.PrestoException: Division by zero, whereas native Hive returns NULL.

Solution:
The two engines use different computing mechanisms for division by zero. Although native Hive returns a result (NULL), that result is incorrect. This error does not affect the running of Hive and can be ignored.
Errors Are Reported in TPC-DS Partition Tables
Symptom:
TPC-DS partition-table queries SQL5, 13, 25, 62, 88, and 99 report errors. Error message: "DeserializeRead detail: Reading byte[] of length 4096 at start offset 532 for length 2 to read 1 fields with types [int]. Read field #1 at field start position 0 current read offset 534"

Solution:
- Add the following parameters to the hive-site.xml configuration file of Hive:
<property>
  <name>hive.mapjoin.hybridgrace.hashtable</name>
  <value>false</value>
</property>
<property>
  <name>hive.vectorized.execution.mapjoin.native.fast.hashtable.enabled</name>
  <value>true</value>
</property>
- Run the set commands:
set hive.mapjoin.hybridgrace.hashtable=false;
set hive.vectorized.execution.mapjoin.native.fast.hashtable.enabled=true;