Problem Analysis
- This section uses as an example an independent Python package, containing an Arm-based Python binary, that is submitted with a PySpark task. Because the cluster is a mixed x86 and Arm deployment, running the Arm-based Python binary on an x86 node fails.
- The task script /opt/test_spark.py is an example and can be replaced. The content of the test_spark.py script is as follows:
# test_spark.py
import os
import sys

from pyspark import SparkContext
from pyspark import SparkConf

conf = SparkConf()
conf.setAppName("get-hosts")
sc = SparkContext(conf=conf)

def noop(x):
    import socket
    import sys
    return socket.gethostname() + ' '.join(sys.path) + ' '.join(os.environ)

rdd = sc.parallelize(range(1000), 100)
hosts = rdd.map(noop).distinct().collect()
print(hosts)
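As a side note, the mapper can be extended to report each executor's CPU architecture alongside its hostname, which makes an x86/Arm mix visible directly in the collected output. This is a hedged sketch: `describe_host` is a hypothetical helper, not part of the original script; `platform.machine()` returns strings such as `x86_64` or `aarch64`.

```python
# Hypothetical variant of the noop() mapper: return the executor's
# hostname together with its CPU architecture so that a mixed x86/Arm
# deployment shows up directly in the collected results.
import platform
import socket

def describe_host(_):
    # platform.machine() returns e.g. 'x86_64' on x86 nodes
    # and 'aarch64' on Arm nodes.
    return "%s/%s" % (socket.gethostname(), platform.machine())

if __name__ == "__main__":
    # In a Spark job this would be used as:
    #   sc.parallelize(range(1000), 100).map(describe_host).distinct().collect()
    print(describe_host(0))
```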
- Submit a PySpark task.
PYSPARK_PYTHON=./ANACONDA/mlpy_env/bin/python spark-submit \
  --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=./ANACONDA/mlpy_env/bin/python \
  --conf spark.executorEnv.PYSPARK_PYTHON=./ANACONDA/mlpy_env/bin/python \
  --master yarn-cluster \
  --archives /opt/mlpy_env.zip#ANACONDA \
  /opt/test_spark.py
- Analyze the task running information in the Spark2 History service.
The following figure shows the running information of executors.

In the preceding figure, agent1 is an x86 server, and agent2 is an Arm server. The independent Python package fails to run on agent1 because it is built for the Arm architecture.
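Such a mismatch can be confirmed before (or after) submission with standard tools; this is a generic sketch, and the interpreter path is an assumption to adapt to the packaged environment. `file` prints the ELF target of the binary, while `uname -m` prints the current node's architecture.

```shell
# Sketch: compare the binary's target architecture with the node's own.
# An Arm build reports 'ARM aarch64' and cannot execute on an x86_64 node.
file "$(command -v python3)"   # e.g. "ELF 64-bit LSB executable, x86-64 ..."
uname -m                       # architecture of this node, e.g. x86_64 or aarch64
```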
The error log for the executor failure is as follows:

- Analyze the transfer process of the /opt/mlpy_env.zip file.
- After the task is submitted, the independent Python package is uploaded to the HDFS.
20/07/22 19:00:51 INFO Client: Uploading resource file:/home/mlpy_env.zip#ANACONDA -> hdfs://server1:8020/user/hdfs/.sparkStaging/application_1595415474950_0002/mlpy_env.zip
- When a container is loaded, the independent Python package is downloaded and decompressed.
The download path of the independent Python package is in the directory specified by yarn.nodemanager.local-dirs.
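The localized files can be inspected on a NodeManager host; this is a sketch under assumptions: `/tmp/hadoop-yarn/nm-local-dir` is a common default for yarn.nodemanager.local-dirs (adjust to the cluster's actual setting), and archives are typically localized under a `usercache/<user>/filecache/` subdirectory.

```shell
# Sketch (assumed default path): NodeManagers localize shipped archives
# under <yarn.nodemanager.local-dirs>/usercache/<user>/filecache/.
# /tmp/hadoop-yarn/nm-local-dir is a common default; adjust as needed.
LOCAL_DIRS=${LOCAL_DIRS:-/tmp/hadoop-yarn/nm-local-dir}
find "$LOCAL_DIRS" -name 'mlpy_env*' 2>/dev/null || true
```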
Parent topic: Implementation Principles
