Setting Up the Development Environment

Client Environment

Table 1 describes the client environment requirements.

Table 1 Client environment requirements

| Item | Version | Remarks |
|---|---|---|
| OS | Windows 7 or later | Prepare it in advance. |
| Installing JDK | OpenJDK 1.8 | See Creating a Project. |
| Installing and configuring the development tool | Eclipse or IntelliJ IDEA is recommended. This document uses IntelliJ IDEA (2018.2) as an example. | Prepare it in advance. |
| Installing Scala | Complete the basic configuration for the Scala environment. For Spark 2.3.2 and Spark 2.4.6, the recommended Scala version is 2.11.8. | See Creating a Project. |
| Installing Maven | Compile the project package. Recommended version: 3.6.3. | See Creating a Project. |
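
As a quick sanity check of the JDK requirement in Table 1, the following minimal sketch reads the standard `java.version` system property and reports whether the running JVM is a 1.8 release. The class name `JdkVersionCheck` and the helper `isJdk8` are hypothetical names introduced for illustration; the check relies only on the fact that JDK 8 reports its version as `1.8.x`.

```java
final class JdkVersionCheck {

    // JDK 8 reports java.version as "1.8.0_xxx"; later JDKs report "9", "11.0.2", and so on.
    static boolean isJdk8(String javaVersion) {
        return javaVersion != null
                && (javaVersion.equals("1.8") || javaVersion.startsWith("1.8."));
    }

    public static void main(String[] args) {
        // java.version is a standard system property available on every JVM.
        String version = System.getProperty("java.version");
        System.out.println("java.version = " + version);
        if (isJdk8(version)) {
            System.out.println("JDK 1.8 detected, as listed in Table 1.");
        } else {
            System.out.println("Table 1 lists OpenJDK 1.8; this JVM reports a different version.");
        }
    }
}
```

Running the class with the JDK configured in the IDE shows which JDK the project would actually use.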

Obtaining the Software

Table 2 describes how to obtain the library packages of the machine learning algorithms.

Table 2 How to obtain the library packages

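| Applicable Spark Version | Software Package and URL | Remarks |
|---|---|---|
| Spark 2.3.2/2.4.5/2.4.6/3.1.1 | Huawei technical support websites | NA |
| Spark 2.3.2 | boostkit-ml-acc_2.11-2.2.0-spark2.3.2.jar | See the notes after this table. |
| Spark 2.3.2 | boostkit-ml-core_2.11-2.2.0-spark2.3.2.jar | See the notes after this table. |
| Spark 2.3.2 | boostkit-ml-kernel-client_2.11-2.2.0-spark2.3.2.jar | See the notes after this table. |
| Spark 2.3.2 | boostkit-xgboost4j_2.11-2.2.0.jar | See the notes after this table. |
| Spark 2.3.2 | boostkit-xgboost4j-spark2.3.2_2.11-2.2.0.jar | See the notes after this table. |
| Spark 2.4.5/2.4.6 | boostkit-ml-acc_2.11-2.2.0-spark2.4.6.jar | See the notes after this table. |
| Spark 2.4.5/2.4.6 | boostkit-ml-core_2.11-2.2.0-spark2.4.6.jar | See the notes after this table. |
| Spark 2.4.5/2.4.6 | boostkit-ml-kernel-client_2.11-2.2.0-spark2.4.6.jar | See the notes after this table. |
| Spark 2.4.5/2.4.6 | boostkit-xgboost4j_2.11-2.2.0.jar | See the notes after this table. |
| Spark 2.4.5/2.4.6 | boostkit-xgboost4j-spark2.4.6_2.11-2.2.0.jar | See the notes after this table. |
| Spark 3.1.1 | boostkit-ml-acc_2.12-2.2.0-spark3.1.1.jar | See the notes after this table. |
| Spark 3.1.1 | boostkit-ml-core_2.12-2.2.0-spark3.1.1.jar | See the notes after this table. |
| Spark 3.1.1 | boostkit-ml-kernel-client_2.12-2.2.0-spark3.1.1.jar | See the notes after this table. |

Remarks on the software packages:

  • For details about how to compile the packages, see Compiling the Code in the Big Data Machine Learning Algorithm Library Feature Guide.
  • boostkit-ml-acc_2.XX-XXX-sparkXX.jar

    It is required for software running and must be deployed.

  • boostkit-ml-core_2.XX-XXX-sparkXX.jar

    It is required for software running and must be deployed.

  • boostkit-ml-kernel-client_2.XX-XXX-sparkXX.jar

    It is required for software compilation and does not need to be deployed.

  • boostkit-xgboost4j_XXX.jar

    Adaptation package required by the XGBoost algorithm, which can be compiled from the open source adaptation code. It is required for software running and must be deployed.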
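
The naming pattern in Table 2 (`boostkit-ml-<module>_<Scala version>-<BoostKit version>-spark<Spark version>.jar`) can be sketched as a small helper that derives the expected file names for a given Spark version. This is only an illustration built from the file names listed in the table: the class `BoostKitJars` and its methods are hypothetical, the BoostKit version is fixed at 2.2.0, and the XGBoost adaptation packages are left out for brevity.

```java
import java.util.Arrays;
import java.util.List;

final class BoostKitJars {

    // Package version used throughout Table 2.
    static final String BOOSTKIT_VERSION = "2.2.0";

    // Scala binary version of the packages listed for each Spark version in Table 2.
    static String scalaVersionFor(String sparkVersion) {
        switch (sparkVersion) {
            case "2.3.2":
            case "2.4.5":
            case "2.4.6":
                return "2.11";
            case "3.1.1":
                return "2.12";
            default:
                throw new IllegalArgumentException(
                        "Spark version not listed in Table 2: " + sparkVersion);
        }
    }

    // Table 2 lists the spark2.4.6 packages for both Spark 2.4.5 and Spark 2.4.6.
    static String packageSparkVersion(String sparkVersion) {
        scalaVersionFor(sparkVersion); // rejects versions that Table 2 does not list
        return sparkVersion.equals("2.4.5") ? "2.4.6" : sparkVersion;
    }

    // module is "acc", "core", or "kernel-client".
    static String mlJarName(String module, String sparkVersion) {
        return "boostkit-ml-" + module + "_" + scalaVersionFor(sparkVersion)
                + "-" + BOOSTKIT_VERSION
                + "-spark" + packageSparkVersion(sparkVersion) + ".jar";
    }

    // acc and core must be deployed for running; kernel-client is needed for compilation only.
    static List<String> runtimeJars(String sparkVersion) {
        return Arrays.asList(mlJarName("acc", sparkVersion), mlJarName("core", sparkVersion));
    }

    public static void main(String[] args) {
        for (String jar : runtimeJars("2.4.5")) {
            System.out.println("deploy: " + jar);
        }
        System.out.println("compile only: " + mlJarName("kernel-client", "2.4.5"));
    }
}
```

Keeping the deploy-time list separate from the compile-only package mirrors the remarks in Table 2: only the acc and core packages need to be present when the software runs.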

After obtaining the BoostKit-ml_2.2.0.zip software package, verify its integrity to confirm that it is identical to the package provided on the website.

Verify the software package as follows:
  1. Obtain the digital certificate and software.
  2. Obtain the verification tool and method from the following link:

    https://support.huawei.com/enterprise/en/tool/pgp-verify-TL1000000054

  3. Verify the software package integrity by following the procedure described in the OpenPGP Signature Verification Guide obtained from the URL.

Cluster Environment

Prepare the required cluster environment before algorithm development. Table 3 lists the required software versions.

Table 3 Cluster environment requirements

| Item | Requirement |
|---|---|
| OS | openEuler-20.03-LTS-SP1 |
| JDK | BiSheng JDK 1.8.0_262 |
| ZooKeeper | 3.4.9 |
| Hadoop | 3.1.1 |
| Spark | Apache Spark 2.3.2, 2.4.5, 2.4.6, or 3.1.1 |

The Kunpeng algorithm library is compatible with Apache Spark 2.3.2, 2.4.5, 2.4.6, and 3.1.1. Other platforms have not been verified. For security purposes, you are advised to use a later version.
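
The component versions in Table 3 can be compared against the versions a cluster reports with a check along the following lines. This is a minimal sketch under stated assumptions: the class `ClusterRequirements` and its method `mismatches` are hypothetical, the reported versions are supplied by hand rather than queried from the cluster, and the versions in Table 3 are treated as exact matches. The OS and JDK rows are omitted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ClusterRequirements {

    // Component versions taken from Table 3 (OS and JDK are left out of this sketch).
    static final Map<String, List<String>> REQUIRED = new LinkedHashMap<>();
    static {
        REQUIRED.put("ZooKeeper", Arrays.asList("3.4.9"));
        REQUIRED.put("Hadoop", Arrays.asList("3.1.1"));
        REQUIRED.put("Spark", Arrays.asList("2.3.2", "2.4.5", "2.4.6", "3.1.1"));
    }

    // Returns one message per component whose reported version is missing or not listed.
    static List<String> mismatches(Map<String, String> reported) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : REQUIRED.entrySet()) {
            String actual = reported.get(entry.getKey());
            if (actual == null || !entry.getValue().contains(actual)) {
                problems.add(entry.getKey() + ": found " + actual
                        + ", Table 3 lists " + entry.getValue());
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // Hypothetical versions entered by hand for illustration.
        Map<String, String> reported = new LinkedHashMap<>();
        reported.put("ZooKeeper", "3.4.9");
        reported.put("Hadoop", "3.1.1");
        reported.put("Spark", "2.4.6");
        List<String> problems = mismatches(reported);
        System.out.println(problems.isEmpty() ? "All listed components match Table 3." : problems);
    }
}
```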