Algorithm Library Deployment in a Hybrid Cluster

Overview

The Kunpeng BoostKit for Big Data machine learning algorithm library supports only Kunpeng servers. If you use the machine learning algorithm library in a big data cluster that contains both Kunpeng and x86 servers, ensure that all resources allocated by Yarn come from Kunpeng servers so that tasks using the algorithm library run properly.

Related Concepts

  • Yarn: Apache Hadoop Yet Another Resource Negotiator (Yarn) is the Hadoop resource manager. As a general-purpose resource management system, it provides unified resource management and scheduling for upper-layer applications, markedly improving cluster resource utilization and data sharing.
  • Queue: Yarn resources are managed in queues. Different queues are created to control the resources available to each user.
  • Label: Yarn node labels are used to group nodes with similar features so that applications can run on the same group of nodes.
  • Node manager: The agent that runs on each compute node in a Hadoop cluster. It communicates with the resource manager, manages the lifecycle of containers, monitors the usage of resources such as memory and CPU on each container, monitors node health, and manages logs and the auxiliary services used by different applications.
  • Resource manager: The master component that arbitrates the available resources of the entire cluster and manages the distributed applications running on Yarn.
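
As a sketch of how the queue and label concepts above fit together (the label name "kunpeng" and the hostnames are illustrative assumptions, not taken from this document), node labels are typically created and assigned with the standard Yarn admin commands:

```shell
# Prerequisite (yarn-site.xml): node labels must be enabled, e.g.
#   yarn.node-labels.enabled=true
#   yarn.node-labels.fs-store.root-dir=hdfs://<namenode>:8020/yarn/node-labels

# Add a cluster-level node label; "kunpeng" is an illustrative name.
yarn rmadmin -addToClusterNodeLabels "kunpeng(exclusive=true)"

# Attach the label to the Kunpeng server nodes (a node can carry only one label).
yarn rmadmin -replaceLabelsOnNode "kunpeng-node-1=kunpeng kunpeng-node-2=kunpeng"

# Verify the labels known to the cluster.
yarn cluster --list-node-labels
```

On commercial platforms such as FusionInsight, the same result is usually achieved through the management console rather than these raw commands.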

Deployment Solution

Kunpeng and x86 server nodes are divided into different queues. Algorithm tasks can be submitted only to Kunpeng queues, and other tasks can be submitted to Kunpeng or x86 queues.

Figure 1 shows the hybrid cluster deployment architecture.

Figure 1 Hybrid cluster deployment architecture
  1. The deployment solution has the following features and constraints:
    • Spark tasks are submitted to Yarn queues. With the Node Labels feature, a queue can be associated with multiple labels, but a node can carry only one label. By specifying both a queue and a label, you can make tasks run on the nodes with that label.
    • Add labels to all Kunpeng servers and grant the labels to the Kunpeng boostkit queue. Algorithm tasks must be submitted to Kunpeng queues; non-algorithm tasks can be submitted to Kunpeng or x86 queues. Kunpeng queues and x86 queues are mutually exclusive: a task cannot run in a Kunpeng queue and an x86 queue at the same time.
    • Enable node labeling on the big data platform (such as FusionInsight) and configure the same label on every Kunpeng server. Then create Yarn queues for algorithm tasks and grant the Kunpeng node labels to those queues.
    • The algorithm application ISV must set the queue in the algorithm task submission command to the Kunpeng boostkit queue so that algorithm tasks run on Kunpeng server nodes.
  2. Deploy the algorithm package as follows:
    • The algorithm library needs to be deployed only on the client nodes where the task is submitted. It does not need to be deployed on the other compute nodes.
    • After the task is submitted, the open source component Yarn distributes the algorithm package to the Kunpeng server nodes that run the task. You do not need to develop or deploy other tools to perform this operation.
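
The submission flow described above can be sketched as follows (the queue name "boostkit", the class name, and the jar paths are illustrative assumptions; only the --queue value needs to match the Kunpeng queue configured on your platform):

```shell
# Run from a client node where the BoostKit algorithm library is deployed.
# Yarn ships the jars listed in --jars to the Kunpeng nodes that execute the
# task, so the library does not need to be pre-installed on compute nodes.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --queue boostkit \
  --class com.example.KMeansJob \
  --jars /opt/boostkit/lib/boostkit-ml.jar \
  ./my-algorithm-app.jar
```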

Yarn Node Labels

Yarn node labels are used to group nodes with similar features so that applications can run on the same group of nodes.

Yarn node labels (partitions) have the following features and constraints:

  • A node can have only one node label, so node labels divide a cluster into several disjoint subclusters. By default, a node belongs to the DEFAULT partition (partition="").
  • For each node label (partition), configure the amount of resources that each queue can use.
  • There are two types of node labels (partitions):
    • Exclusive: A container is allocated only to a node that exactly matches the requested node label (partition). For example, a request for partition="x" is allocated to a node labeled partition="x", and a request for the DEFAULT partition is allocated to a node in the DEFAULT partition.
    • Non-exclusive: When a partition is non-exclusive, it shares its idle resources with the containers that request the DEFAULT partition.

You can specify the set of node labels that can be accessed by each queue. An application can use only the subset of node labels that can be accessed by the queue containing the application.
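
As a sketch under assumed names (queue "boostkit", label "kunpeng"), the per-queue label access and per-partition capacities described above map to Capacity Scheduler properties such as:

```xml
<!-- capacity-scheduler.xml sketch; queue and label names are illustrative. -->
<property>
  <!-- Labels (partitions) this queue is allowed to access -->
  <name>yarn.scheduler.capacity.root.boostkit.accessible-node-labels</name>
  <value>kunpeng</value>
</property>
<property>
  <!-- Share of the "kunpeng" partition's resources granted to this queue -->
  <name>yarn.scheduler.capacity.root.boostkit.accessible-node-labels.kunpeng.capacity</name>
  <value>100</value>
</property>
<property>
  <!-- Run tasks in this queue on "kunpeng" nodes unless a label is given explicitly -->
  <name>yarn.scheduler.capacity.root.boostkit.default-node-label-expression</name>
  <value>kunpeng</value>
</property>
```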