Introduction to OmniScheduler

What's New

[2025-03-30]: Released OmniScheduler 1.0.0. In a Hadoop cluster with unbalanced load between nodes, the OmniScheduler Yarn load scheduling algorithm optimizes the open source Capacity Scheduler to schedule resources based on the weight calculation and sorting results of cluster nodes' physical resources. This algorithm enables balanced resource configuration and efficient resource utilization.

Introduction to the Project

Overview

The big data features of OmniRuntime are presented in the form of plugins to improve the performance of data loading, computing, and exchange from end to end.

Data volumes generated from Internet services have been growing much faster than CPUs' computing power. The open-source big data ecosystem is also developing on a fast track. However, diversified computing engines and open source components make it difficult to improve data processing performance throughout the lifecycle. Different big data engines use their own unique tuning policies and technologies to improve performance and efficiency. Some tuning items may be applied across multiple engines, which may cause resource contention and conflicts, reducing overall computing performance.

OmniRuntime consists of a series of features provided by Kunpeng BoostKit for Big Data in terms of application acceleration. It aims to improve the performance of end-to-end data loading, computing, and exchange through plugins, thereby improving the performance of big data analytics.

OmniScheduler is a subfeature of OmniRuntime that enhances the capacity scheduling algorithm of Hadoop Yarn. It obtains cluster load information, calculates a weight for each node from its physical resources, sorts the nodes by that weight, and schedules low-load nodes first. This improves load balancing within the cluster, giving balanced resource configuration and efficient resource utilization.
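The weight calculation and sorting described above can be sketched as follows. This is a minimal illustration, not the OmniScheduler implementation: the `NodeLoad` fields, the `WEIGHTS` values, and the weighted-sum formula are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class NodeLoad:
    """Utilization of one node; each value is a fraction between 0.0 and 1.0."""
    name: str
    cpu: float
    memory: float
    disk_io: float
    network_io: float


# Hypothetical weights. The weights used by OmniScheduler are not stated here.
WEIGHTS = {"cpu": 0.4, "memory": 0.3, "disk_io": 0.2, "network_io": 0.1}


def load_score(node: NodeLoad, weights: dict = WEIGHTS) -> float:
    """Return the weighted sum of the node's resource utilization."""
    return sum(getattr(node, key) * weight for key, weight in weights.items())


def sort_by_load(nodes: list) -> list:
    """Return the nodes ordered from least loaded to most loaded."""
    return sorted(nodes, key=load_score)


nodes = [
    NodeLoad("node1", cpu=0.9, memory=0.7, disk_io=0.5, network_io=0.2),
    NodeLoad("node2", cpu=0.2, memory=0.3, disk_io=0.1, network_io=0.1),
    NodeLoad("node3", cpu=0.5, memory=0.5, disk_io=0.5, network_io=0.5),
]
ranked = sort_by_load(nodes)
print([node.name for node in ranked])  # prints ['node2', 'node3', 'node1']
```

A scheduler following this idea would offer resources on `node2` first, because it has the lowest score.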

Compatible open source components and versions:

Spark 3.1.1
Spark 3.3.1
Hive 3.1.0
Hadoop 3.3.4

Architecture

Yet Another Resource Negotiator (Yarn) is a framework for resource management and job scheduling in a Hadoop cluster. It allocates resources and schedules jobs so that multiple computing frameworks (such as MapReduce, Spark, and Tez) can share the same cluster resources. Yarn employs the ResourceManager (RM), NodeManager (NM), and ApplicationMaster (AM) to manage resources and schedule jobs. Yarn offers multiple schedulers, including the First In First Out (FIFO) Scheduler, Capacity Scheduler, and Fair Scheduler. Choose the scheduler most appropriate to your service environment.

OmniScheduler optimizes the open source Capacity Scheduler to schedule resources based on the weight calculation and sorting results of cluster nodes' physical resources. This optimized Yarn load scheduling algorithm enables balanced resource configuration and efficient resource utilization. Figure 1 shows the overall architecture.

Figure 1 Overall architecture of OmniScheduler

The architecture consists of five modules:

Prometheus: This open-source event monitoring system and time series database is widely used to manage various infrastructure resources.
Node Exporter: It is a component in the Prometheus ecosystem used to collect and expose machine-level metrics, including but not limited to CPU usage, memory usage, disk I/O, network I/O, and file system information.
LoadsMetricApplication: This load collection and analysis tool obtains machine metric information from the Node Exporter, analyzes and processes the information, and reports the generated cluster load and balancing data to Prometheus.
Grafana: It obtains cluster load and balancing data from Prometheus, and visualizes the data in charts and dashboards for display on the user interface.
OmniScheduler: It obtains the node load sorting information from LoadsMetric and prioritizes job scheduling for nodes with lower loads.
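To make the first step of this data flow concrete, the sketch below parses a small sample of the Prometheus text format that Node Exporter exposes and derives a memory usage ratio from it. It is an illustration only: the sample values are made up, the helper names are hypothetical, and it does not show how LoadsMetricApplication actually processes metrics. `node_memory_MemTotal_bytes` and `node_memory_MemAvailable_bytes` are standard Node Exporter metric names on Linux.

```python
# Made-up sample in the Prometheus text exposition format.
SAMPLE = """\
# HELP node_memory_MemTotal_bytes Memory information field MemTotal_bytes.
# TYPE node_memory_MemTotal_bytes gauge
node_memory_MemTotal_bytes 8.0e+09
# HELP node_memory_MemAvailable_bytes Memory information field MemAvailable_bytes.
# TYPE node_memory_MemAvailable_bytes gauge
node_memory_MemAvailable_bytes 2.0e+09
node_cpu_seconds_total{cpu="0",mode="idle"} 12345.6
node_load1 3.5
"""


def parse_unlabelled_metrics(text: str) -> dict:
    """Collect metrics that carry no labels; labelled samples are skipped."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        parts = line.split()
        if len(parts) < 2 or "{" in parts[0]:
            continue  # skip malformed or labelled samples
        metrics[parts[0]] = float(parts[1])
    return metrics


def memory_usage(metrics: dict) -> float:
    """Return the fraction of memory in use, between 0.0 and 1.0."""
    total = metrics["node_memory_MemTotal_bytes"]
    available = metrics["node_memory_MemAvailable_bytes"]
    return 1.0 - available / total


metrics = parse_unlabelled_metrics(SAMPLE)
print(memory_usage(metrics))  # prints 0.75
```

In a real deployment the text would be fetched from the Node Exporter `/metrics` endpoint rather than taken from a constant.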

Application Scenarios

Learn about the application scenarios of OmniScheduler before using the feature. OmniScheduler supports Hadoop 3.3.4.

OmniScheduler balances load between nodes of a Hadoop cluster. After a user submits a computing job (such as a Spark job) to Yarn, OmniScheduler assigns the job to less-loaded nodes based on specified parameters. This ensures balanced configuration and efficient utilization of cluster resources.
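The scenario above can be pictured as a greedy placement loop: each container goes to whichever node currently has the lowest load, and that node's load is then updated. The sketch below is a simplified model under stated assumptions. The function name, the per-container cost estimates, and the use of a single scalar load per node are hypothetical and do not describe the OmniScheduler internals.

```python
import heapq


def place_containers(node_loads: dict, container_costs: list) -> dict:
    """Assign each container to the node with the lowest current load.

    node_loads maps a node name to its current load score.
    container_costs lists the (hypothetical) load each container adds.
    Returns a mapping from node name to the indexes of its containers.
    """
    heap = [(load, name) for name, load in node_loads.items()]
    heapq.heapify(heap)
    placement = {name: [] for name in node_loads}
    for index, cost in enumerate(container_costs):
        load, name = heapq.heappop(heap)  # least-loaded node right now
        placement[name].append(index)
        heapq.heappush(heap, (load + cost, name))  # account for the new load
    return placement


placement = place_containers(
    {"node1": 0.7, "node2": 0.2, "node3": 0.5},
    [0.2, 0.2, 0.2],
)
print(placement)  # prints {'node1': [], 'node2': [0, 1], 'node3': [2]}
```

The busiest node, `node1`, receives nothing until the other nodes catch up, which is the balancing effect the scenario describes.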

These scenarios rely on the following open source components:

Prometheus: This open-source event monitoring system and time series database is widely used to manage various infrastructure resources.
Node Exporter: It is a component in the Prometheus ecosystem used to collect and expose machine-level metrics.
Grafana: This open-source analytics and monitoring platform is used to build and visualize time series data of various data sources. Grafana supports multiple data sources, including Prometheus, InfluxDB, Elasticsearch, and MySQL.

Constraints

None.

Directory Structure

The full project directory structure is as follows:

├── docs                                                      # Project document directory
│   └── en                                                   # English document directory
│       ├── figures                                          # Directory of images in English documents
│       ├── release_notes.md                                 # OmniScheduler Release Notes
│       ├── installation_guide.md                            # OmniScheduler Installation Guide
│       ├── user_guide.md                                    # OmniScheduler User Guide
├── LoadsMetric                                              # LoadsMetric service module
│   ├── LoadsMetricServer                                    # LoadsMetric service implementation
│       ├── package                                          # Packaging code directory
│       ├── server                                           # Service code directory
│       ├── pom.xml                                          # Maven project configuration file
├── yarn-schedule-load-evolution                             # LoadsMetric Yarn plugin module
│   ├── src                                                  # Yarn plugin code directory
│   ├── pom.xml                                              # Maven project configuration file
├── build.sh                                                 # Compilation script
├── LoadsMetric.json                                         # LoadsMetric configuration file
├── README_en.md                                             # Project introduction file

Release Notes

For details about feature changes in each version, see Release Notes.

Environment Deployment

For details about the environment dependencies and installation methods of OmniScheduler, see Installation Guide.

Helpful Links

Name	Path	Overview
1.0.0 Release Notes	Release Notes	Provides basic information and feature updates of each OmniScheduler version.
Installation Guide	Installation Guide	Describes how to install OmniScheduler.
User Guide	User Guide	Provides details about how to use OmniScheduler.

Security Statement

Routine Antivirus Software Check

Periodically scan clusters and Spark components for viruses. This protects clusters from viruses, malicious code, spyware, and other malicious programs, reducing risks such as system breakdown and information leakage. Mainstream antivirus software is recommended for these scans.

Log Control

Check whether the system can limit the size of a single log file.
Check whether there is a mechanism for clearing logs when the log space is used up.
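As an illustration of what these two checks look for, the sketch below configures a logger whose files are capped in size and whose oldest files are discarded automatically. It uses Python's standard `logging.handlers.RotatingFileHandler` purely as an example. The logger name, the size limit, and the backup count are arbitrary, and the logging framework in your deployment will have its own equivalent settings.

```python
import logging
import os
import tempfile
from logging.handlers import RotatingFileHandler


def build_size_limited_logger(path: str, max_bytes: int, backup_count: int) -> logging.Logger:
    """Create a logger that rotates its file before it exceeds max_bytes."""
    logger = logging.getLogger("size_limited_example")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = RotatingFileHandler(path, maxBytes=max_bytes, backupCount=backup_count)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger


log_dir = tempfile.mkdtemp()
log_path = os.path.join(log_dir, "example.log")
logger = build_size_limited_logger(log_path, max_bytes=1024, backup_count=2)

# Write far more than three files' worth of data.
for i in range(200):
    logger.info("message number %d", i)

log_files = sorted(os.listdir(log_dir))
print(log_files)  # the active file plus at most two rotated backups
```

Because only `backup_count` rotated files are kept, total disk usage stays bounded by roughly `max_bytes * (backup_count + 1)`.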

Vulnerability Fixing

To ensure the security of the production environment and reduce the risk of attacks, enable the firewall and periodically fix the following vulnerabilities:

OS vulnerabilities
JDK vulnerabilities
Hadoop and Spark vulnerabilities
ZooKeeper vulnerabilities
Kerberos vulnerabilities
OpenSSL vulnerabilities
Vulnerabilities in other components

The following uses CVE-2021-37137 as an example.

Vulnerability description:

In Netty 4.1.17, the Snappy frame decoder does not restrict the chunk length, which may lead to excessive memory usage. The vulnerability ID is CVE-2021-37137.

In the decoupled storage-compute scenario, the system uses the hdfs-ceph service (version 3.2.0) for object storage. This service depends on aws-java-sdk-bundle-1.11.375.jar, which is affected by this vulnerability. You are advised to apply the vulnerability patch promptly to prevent attacks.

Impact:

Netty versions earlier than 4.1.68

Handling suggestion:

The vendor has released a patch that fixes the vulnerability. For details, visit GitHub.
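When auditing a deployment for a vulnerability like this one, it helps to compare version numbers numerically rather than as strings. The sketch below extracts a version from a JAR file name and compares it with a threshold. The function names and the file name `netty-all-4.1.17.Final.jar` are examples only, and the exact boundary of the affected range should be confirmed against the upstream advisory.

```python
import re


def parse_version(text: str):
    """Extract a (major, minor, patch) tuple from text such as a JAR file name.

    Returns None if no three-part version number is found.
    """
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def is_older_than(version_text: str, threshold_text: str) -> bool:
    """Return True if the version in version_text is below the threshold."""
    version = parse_version(version_text)
    threshold = parse_version(threshold_text)
    if version is None or threshold is None:
        raise ValueError("no version number found")
    return version < threshold


print(is_older_than("netty-all-4.1.17.Final.jar", "4.1.68"))   # prints True
print(is_older_than("netty-all-4.1.100.Final.jar", "4.1.68"))  # prints False
```

The second call shows why tuples are used: a plain string comparison would rank `4.1.100` below `4.1.68`.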

SSH Hardening

Installation and deployment require connecting to the server through SSH. Because the root user has all operation permissions, logging in to the server as the root user poses security risks. You are advised to log in as a common user for installation and deployment, and to disable root login over SSH to improve system security.

Check the PermitRootLogin configuration item in /etc/ssh/sshd_config.

If the value is no, root user login using SSH is disabled.
If the value is yes, change it to no.
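The check above can also be scripted. The sketch below reads the text of an sshd_config file and reports the PermitRootLogin value. It is a simplified illustration with a hypothetical function name: it returns the first uncommented occurrence of the keyword and does not interpret `Match` blocks or `Include` directives.

```python
def permit_root_login(config_text: str):
    """Return the PermitRootLogin value, or None if it is not set explicitly."""
    for raw_line in config_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        parts = line.split(None, 1)
        # Keywords in sshd_config are case-insensitive.
        if len(parts) == 2 and parts[0].lower() == "permitrootlogin":
            return parts[1].strip().lower()
    return None


SAMPLE = """\
# Authentication
#PermitRootLogin yes
PermitRootLogin no
PasswordAuthentication yes
"""

print(permit_root_login(SAMPLE))  # prints no

# On a real server (reading the file may require elevated permissions):
# with open("/etc/ssh/sshd_config") as handle:
#     print(permit_root_login(handle.read()))
```

A result of `None` means the option is not set in the file, so the SSH server's built-in default applies and should be checked separately.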

Communication Matrix

Source Device	Server running the Resource Manager	Server running the CLI query client process	Server running the Resource Manager	Server running the Resource Manager
Source IP Address	IP address of the server running the Resource Manager	No default value (IP address of the CLI query client)	IP address of the server running the Resource Manager	IP address of the server running the Resource Manager
Source Port	1024 to 65535	1024 to 65535	1024 to 65535	1024 to 65535
Destination Device	Server running the Resource Manager	Server running the Resource Manager	Server running the Resource Manager	Server running the Node Manager
Destination IP Address	IP address of the server running the Resource Manager	IP address of the server running the Resource Manager	IP address of the server running the Resource Manager	IP address of the server running the Node Manager
Destination Port (Listening)	9090	3000	9060	9100
Protocol	TCP	TCP	TCP	TCP
Port Description	Spring Boot listening port, which is used for communication between the Resource Manager and Spring Boot	Grafana listening port, which is used for communication between the client and Grafana	Prometheus listening port, which is used for communication between Grafana and Prometheus	Node Exporter listening port, which is used for communication between Spring Boot and Node Exporter
Listening Port Configurable	Yes	Yes	Yes	Yes
Authentication Mode	N/A	N/A	N/A	N/A
Encryption Mode	N/A	N/A	N/A	N/A
Plane	Service plane	Service plane	Service plane	Service plane
Version	All	All	All	All
Special Scenario	None	None	None	None
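After deployment, you may want to confirm that the listening ports in the matrix are reachable. The following is a minimal sketch, assuming the default port numbers listed above (all are configurable) and hypothetical helper and host names. It only tests whether a TCP connection can be opened and says nothing about whether the service behind the port is healthy.

```python
import socket

# Default listening ports taken from the communication matrix.
LISTENING_PORTS = {
    "Spring Boot": 9090,
    "Grafana": 3000,
    "Prometheus": 9060,
    "Node Exporter": 9100,
}


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_ports(host: str, ports: dict) -> dict:
    """Check each named port on one host and return name -> reachable."""
    return {name: is_port_open(host, port) for name, port in ports.items()}


# Example usage (host names are placeholders for your own servers):
# rm_ports = {k: v for k, v in LISTENING_PORTS.items() if k != "Node Exporter"}
# print(check_ports("resource-manager-host", rm_ports))
# print(is_port_open("node-manager-host", LISTENING_PORTS["Node Exporter"]))
```

Node Exporter is checked separately in the usage example because, per the matrix, it listens on the server running the Node Manager rather than on the Resource Manager server.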

Disclaimer

To OmniScheduler users

This tool is intended solely for debugging and development. You are responsible for any risks and should carefully review the following information:
- Data processing and deletion: Users are responsible for managing and deleting any data generated while using this tool. You are advised to promptly delete any related data after use to prevent information leaks.
- Data confidentiality and transmission: Users understand and agree not to share or transmit any data generated by this tool. Neither the tool nor its developers are responsible for any information leaks, data breaches, or other negative consequences.
- User input security: Users are responsible for the security of any commands they enter and for any risks or losses resulting from improper input. The tool and its developers are not liable for issues caused by incorrect command usage.
- Disclaimer scope: This disclaimer applies to all individuals and entities using this tool. By using the tool, you acknowledge and accept this statement and assume all risks and responsibilities arising from its use. If you do not agree, please stop using the tool immediately.
Before using this tool, please read and understand the preceding disclaimer. If you have any questions, contact the developer.

To data owners

If you do not want your model or dataset to be mentioned in OmniScheduler, or if you wish to update its description, please submit an issue on GitCode. We will delete or update your description according to your request. Thank you for your understanding and contribution to OmniScheduler.

License

For the OmniScheduler product license, see the LICENSE file.

The documents in the docs directory of OmniScheduler are licensed under CC-BY 4.0. See the LICENSE file for details.

Contribution Statement

Submit an error report: If you discover a bug in OmniScheduler that is not a security issue, first search the Issues in the OmniScheduler repository to avoid submitting duplicates. If the bug is not listed, create a new issue. If you discover a security-related problem, do not disclose it publicly. Refer to the security handling guidelines for details. All error reports must include complete information about the issue.
Security issue handling: For guidance on handling security issues in this project, contact the core team by email.
Resolving existing issues: Review the repository's issue list to identify issues that need attention, and attempt to resolve them.
How to propose new functions: Use the Feature tag when creating an issue for a new function. We will review and confirm proposals periodically.
How to contribute:
1. Fork the repository of the project.
2. Clone it to your local machine.
3. Create a development branch.
4. Local testing: All unit tests, including any new test cases, must pass before submission.
5. Commit your code.
6. Create a pull request (PR).
7. Code review: Modify the code according to review comments and resubmit your changes. This process may take multiple rounds.
8. After your PR is approved by the required number of reviewers, the committer will conduct the final review.
9. After your PR is approved and all tests pass, the CI system will merge it into the project's main branch.

Suggestions and Feedback

You are welcome to contribute to the community. If you have any questions or suggestions, please submit an issue. We will respond as soon as possible. Thank you for your support.

Acknowledgments

OmniScheduler is developed by the following Huawei department:

Kunpeng Computing BoostKit Development Dept

Thank you to everyone in the community for your PRs. We warmly welcome contributions to OmniScheduler!