Introduction to OmniScheduler
- [2025-03-30]: Released OmniScheduler 1.0.0. In a Hadoop cluster with unbalanced load between nodes, the OmniScheduler Yarn load scheduling algorithm optimizes the open source Capacity Scheduler to schedule resources based on the weight calculation and sorting results of cluster nodes' physical resources. This algorithm enables balanced resource configuration and efficient resource utilization.
Overview
The big data features of OmniRuntime are presented in the form of plugins to improve the performance of data loading, computing, and exchange from end to end.
Data volumes generated from Internet services have been growing much faster than CPUs' computing power. The open-source big data ecosystem is also developing on a fast track. However, diversified computing engines and open source components make it difficult to improve data processing performance throughout the lifecycle. Different big data engines use their own unique tuning policies and technologies to improve performance and efficiency. Some tuning items may be applied across multiple engines, which may cause resource contention and conflicts, reducing overall computing performance.
OmniRuntime consists of a series of features provided by Kunpeng BoostKit for Big Data in terms of application acceleration. It aims to improve the performance of end-to-end data loading, computing, and exchange through plugins, thereby improving the performance of big data analytics.
OmniScheduler is a subfeature of OmniRuntime that enhances the capacity scheduling algorithm of Hadoop Yarn. It obtains the cluster load information, calculates weights for the physical resources of each node, sorts the nodes by the result, and preferentially schedules jobs onto low-load nodes. This improves load balancing within the cluster, resulting in balanced resource configuration and efficient resource utilization.
Compatible open source components and versions:
- Spark 3.1.1
- Spark 3.3.1
- Hive 3.1.0
- Hadoop 3.3.4
Architecture
Yet Another Resource Negotiator (Yarn) is a framework for resource management and job scheduling in a Hadoop cluster. It allocates resources and schedules jobs in a cluster so that multiple computing frameworks (such as MapReduce, Spark, and Tez) can share the same cluster resources. Yarn employs the ResourceManager (RM), NodeManager (NM), and ApplicationMaster (AM) to manage resources and schedule jobs. Yarn offers multiple schedulers, including the First In First Out (FIFO) Scheduler, Capacity Scheduler, and Fair Scheduler. Choose the scheduler most appropriate to your service environment.
OmniScheduler optimizes the open source Capacity Scheduler to schedule resources based on the weight calculation and sorting results of cluster nodes' physical resources. This optimized Yarn load scheduling algorithm enables balanced resource configuration and efficient resource utilization. Figure 1 shows the overall architecture.
Figure 1 Overall architecture of OmniScheduler
The architecture consists of five modules:
- Prometheus: This open-source event monitoring system and time series database is widely used to manage various infrastructure resources.
- Node Exporter: It is a component in the Prometheus ecosystem used to collect and expose machine-level metrics, including but not limited to CPU usage, memory usage, disk I/O, network I/O, and file system information.
- LoadsMetricApplication: This load collection and analysis tool obtains machine metric information from the Node Exporter, analyzes and processes the information, and reports the generated cluster load and balancing data to Prometheus.
- Grafana: It obtains cluster load and balancing data from Prometheus, and visualizes the data in charts and dashboards for display on the user interface.
- OmniScheduler: It obtains the node load sorting information from LoadsMetric and prioritizes job scheduling for nodes with lower loads.
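The collection step can be illustrated with a minimal Python sketch. It is not the LoadsMetricApplication implementation. It assumes the text exposition format that Node Exporter serves and the Linux memory metrics `node_memory_MemTotal_bytes` and `node_memory_MemAvailable_bytes`. The function names `parse_samples` and `memory_usage_ratio` are hypothetical, and the parser is simplified (it ignores label contents and optional timestamps).

```python
import re

# Matches sample lines such as:  metric_name{label="x"} 1.5   or   metric_name 1.5
_SAMPLE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(\{[^}]*\})?\s+(\S+)')


def parse_samples(text):
    """Return {metric name: [values]} parsed from text exposition output."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE.match(line)
        if match is None:
            continue
        name, _labels, value = match.groups()
        samples.setdefault(name, []).append(float(value))
    return samples


def memory_usage_ratio(text):
    """Fraction of memory in use, derived from the two memory gauges."""
    samples = parse_samples(text)
    total = samples["node_memory_MemTotal_bytes"][0]
    available = samples["node_memory_MemAvailable_bytes"][0]
    return (total - available) / total


# Made-up sample data for illustration only.
EXAMPLE = """\
# HELP node_memory_MemTotal_bytes Memory information field MemTotal_bytes.
# TYPE node_memory_MemTotal_bytes gauge
node_memory_MemTotal_bytes 1000
node_memory_MemAvailable_bytes 250
node_cpu_seconds_total{cpu="0",mode="idle"} 12.5
"""

print(memory_usage_ratio(EXAMPLE))
```

In a real deployment the text would come from the Node Exporter metrics endpoint rather than from an inline string.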
Application Scenarios
Learn about the application scenarios of OmniScheduler before using the feature. OmniScheduler supports Hadoop 3.3.4.
OmniScheduler balances load between nodes of a Hadoop cluster. After a user submits a computing job (such as a Spark job) to Yarn, OmniScheduler assigns the job to less-loaded nodes based on specified parameters. This ensures balanced configuration and efficient utilization of cluster resources.
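The idea of weighting and sorting nodes by load can be sketched in Python as follows. This only illustrates the general technique and is not the OmniScheduler algorithm. The `NodeLoad` fields, the `WEIGHTS` values, and the function names are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class NodeLoad:
    host: str
    cpu: float         # utilization in [0, 1]
    memory: float      # utilization in [0, 1]
    disk_io: float     # utilization in [0, 1]
    network_io: float  # utilization in [0, 1]


# Hypothetical weights; the real parameters are defined by OmniScheduler.
WEIGHTS = {"cpu": 0.4, "memory": 0.3, "disk_io": 0.2, "network_io": 0.1}


def load_score(node, weights=WEIGHTS):
    """Weighted sum of the utilization values of a node; lower means less loaded."""
    return sum(weight * getattr(node, name) for name, weight in weights.items())


def rank_nodes(nodes, weights=WEIGHTS):
    """Return nodes ordered from least loaded to most loaded."""
    return sorted(nodes, key=lambda node: load_score(node, weights))


nodes = [
    NodeLoad("node1", cpu=0.9, memory=0.8, disk_io=0.5, network_io=0.5),
    NodeLoad("node2", cpu=0.2, memory=0.3, disk_io=0.1, network_io=0.1),
    NodeLoad("node3", cpu=0.5, memory=0.5, disk_io=0.5, network_io=0.5),
]

# The scheduler would try the hosts in this order.
print([node.host for node in rank_nodes(nodes)])
```

A weighted sum keeps the ranking cheap to recompute whenever new load data arrives, at the cost of assuming the metrics are comparable once normalized.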
Related Concepts
- Prometheus: This open-source event monitoring system and time series database is widely used to manage various infrastructure resources.
- Node Exporter: It is a component in the Prometheus ecosystem used to collect and expose machine-level metrics.
- Grafana: This open-source analytics and monitoring platform is used to build and visualize time series data of various data sources. Grafana supports multiple data sources, including Prometheus, InfluxDB, Elasticsearch, and MySQL.
The full project directory structure is as follows:
├── docs # Project document directory
│ └── en # English document directory
│ ├── figures # Directory of images in English documents
│ ├── release_notes.md # OmniScheduler Release Notes
│ ├── installation_guide.md # OmniScheduler Installation Guide
│ ├── user_guide.md # OmniScheduler User Guide
├── LoadsMetric # LoadsMetric service module
│ ├── LoadsMetricServer # LoadsMetric service implementation
│ ├── package # Packaging code directory
│ ├── server # Service code directory
│ ├── pom.xml # Maven project configuration file
├── yarn-schedule-load-evolution # LoadsMetric Yarn plugin module
│ ├── src # Yarn plugin code directory
│ ├── pom.xml # Maven project configuration file
├── build.sh # Compilation script
├── LoadsMetric.json # LoadsMetric configuration file
├── README_en.md # Project introduction file
For details about feature changes in each version, see Release Notes.
For details about the environment dependencies and installation methods of OmniScheduler, see Installation Guide.
| Name | Path | Overview |
|---|---|---|
| 1.0.0 Release Notes | Release Notes | Provides basic information and feature updates of each OmniScheduler version. |
| Installation Guide | Installation Guide | Describes how to install OmniScheduler. |
| User Guide | User Guide | Provides details about how to use OmniScheduler. |
Routine Antivirus Software Check
Periodically scan clusters and Spark components for viruses. This protects clusters from viruses, malicious code, spyware, and other malicious programs, reducing risks such as system breakdown and information leakage. Mainstream antivirus software is recommended for these scans.
Log Control
- Check whether the system can limit the size of a single log file.
- Check whether there is a mechanism for clearing logs when the log space is used up.
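As one way to spot-check these points, the following Python sketch lists log files that exceed a size limit and totals the space used by logs. It is an auditing aid only and does not configure any log rotation. The `*.log` pattern, the function names, and the directory and limit in the usage line are assumptions.

```python
from pathlib import Path


def oversized_logs(log_dir, max_bytes):
    """Return log files under log_dir that are larger than max_bytes, sorted by path."""
    return sorted(
        path
        for path in Path(log_dir).rglob("*.log")
        if path.is_file() and path.stat().st_size > max_bytes
    )


def total_log_bytes(log_dir):
    """Return the combined size in bytes of all log files under log_dir."""
    return sum(
        path.stat().st_size
        for path in Path(log_dir).rglob("*.log")
        if path.is_file()
    )
```

For example, `oversized_logs("/path/to/logs", 100 * 1024 * 1024)` would report files larger than 100 MiB in a hypothetical log directory.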
Vulnerability Fixing
To ensure the security of the production environment and reduce the risk of attacks, enable the firewall and periodically fix the following vulnerabilities:
- OS vulnerabilities
- JDK vulnerabilities
- Hadoop and Spark vulnerabilities
- ZooKeeper vulnerabilities
- Kerberos vulnerabilities
- OpenSSL vulnerabilities
- Vulnerabilities in other components
The following uses CVE-2021-37137 as an example.
Vulnerability description:
Netty 4.1.17 is affected by CVE-2021-37137: its Snappy frame decoder does not restrict the chunk length, so malicious input can cause excessive memory usage.
In deployments with decoupled storage and compute, the system uses the hdfs-ceph service (version 3.2.0) for object storage. This service depends on aws-java-sdk-bundle-1.11.375.jar, which is affected by this vulnerability. Apply the patch promptly to reduce the risk of attacks.
Impact:
Netty 4.1.68 and earlier versions
Handling suggestion:
Currently, the vendor has released an upgrade patch to fix the vulnerability. For details, visit GitHub.
SSH Hardening
During the installation and deployment, you need to connect to the server through SSH. The root user has all the operation permissions. Logging in to the server as the root user may pose security risks. You are advised to log in to the server as a common user for installation and deployment and disable root user login using SSH to improve system security.
Check the PermitRootLogin configuration item in /etc/ssh/sshd_config.
- If the value is no, root user login using SSH is disabled.
- If the value is yes, change it to no.
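The check can be automated with a short Python sketch like the one below. It is a simplified reading of the configuration file rather than a full sshd_config parser: it assumes the first occurrence of a keyword takes effect, stops at the first `Match` block because later settings apply only conditionally, and does not follow `Include` directives. The function name `permit_root_login` is hypothetical.

```python
def permit_root_login(config_text):
    """Return the global PermitRootLogin value in lowercase, or None if it is not set."""
    for raw_line in config_text.splitlines():
        # Drop comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split(None, 1)
        keyword = parts[0].lower()  # keywords are case-insensitive
        if keyword == "match":
            break  # settings after a Match line are conditional, so stop here
        if keyword == "permitrootlogin" and len(parts) == 2:
            return parts[1].strip().lower()
    return None


# Made-up configuration content for illustration only.
SAMPLE = """
# PermitRootLogin yes
Port 22
PermitRootLogin no   # hardened
PermitRootLogin yes
"""

print(permit_root_login(SAMPLE))
```

On a server, the text would be read from /etc/ssh/sshd_config. A result other than `no`, including `None` when the item is absent, indicates that the configuration should be reviewed.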
Communication Matrix
To OmniScheduler users
This tool is intended solely for debugging and development. You are responsible for any risks and should carefully review the following information:
- Data processing and deletion: Users are responsible for managing and deleting any data generated while using this tool. You are advised to promptly delete any related data after use to prevent information leaks.
- Data confidentiality and transmission: Users understand and agree not to share or transmit any data generated by this tool. Neither the tool nor its developers are responsible for any information leaks, data breaches, or other negative consequences.
- User input security: Users are responsible for the security of any commands they enter and for any risks or losses resulting from improper input. The tool and its developers are not liable for issues caused by incorrect command usage.
Disclaimer scope: This disclaimer applies to all individuals and entities using this tool. By using the tool, you acknowledge and accept this statement and assume all risks and responsibilities arising from its use. If you do not agree, please stop using the tool immediately.
Before using this tool, please read and understand the preceding disclaimer. If you have any questions, contact the developer.
To data owners
If you do not want your model or dataset to be mentioned in OmniScheduler, or if you wish to update its description, please submit an issue on GitCode. We will delete or update your description according to your request. Thank you for your understanding and contribution to OmniScheduler.
For the OmniScheduler product license, see the LICENSE file for details.
The documents in the docs directory of OmniScheduler are licensed under CC-BY 4.0. See the LICENSE file for details.
- Submit an error report: If you discover a defect in OmniScheduler that is not a security issue, first search the Issues in the OmniScheduler repository to avoid submitting duplicates. If the defect is not listed, create a new issue. If you discover a security-related problem, do not disclose it publicly. Refer to the security handling guidelines for details. All error reports must include complete information about the issue.
- Security issue handling: For guidance on handling security issues in this project, please contact the core team via email for instructions.
- Resolving existing issues: Review the repository's issue list to identify issues that need attention, and attempt to resolve them.
- How to propose new functions: Use the Feature tag when creating an issue for a new function. We will review and confirm proposals periodically.
- How to contribute:
  - Fork the repository of the project.
  - Clone it to your local machine.
  - Create a development branch.
  - Local testing: All unit tests, including any new test cases, must pass before submission.
  - Commit your code.
  - Create a pull request (PR).
  - Code review: Modify the code according to review comments and resubmit your changes. This process may involve multiple rounds of iterations.
  - After your PR is approved by the required number of reviewers, the committer will conduct the final review.
  - After your PR is approved and all tests pass, the CI system will merge it into the project's main branch.
You are welcome to contribute to the community. If you have any questions or suggestions, please submit an issue. We will respond as soon as possible. Thank you for your support.
OmniScheduler is jointly developed by the following Huawei departments:
Kunpeng Computing BoostKit Development Dept
Thank you to everyone in the community for your PRs. We warmly welcome contributions to OmniScheduler!