Feature List

Donau Portal Feature List

Feature	Description
System management	Allows you to configure user synchronization based on user authentication to synchronize local users, NIS users, LDAP users, or AD users to the system, synchronize LDAP user groups, create and delete user groups, and assign permissions based on users and user groups. NOTE: User synchronization must be triggered manually; automatic synchronization is not supported. User groups cannot be nested, and users in a user group cannot be assigned permissions again.
Job management	Allows you to submit common background jobs and VNC jobs, and to suspend, resume, and restart offline jobs. Allows you to submit jobs to third-party schedulers. Allows you to view the historical job list, filter jobs by keyword, and refresh the job list. Allows you to view the job task list, filter tasks by keyword, and refresh the task list. Allows you to view basic and advanced information about jobs and tasks, view the CPU and memory usage of jobs, and switch to the job directory.
Data management	Allows you to upload one or more data files and supports resumable upload of large files (≤ 100 GB). Allows you to download a single file, displays files and folders in a File Explorer-style view, and isolates permissions between users. Allows you to add, delete, modify, query, copy, paste, compress, and decompress files and folders, and to view TXT files online. Allows you to associate data files with simulation applications or job templates, and to open data files by double-clicking. Supports multi-cluster data transmission.
Application repository management	Integrates simulation applications into the system based on the execution scripts and displays the applications as forms externally; and supports intelligent template integration. Integrates remote Linux applications into the system based on the execution scripts; and associates data files with application templates. Integrates remote Windows applications into the system based on the execution scripts. Allows you to test integrated simulation applications to ensure that the applications run properly based on the returned results. Allows you to publish, cancel publishing, save as, and delete integrated and remote applications.
3D remote application	Allows you to use the WebUI to start applications deployed on the remote Linux/Windows server. Displays all active sessions in joined tables; allows you to manually disconnect created remote sessions; and allows idle sessions to be automatically released. Limits the number of sessions that can be used by a user. Allows you to select the visualization node using a random algorithm.
Notification management	Sends notifications when job statuses change. Displays job details based on the job ID in the pushed notifications. Receives job notifications from multiple heterogeneous clusters. Displays the progress of data uploads and downloads.
Cluster monitoring	Allows you to create monitoring items and dashboards that contain multiple monitoring items by editing configuration files. Monitors the usage of hardware resources (CPU, GPU, memory, temporary partitions, and swap partitions) and node resources. Monitors the number of packets sent and received by the CNP, PFC, and NIC, and the number of retransmitted packets due to packet loss. Monitors the numbers of running and submitted jobs in a cluster. Monitors the number of completed jobs in a cluster by status.
Cluster analysis	Allows you to create multi-dimensional reports and dashboards that contain multiple charts by editing configuration files. Allows you to export reports and charts as data (CSV files) or snapshots (PNG files). Analyzes the usage of hardware resources (CPU, GPU, memory, temporary partitions, and swap partitions) and node resources. Analyzes the number of active jobs, memory, temporary partition, swap partition, running duration, CPU usage duration, CPU usage duration of the system, and CPU usage duration of user code from the dimensions of time, job status, cluster, user group, user, queue, and application. Analyzes the number of completed jobs, memory, running duration, CPU usage duration, and waiting duration from the dimensions of time, job status, cluster, user group, user, queue, and application.
Resource pricing	Allows you to view the pricing details of a tenant or account based on the user permissions, and export the details as data (CSV files) or snapshots (PNG files). Displays the pricing trend of the whole year, and allows you to create, edit, and delete pricing plans.
Data collection	Collects the completed job data, cluster server data, cluster core data, and script data on Donau Scheduler.
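
The resumable upload listed under Data management can be pictured as splitting a file into fixed-size chunks and sending only the chunks that have not been stored yet. The following is a minimal sketch under that assumption. The function name, the chunk size, and the idea of tracking uploaded chunk indexes are illustrative and are not taken from the Donau Portal interface.

```python
def remaining_chunks(file_size, chunk_size, uploaded):
    """Return the (index, start, end) byte ranges that still need to be sent.

    `end` is exclusive. `uploaded` is a set of chunk indexes already stored
    on the server (hypothetical bookkeeping, not a documented Portal API).
    """
    if file_size < 0 or chunk_size <= 0:
        raise ValueError("file_size must be >= 0 and chunk_size must be > 0")
    total = (file_size + chunk_size - 1) // chunk_size  # ceiling division
    return [
        (i, i * chunk_size, min((i + 1) * chunk_size, file_size))
        for i in range(total)
        if i not in uploaded
    ]


if __name__ == "__main__":
    # A 10-byte file in 4-byte chunks, with chunk 0 already uploaded.
    print(remaining_chunks(10, 4, {0}))
```

After an interrupted transfer, a client following this scheme would resend only the listed ranges instead of restarting from byte 0.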

Donau Scheduler Feature List

Feature	Description
System management	Allows you to configure user synchronization based on user authentication to synchronize local users, NIS users, LDAP users, or AD users to the system, and assign permissions based on users and user groups. NOTE: Users in a user group cannot be assigned permissions again. Allows you to import the license and check the validity of the license. Allows you to manually activate or revoke a license, and obtain the revocation code for applying for a new license. Allows you to view the license usage information; displays a warning message before the license expires; and periodically displays a message when the license expires or is invalid. Allows you to configure the expiration time of the access token and refresh token, and check whether the refresh token is enabled.
Job management	Job submission: Allows you to submit common HPC job parameters, including the job name, user group, job execution duration, queue to which a user belongs, job execution command, job description, job scheduling time, number of job copies, environment variables, input and output redirection, account, log path redirection, and task execution path. Allows you to set inter-job dependencies, job priorities, runtime resource restrictions, prehook/posthook, retry times upon job/task failures, and job/task timeout interval. Allows you to specify compute node labels, resource requirements, and CPU and memory binding requirements when submitting jobs. Supports common serial jobs, MPI jobs (including openmpi, hmpi, mpich, intelmp, and cosched jobs), array jobs, blocking jobs, and interactive jobs. Allows you to submit a single job or jobs in batches, or submit jobs using scripts.
Job control: Supports job or task recovery/batch recovery, suspension/batch suspension, termination/batch termination, restart/batch restart, and re-submission/batch re-submission. Allows you to submit remarks such as reasons for control operations. This function can be used together with any control command.
Job query: Allows you to query the brief information, detailed information, specified fields, and custom fields of a job or task, and query the detailed reason why a task is pending. Allows you to filter jobs by job ID, job name, user name, user group, queue, account, time, and status. Allows you to filter tasks by index, execution node, and status. Allows you to query the brief and detailed information, specified fields, and custom fields of jobs and tasks at the same time. Allows you to filter jobs and tasks by job ID, user name, user group, queue, account, time, execution node, and status. Allows you to query a large number of jobs or tasks by page or query information about jobs or tasks with the specified page or quantity, as well as query help information. The output can be in long/wide format or JSON format.
Job scheduling: Supports scheduling policies such as FIFO, Gang Scheduler, Fairshare, and preemption; and CPU-memory affinity scheduling. Supports multi-dimensional resource scheduling, such as Kunpeng, Ascend, and x86 servers, CPU, memory, GPU, Ascend acceleration card, and custom resources. Supports multiple resource pools, dynamically allocates resources such as CPUs, memory, GPUs, or nodes based on loads, and borrows resources across resource pools. Selects candidate nodes randomly or based on the registration sequence, maximum available resources, or specified label. Allows you to set the scheduling limit based on the maximum number of jobs that can be run in a cluster, the maximum number of scheduling times of a queue, account, or user, and the maximum number of pending jobs of a user. Allows you to configure the brief and detailed pending reason for scheduling. Allows you to enable or disable a scheduling phase, and configure the scheduling phases for the purpose of achieving high scheduling efficiency, ensuring fair resource allocation, or delivering the maximum number of jobs.
Job execution: Supports startup of serial, parallel, MPI, array, blocking, and interactive jobs. Allows you to specify the report period, collect task resource consumption information based on the cgroup, collect information such as the job running duration, CPU usage duration, average memory, and peak memory, and run daemon processes for jobs. Allows you to set the memory limit based on the cgroup and allows jobs to inherit the system limit. Allows you to configure the prehook/posthook that take effect globally, the number of retries, the execution timeout interval, and the blocklist of nodes where prehook failures occur. Supports the prehook/posthook at the job level in MPI jobs and the prehook/posthook at the task level in non-MPI jobs. Outputs basic job information and running data to the specified output file or the prehook phase, and reports the job result with standard error information.
Cluster management	Cluster resource information management: Allows you to query the brief information and details of a queue, and customize the queue. Allows you to query the brief information and details of a node, including the name, total CPUs, available CPUs, CPU topology, total memory, available memory, swap, tmp, average CPU load per minute, GPUs, node storage information, number of jobs on the node, labels, and SDRs. Allows you to enable or disable a node, set the reason for enabling or disabling a node, define the memory usage threshold of a node, customize resources based on node configurations, add, delete, modify, and query node labels, remove nodes, and manually or automatically suspend or resume idle nodes. Allows you to query job statistics, customize user configurations, make user synchronization configurations take effect in real time, dynamically assign job, node, and queue management permissions to users, view multi-dimensional statistics of the entire cluster, view brief and detailed account information, and customize account configurations. Supports online configuration of accounts, resource pools, and resource allocation policies of resource pools.
Job data lifecycle management: Allows you to configure the maximum storage duration of real-time data, historical data, and archived data. Allows you to define storage locations and data types of data files.
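
The candidate node selection described under Job scheduling (random, registration sequence, maximum available resources, or specified label) can be illustrated with a small sketch. The `Node` fields, the policy names, and the `select_candidate` function are hypothetical and do not reflect Donau Scheduler configuration keys. Using free CPUs as the measure of "available resources" is likewise an assumption.

```python
import random
from dataclasses import dataclass

POLICIES = ("random", "registration", "max_available", "label")


@dataclass(frozen=True)
class Node:
    name: str
    registration_order: int      # position in which the node registered
    free_cpus: int               # stand-in for "available resources"
    labels: frozenset = frozenset()


def select_candidate(nodes, policy, label=None, rng=None):
    """Pick one node according to the named (hypothetical) policy."""
    if policy not in POLICIES:
        raise ValueError(f"unknown policy: {policy}")
    nodes = list(nodes)
    if policy == "label":
        # Keep only nodes carrying the requested label.
        nodes = [n for n in nodes if label in n.labels]
    if not nodes:
        return None
    if policy == "random":
        return (rng or random).choice(nodes)
    if policy == "max_available":
        return max(nodes, key=lambda n: n.free_cpus)
    # "registration" and "label" both fall back to registration order.
    return min(nodes, key=lambda n: n.registration_order)


if __name__ == "__main__":
    cluster = [
        Node("n1", 0, 4, frozenset({"arm"})),
        Node("n2", 1, 16, frozenset({"x86"})),
        Node("n3", 2, 8, frozenset({"x86"})),
    ]
    for policy in ("registration", "max_available"):
        print(policy, select_candidate(cluster, policy).name)
```

Breaking ties among labeled nodes by registration order is a choice made for this sketch; the source does not say how ties are resolved.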

Hyper MPI Feature List

Feature	Description
MPI_AllReduce	Supports the following algorithms: Algorithm 1: Recursive doubling; Algorithm 2: Node-aware Recursive + Binomial (intra); Algorithm 3: Socket-aware Recursive + Binomial (intra); Algorithm 4: Ring, accelerating AllReduce large-packet operations; Algorithm 5: Node-aware Recursive + K-nomial (intra); Algorithm 6: Socket-aware Recursive + K-nomial (intra); Algorithm 7: Node-aware K-nomial; Algorithm 8: Socket-aware K-nomial; Algorithm 11: Node-aware Parallel; Algorithm 12: Basic Rabenseifner algorithm; Algorithm 13: Node-aware Rabenseifner; Algorithm 14: Socket-aware Rabenseifner. NOTE: Only algorithm 1 supports discontinuous data structures and noncommutative operations.
MPI_Barrier	Supports the following algorithms: Algorithm 1: Recursive doubling; Algorithm 2: Node-aware Recursive + Binomial (intra); Algorithm 3: Socket-aware Recursive + Binomial (intra); Algorithm 4: Node-aware Recursive + K-nomial (intra); Algorithm 5: Socket-aware Recursive + K-nomial (intra); Algorithm 6: Node-aware K-nomial; Algorithm 7: Socket-aware K-nomial; Algorithm 10: Node-aware Parallel. NOTE: Algorithms 3, 4, 5, 6, and 7 do not support PPN imbalance. Algorithms 6 and 10 do not support discontinuous ranks.
MPI_Bcast	Supports the following algorithms: Algorithm 2: Topo-aware Binomial tree; Algorithm 3: Topo-aware K-nomial tree; Algorithm 4: Topo-aware K-nomial tree + Binomial tree (intra). NOTE: All of these algorithms support discontinuous data structures. Algorithms 3 and 4 do not support PPN imbalance or discontinuous ranks.
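
Recursive doubling, listed as Algorithm 1 for MPI_AllReduce, can be illustrated with a single-process simulation. This is a sketch of the textbook technique, not of Hyper MPI's implementation. It assumes a power-of-two rank count and models each rank's buffer as one list element. The function name is hypothetical.

```python
from operator import add


def recursive_doubling_allreduce(values, op=add):
    """Simulate recursive-doubling allreduce; values[i] is rank i's input.

    Returns the list of per-rank results, which are all equal on completion.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("this sketch assumes a power-of-two rank count")
    buf = list(values)
    distance = 1
    while distance < n:
        nxt = []
        for rank in range(n):
            # In each round a rank pairs with the rank whose number differs
            # in exactly one bit, and both combine their partial results.
            partner = rank ^ distance
            lo, hi = min(rank, partner), max(rank, partner)
            # Lower-rank operand goes first so rank order is preserved.
            nxt.append(op(buf[lo], buf[hi]))
        buf = nxt
        distance *= 2
    return buf


if __name__ == "__main__":
    print(recursive_doubling_allreduce([1, 2, 3, 4]))
    print(recursive_doubling_allreduce(["a", "b", "c", "d"]))
```

The reduction finishes in log2(n) rounds. Placing the lower-rank operand first keeps the operands in rank order, which is one way such an exchange can stay correct for a noncommutative operator such as string concatenation. Whether Hyper MPI takes the same approach is not stated in the source.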
