25.0.0
Change Description
| Scenario | Change Description |
|---|---|
| Common components and the framework | Added support for openEuler 24.03 LTS SP1. |
| Porting Advisor | |
| System Migration | |
| Affinity Analyzer | |
| Development Assistant | Extended the compiler version range in dictionary management. The BiSheng Compiler version range is updated to 2.1.0 to 4.2.0 and the GCC version range is updated to 4.8.5 to 12.3.1. |
| Compiler and Debugger | |
| System Profiler | |
| Java Profiler | |
| System Diagnosis | |

Resolved Issues
| Trouble Ticket No. | DTS2024121703496 |
|---|---|
| Symptom | [CC] [Tuning with DevKit] After a Donau tenant runs the dsub command to submit a DevKit container job, the job is suspended, which affects information collection for the container job. |
| Severity | Minor |
| Solution | Add the `--ipc` parameter when using Singularity containers to execute MPI analysis tasks. |
| Affected Domain | DevKit MPI analysis tasks executed in Singularity containers. |
| Possible Causes | In a containerized environment, the Cross Memory Attach (CMA) mechanism is used in the Unified Communication X (UCX) context of Open MPI (especially HUCX and HMPI). Because of a CMA incompatibility, internal communication within the hpctool node fails. (From the user's perspective, the application is started on one node. Technically, however, the MPI ranks run in different temporary file systems in the container, which means that they effectively run on different nodes.) |

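
As a rough illustration of the solution above, the following Python sketch assembles a `singularity exec` command line that includes `--ipc`. Only the `--ipc` option comes from the solution. The helper name `build_singularity_command`, the image name `devkit.sif`, and the application `./app` are hypothetical placeholders, and the sketch does not show how DevKit itself launches the container.

```python
import shlex


def build_singularity_command(image, app_args, use_ipc=True):
    """Return the argument list for `singularity exec`, optionally with --ipc."""
    cmd = ["singularity", "exec"]
    if use_ipc:
        # --ipc is the option named in the solution above.
        cmd.append("--ipc")
    cmd.append(image)
    cmd.extend(app_args)
    return cmd


# Hypothetical image and application; substitute your own values.
example = build_singularity_command("devkit.sif", ["./app"])
print(shlex.join(example))
```

Building the command as a list rather than a single string avoids quoting problems if the image path or application arguments contain spaces.
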
| Trouble Ticket No. | DTS2024122618065 |
|---|---|
| Symptom | [Kunpeng] [DevKit] When a user submits a large-rank debugging process using the DonauKit in VS Code, the UI stops responding and no result is returned. |
| Severity | Minor |
| Solution | Run `dsub -N 550 -nl [node_name] mpirun -x PATH -x LD_LIBRARY_PATH -mca btl ^vader,tcp,uct,openib -mca pml ucx` to perform the debugging. |
| Affected Domain | Large-rank debugging processes submitted using the DonauKit in VS Code. |
| Possible Causes | If the number of ranks is too large, too many sockets are connected simultaneously and the connection requests are rejected. Idle sockets that have finished sending and receiving data need to be released to reduce the number of concurrent socket connections. |

| Trouble Ticket No. | DTS2025010211916 |
|---|---|
| Symptom | [System Diagnosis] [Memory usage] [Backend] When BPF Compiler Collection (BCC) is used to collect 120 seconds of memory usage statistics for an HPC application, the collection task runs for three and a half hours. The application path is `/opt/test/usr/ompi/bin/mpirun`, and the application parameter is `--allow-run-as-root -np 2 -H xx.xx.xx.xx:28 -H xx.xx.xx.xx:28 -wdir /opt/test/lammps/RUN/airebo/ /opt/test/lammps/src/lmp_mpi -in in.tension -v model_name data`. |
| Severity | Minor |
| Solution | Use more than one process to collect the data in multiple groups. |
| Affected Domain | Long-running memory diagnosis in the BCC environment. |
| Possible Causes | The BCC function is implemented by calling a C library from Python. If the data volume is large, these calls take a long time. |

| Trouble Ticket No. | DTS2024123106544 |
|---|---|
| Symptom | [Source code porting] Due to a defect in the preprocessing module, no message is displayed by default when the source code porting function is used in a first-layer macro definition, whereas a message is displayed when it is used in a second-layer macro definition. |
| Severity | Minor |
| Solution | Add a branch check. Use the x86 macro in the source code porting scenario. |
| Affected Domain | Source code porting and byte alignment. |
| Possible Causes | During preprocessing, the macro definitions for the current compiler architecture are obtained from the system, and the definitions on x86 differ from those on Kunpeng. On x86, the `__x86_64__` macro is defined, so the code in the `__x86_64__` macro branch is processed and its macros are added to the system macro list. On the AArch64 platform, the `__x86_64__` macro definition cannot be obtained, so the macros in the `__x86_64__` macro branch are not added to the AArch64 platform macro list. Therefore, no prompt is displayed when these macros are used. |

| Trouble Ticket No. | DTS2024112909019 |
|---|---|
| Symptom | [Database] [Static check] [MySQL] During a static check with the DevKit, checking a large file (more than 60 MB) takes a long time, and no result is generated even after 10 days. |
| Severity | Minor |
| Solution | Add an exit mechanism and a maximum timeout interval. |
| Affected Domain | Static check of large files in the DevKit. |
| Possible Causes | It takes a long time to parse the dependency logic in the Static Value-Flow Analysis Framework (SVF). |

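
One common way to combine an exit mechanism with a maximum timeout is to run the long analysis in a child process and stop it once a deadline passes. The sketch below is a generic Python illustration of that pattern, not the DevKit implementation. The helper name `run_with_timeout` and the limit values are hypothetical.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_seconds):
    """Run cmd and return (finished, returncode).

    If the child exceeds timeout_seconds, it is stopped and
    (False, None) is returned instead of waiting indefinitely.
    """
    try:
        completed = subprocess.run(cmd, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child and waits for it before re-raising.
        return False, None
    return True, completed.returncode


# Stand-in for a long analysis: a child that sleeps past the limit.
finished, code = run_with_timeout(
    [sys.executable, "-c", "import time; time.sleep(30)"], timeout_seconds=1
)
print("finished" if finished else "stopped after timeout")
```

Returning a flag instead of raising lets the caller report "no result within the limit" as a normal outcome rather than as a crash.
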
| Trouble Ticket No. | DTS2024112908199 |
|---|---|
| Symptom | [Database] [Static check] [MySQL] During a static check with the DevKit, the swap partition does not take effect, and an error message is displayed indicating that the memory space is insufficient. |
| Severity | Minor |
| Solution | Add parsing of the swapfree field. |
| Affected Domain | Static check of swap partitions in the DevKit. |
| Possible Causes | Before the static check, the DevKit does not check swap partitions. |

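
On Linux, free swap space is reported in the `SwapFree` field of `/proc/meminfo`. Assuming that is the field the solution refers to, the following Python sketch shows what parsing it before a memory-heavy check could look like. The function names and the policy of adding free swap to `MemAvailable` are assumptions for illustration, not DevKit's actual logic, and the sample values are made up.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: numeric value}.

    Most fields, including MemAvailable and SwapFree, are reported in kB.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def usable_memory_kb(fields):
    """Assumed policy: count available RAM plus free swap."""
    return fields.get("MemAvailable", 0) + fields.get("SwapFree", 0)


# Made-up sample; on a real system, read the text from /proc/meminfo.
sample = """MemTotal:       16384000 kB
MemAvailable:    2048000 kB
SwapTotal:       8388604 kB
SwapFree:        8000000 kB
"""
print(usable_memory_kb(parse_meminfo(sample)))
```
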
Known Issues
| Trouble Ticket No. | DTS2025030427370 |
|---|---|
| Symptom | When an HPC application that uses 2048 ranks is being debugged and a debugging command is run after the MPI_INIT function is executed, the error message "All ranks has exited" is displayed even though the application has not finished running. |
| Severity | Minor |
| Workaround | When starting a debugging task, specify the `-e` parameter to add the environment variable setting `export PMIX_MCA_gds=^ds21`. |
| Affected Domain | The added environment variable is valid only for the current debugging task. It is passed through the debugger parameter and no longer applies after the task is executed, so it has no impact on any other task. Using the tool does not affect the compilation or running of MPI programs, or the running of existing tasks in the cluster. |
| Progress | Coordinate with other departments, including the HPC Development Dept, to secure test resources and complete the full test by April 11, 2025. |
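
The workaround depends on the variable being scoped to a single task. The Python sketch below illustrates that scoping idea in general terms: a variable passed only in a child process's environment is visible to that child and is never set in the parent. It does not invoke the DevKit debugger or its `-e` parameter, and the helper name `run_with_extra_env` is hypothetical.

```python
import os
import subprocess
import sys


def run_with_extra_env(cmd, extra_env):
    """Run cmd with extra variables visible only to that child process."""
    env = dict(os.environ)  # copy, so the parent environment is untouched
    env.update(extra_env)
    return subprocess.run(cmd, env=env, capture_output=True, text=True)


# The child prints the value it sees for PMIX_MCA_gds.
result = run_with_extra_env(
    [sys.executable, "-c", "import os; print(os.environ['PMIX_MCA_gds'])"],
    {"PMIX_MCA_gds": "^ds21"},
)
print(result.stdout.strip())
```
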