C/C++ Memory Usage Exceeds the Expectation

Fault Locating

A memory leak occurs when heap memory that a program has dynamically allocated is not released or cannot be released. As a result, system memory is wasted, the program slows down, or the system may even crash. A typical symptom is that the memory usage of user mode and kernel mode keeps increasing. After confirming that the memory shortage is not caused by virtual memory, check whether a memory leak has occurred. Figure 1 shows how to locate the fault.

Figure 1 Locating a C/C++ memory usage fault

Run the following command to check the remaining memory of the system. If the total remaining memory keeps decreasing and does not recover, check whether a memory leak has occurred.

cat /proc/meminfo

MemTotal:       131084400 kB
MemFree:        18039732 kB
MemAvailable:   128007632 kB
Buffers:            3132 kB
Cached:         107111252 kB
SwapCached:            0 kB
Active:          3757036 kB
Inactive:       103812244 kB
Active(anon):     455888 kB
Inactive(anon):    21032 kB
Active(file):    3301148 kB
Inactive(file): 103791212 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:                 0 kB
Writeback:             0 kB
AnonPages:        404036 kB
Mapped:           203740 kB
Shmem:             22024 kB
Slab:            4728332 kB
SReclaimable:    3858356 kB
SUnreclaim:       869976 kB
KernelStack:       17920 kB
PageTables:         9076 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    65542200 kB
Committed_AS:    3760688 kB
VmallocTotal:   135290290112 kB
VmallocUsed:           0 kB
VmallocChunk:          0 kB
HardwareCorrupted:     0 kB
AnonHugePages:    149504 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB

Parameter description:

MemFree: remaining memory
Slab: slab memory usage
SReclaimable: reclaimable slab memory usage
SUnreclaim: unreclaimable slab memory usage
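To see whether the remaining memory "keeps decreasing and cannot be restored", the counters need to be sampled repeatedly rather than read once. The following is a minimal sketch of that idea, not a prescribed tool: the function names, the 10-second interval, and the sample count are illustrative assumptions.

```python
import time


def parse_meminfo(text):
    """Map /proc/meminfo field names to integer values (kB for most fields)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep or not rest.split():
            continue  # skip lines that are not "Name: value" pairs
        fields[name.strip()] = int(rest.split()[0])
    return fields


def sample(path="/proc/meminfo"):
    """Read and parse the current memory counters."""
    with open(path) as f:
        return parse_meminfo(f.read())


if __name__ == "__main__":
    # Interval and number of samples are arbitrary illustrative values.
    previous = sample()
    for _ in range(5):
        time.sleep(10)
        current = sample()
        for key in ("MemFree", "Slab", "SReclaimable", "SUnreclaim"):
            delta = current[key] - previous[key]
            print(f"{key}: {current[key]} kB ({delta:+d} kB)")
        previous = current
```

A MemFree value that only falls, or an SUnreclaim value that only rises across many samples, is a hint to continue with the steps that follow.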

Run the top command and press M to sort the processes by memory usage. Check whether the resident (non-virtual) memory usage of any process exceeds the expected value, and identify the processes that occupy a large amount of memory.
- If yes, obtain the process IDs ($pid).
- If no, go to step 5.
The default columns in the top command output are described as follows:
- VIRT (virtual memory usage): size of the virtual memory requested by a process, including the libraries, code, and data used by the process. If the process requests 100 MB of memory but uses only 10 MB, VIRT increases by 100 MB.
- RES (resident memory usage): size of the physical memory currently used by the process, excluding swapped-out memory. It includes memory shared with other processes. If the process requests 100 MB of memory but uses only 10 MB, RES increases by only 10 MB, in contrast to VIRT. For libraries, only the memory occupied by loaded library files is counted.
- SHR (shared memory): memory that the process shares with other processes. Even if the process uses only a few functions of a shared library, SHR includes the full size of that shared library. To estimate the physical memory occupied by the process alone, calculate RES - SHR. After memory is swapped out, the shared memory value decreases.
- DATA: memory occupied by data, that is, the data space that the program requires and actually uses while running. If this column is not displayed in the top command output, press f to display it.
For details about top command parameters, see Top Commands.
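When an interactive top session is inconvenient (for example, in a scheduled check), a similar ranking can be approximated from /proc. The sketch below reads the VmRSS line of /proc/<pid>/status, which reports the resident set size in kB; the helper names and the limit of 10 are illustrative assumptions, and the result may differ slightly from what top displays.

```python
import os


def vmrss_kb(status_text):
    """Return VmRSS in kB from /proc/<pid>/status text, or None if absent."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return None  # kernel threads have no VmRSS line


def top_by_rss(limit=10, proc="/proc"):
    """Return up to `limit` (rss_kb, pid) pairs, largest resident size first."""
    rows = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc, entry, "status")) as f:
                text = f.read()
        except OSError:
            continue  # the process exited while we were scanning
        rss = vmrss_kb(text)
        if rss is not None:
            rows.append((rss, int(entry)))
    return sorted(rows, reverse=True)[:limit]


if __name__ == "__main__":
    for rss, pid in top_by_rss():
        print(f"{pid:>8} {rss:>12} kB")
```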
Run the cat /proc/$pid/statm command to check whether the occupied physical memory exceeds the expected value.
Parameter description (all values are in pages):
- size: size of the virtual address space
- Resident: size of the physical memory that is being used
- Shared: number of shared pages
- Trs: size of the executable (text) virtual memory of the program
- Lrs: size of the libraries mapped into the virtual memory space of the task
- Drs: size of the program data segment and user-mode stack
- dt: number of dirty pages
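The seven fields listed above match the layout of /proc/$pid/statm, whose values are counted in pages. As a rough illustration, the following sketch converts them to kB and applies the RES - SHR estimate mentioned earlier; the function names are assumptions made for this example, and the page size is queried from the system rather than assumed to be 4 KB.

```python
import os

# Field order of /proc/<pid>/statm, using the names from the list above.
STATM_FIELDS = ("size", "resident", "shared", "trs", "lrs", "drs", "dt")


def parse_statm(text, page_kb):
    """Convert the page counts in /proc/<pid>/statm to kB."""
    values = [int(v) for v in text.split()]
    return {name: pages * page_kb for name, pages in zip(STATM_FIELDS, values)}


def private_resident_kb(statm):
    """Estimate non-shared physical memory as resident minus shared."""
    return statm["resident"] - statm["shared"]


if __name__ == "__main__":
    page_kb = os.sysconf("SC_PAGE_SIZE") // 1024
    # /proc/self/statm describes this script itself; substitute a real PID.
    with open("/proc/self/statm") as f:
        statm = parse_statm(f.read(), page_kb)
    print(statm)
    print("resident - shared:", private_resident_kb(statm), "kB")
```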
Review the code logic and modify the code so that every memory allocation is paired with a matching release.
- If the problem is resolved, integrate the modification into the code.
- If the problem persists, add fault locating information to the code, and then recompile and rerun the program.
If step 2 finds no process with heavy memory usage, memory may be leaking in kernel space as a result of operations on the kernel or application processes. Run the cat /proc/slabinfo or slabtop command at regular intervals to check the slab memory usage, and identify the slab cache whose kernel resource consumption increases rapidly.
The fields in the slabtop output in the preceding figure are described as follows:
- SLABS: number of slabs. A slab is a contiguous memory area that holds data. It is the minimum unit managed by the slab allocator and consists of one or more pages. The preceding figure shows that there are 677900 slabs in total.
- OBJ/SLAB: number of objects contained in each slab. In the preceding figure, each slab contains 39 objects.
- OBJS: number of objects. An object is an element stored in a slab. In the preceding figure, there are 26438100 objects.
- OBJ SIZE: size of each object. In the preceding figure, the value is 0.10 KB.
Therefore, CACHE SIZE is approximately OBJS x OBJ SIZE = 26438100 x 0.10 KB, or about 2643810 KB. Because the displayed OBJ SIZE is rounded, this product only approximates the 2711600K given for CACHE SIZE.
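The numbers quoted above can be cross-checked with a few lines of arithmetic. The sketch below assumes that each slab of this cache occupies one 4 KB page; neither value is stated in the text for this cache, so treat both as assumptions. Under that assumption, SLABS x page size reproduces 2711600K exactly, whereas OBJS x OBJ SIZE only approximates it because the displayed object size is rounded.

```python
def total_objects(slabs, objs_per_slab):
    """OBJS = SLABS x OBJ/SLAB."""
    return slabs * objs_per_slab


def cache_size_kb(slabs, pages_per_slab, page_kb):
    """Memory held by the cache = SLABS x pages per slab x page size."""
    return slabs * pages_per_slab * page_kb


if __name__ == "__main__":
    slabs, objs_per_slab = 677900, 39   # values quoted in the text
    pages_per_slab, page_kb = 1, 4      # assumptions: one 4 KB page per slab

    objs = total_objects(slabs, objs_per_slab)
    print(objs)                                             # 26438100
    print(cache_size_kb(slabs, pages_per_slab, page_kb))    # 2711600
    # Approximation from the rounded 0.10 KB object size:
    print(round(objs * 0.10))                               # 2643810
```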
The fields in the /proc/slabinfo output in the preceding figure are described as follows:
- name: name of a slab object.
- active_objs: number of active objects, that is, the number of objects that are being used.
- num_objs: total number of objects. Because the slab allocator caches objects, the value of num_objs may be greater than that of active_objs.
- objsize: size of each object, in bytes.
- objperslab: number of objects stored in each slab.
- pagesperslab: number of memory pages occupied by a slab. For example, with an objsize of 320 and an objperslab of 12, the objects in a slab occupy 3840 bytes (320 x 12), which is less than 4096 bytes (taking a 4 KB memory page as an example). Therefore, a slab occupies only one memory page. In addition, each cache object needs to store extra management information, which adds overhead. Therefore, the actual size of an xfs_buf slab is greater than 3840 bytes.
- active_slabs: number of active slabs.
- num_slabs: total number of slabs.
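Finding the cache that "increases rapidly" amounts to comparing two /proc/slabinfo samples taken some time apart. The following sketch shows one possible way to do that, using only the name, num_objs, and objsize fields described above. The function names and the 60-second interval are illustrative assumptions, and reading /proc/slabinfo usually requires root privileges.

```python
import time


def parse_slabinfo(text):
    """Map cache name to its active_objs, num_objs, and objsize (bytes)."""
    caches = {}
    for line in text.splitlines():
        # Skip blank lines, the version banner, and the column header.
        if not line.strip() or line.startswith(("slabinfo -", "#")):
            continue
        parts = line.split()
        caches[parts[0]] = {
            "active_objs": int(parts[1]),
            "num_objs": int(parts[2]),
            "objsize": int(parts[3]),
        }
    return caches


def fastest_growing(before, after, limit=5):
    """Return (growth_in_bytes, name) pairs, largest growth first."""
    growth = []
    for name, cur in after.items():
        prev_objs = before.get(name, {}).get("num_objs", 0)
        growth.append(((cur["num_objs"] - prev_objs) * cur["objsize"], name))
    return sorted(growth, reverse=True)[:limit]


if __name__ == "__main__":
    with open("/proc/slabinfo") as f:
        first = parse_slabinfo(f.read())
    time.sleep(60)  # arbitrary illustrative interval
    with open("/proc/slabinfo") as f:
        second = parse_slabinfo(f.read())
    for delta, name in fastest_growing(first, second):
        print(f"{name:<24} {delta / 1024:+.1f} KB")
```

The growth figure counts object payload only, so it slightly understates the real memory held by each cache.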
Locate the problem, modify and recompile the code, and verify the fix.
- If the problem is resolved, integrate the modification into the code.
- If the problem persists, add fault locating information to the code, and then recompile and rerun the program.

You can also use the DevKit tool to locate memory leak problems. For details, see Creating a Task.

Case: Resource Leak Problem

Symptom

The software runs abnormally on the server. The memory usage exceeds 80% even when the server is under no load.

Fault Locating

The free command output shows that the remaining memory space is insufficient, but the process that causes the memory problem cannot be determined.
Run the top command and press M to sort processes by memory usage. No process that occupies a large amount of memory is found.
Run the cat /proc/meminfo command. It is found that the slab usage is too high and keeps increasing.
Run the cat /proc/slabinfo command. The command output shows that the dentry usage in the slab is extremely high. Dentries are in-memory objects that represent directories and files. A high dentry count may indicate that a large number of files are open.
The number of dentry objects on the server is 383322 higher than that on an idle server, which occupies 70 GB more memory.
When a file is read or written, the kernel creates a dentry for the file object and caches the file object so that the file object can be directly obtained from the memory during the next read or write.

Generally, there are two ways to handle a high dentry count:

Method 1:
Run the sudo sh -c "echo 2 > /proc/sys/vm/drop_caches" command to release the cached dentries and inodes.

The disadvantage of this method is that the command may hang for several minutes while the caches are released.

Method 2:
Adjust the kernel parameter vm.vfs_cache_pressure. For example, set vm.vfs_cache_pressure to 10000.

vm.vfs_cache_pressure controls the tendency of the kernel to reclaim the memory used for dentries and inode caches. The default value is 100. The kernel keeps the memory usage of dentries and inode caches at a relatively fair percentage based on the reclamation status of pagecache and swapcache. Decreasing the value of vfs_cache_pressure makes the kernel more likely to retain dentries and inode caches. If vfs_cache_pressure is 0, the kernel does not reclaim dentries or inode caches when the memory is insufficient, which may cause OOM. If vfs_cache_pressure is greater than 100, the kernel tends to reclaim dentries and inode caches.
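Before changing the parameter, it can help to read the current value and confirm which of the behaviors described above applies. The sketch below only reads the setting and does not modify it. The describe_pressure helper and its wording are illustrative assumptions that paraphrase the paragraph above.

```python
def describe_pressure(value):
    """Summarize the reclaim tendency for a vm.vfs_cache_pressure value."""
    if value == 0:
        return "never reclaims dentry and inode caches (risk of OOM)"
    if value < 100:
        return "tends to retain dentry and inode caches"
    if value == 100:
        return "default: fair balance with pagecache and swapcache reclaim"
    return "tends to reclaim dentry and inode caches"


if __name__ == "__main__":
    # Reading this file does not require root; writing to it does.
    with open("/proc/sys/vm/vfs_cache_pressure") as f:
        current = int(f.read())
    print(current, "->", describe_pressure(current))
```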
Run the lsof -w | wc -l command to count the open files. The output shows that a large number of Firefox processes have been started, occupying a large amount of memory.
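A plain line count does not show which program owns the open files. As a hedged illustration, the sketch below groups lsof output by its first column (COMMAND) so that a dominant program stands out; the function name and the top-10 limit are assumptions made for this example.

```python
import subprocess
from collections import Counter


def count_by_command(lsof_output):
    """Count lsof output lines per COMMAND (first column), skipping the header."""
    lines = lsof_output.splitlines()[1:]
    return Counter(line.split()[0] for line in lines if line.split())


if __name__ == "__main__":
    # -w suppresses warning messages, as in the command used above.
    result = subprocess.run(["lsof", "-w"], capture_output=True, text=True)
    for command, count in count_by_command(result.stdout).most_common(10):
        print(f"{count:>8} {command}")
```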
The preceding information indicates that the service logic in the customer's code starts a large number of Firefox processes, causing excessive memory usage. After the code is modified and recompiled, the problem no longer recurs, and it is resolved once the modified code is integrated.
