Adjusting the Dirty Data Flushing Policy to Reduce Drive I/O Pressure
Principles
Dirty data is data in the page cache that has been modified but not yet written back to the drive. When an application writes data, it can either write the data directly to the drive (O_DIRECT mode) or write it to the page cache (non-O_DIRECT mode). In non-O_DIRECT mode, the write completes in memory and the kernel writes the dirty pages back to the drive later, which reduces the number of drive operations.
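To see dirty data accumulate in the page cache, you can watch the Dirty and Writeback counters in /proc/meminfo. The following is a minimal bash sketch. The function names show_dirty and demo_buffered_write are hypothetical, and the demo assumes GNU dd and a directory on a drive-backed file system.

```bash
#!/bin/bash
# show_dirty [MEMINFO]: print the Dirty and Writeback lines (values in kB)
# from /proc/meminfo, or from the file given as the first argument.
show_dirty() {
    local meminfo=${1:-/proc/meminfo}
    grep -E '^(Dirty|Writeback):' "$meminfo"
}

# demo_buffered_write [DIR]: hypothetical demo. A buffered (non-O_DIRECT)
# write lands in the page cache first, so Dirty normally grows after dd
# returns and shrinks again once sync forces the data out to the drive.
demo_buffered_write() {
    local dir=${1:-.}
    local tmpfile
    tmpfile=$(mktemp "$dir/dirty-demo.XXXXXX") || return 1
    echo "before write:"; show_dirty
    dd if=/dev/zero of="$tmpfile" bs=1M count=64 status=none
    echo "after write:";  show_dirty
    sync                                # write back all dirty data now
    echo "after sync:";   show_dirty
    rm -f "$tmpfile"
}
```

For example, demo_buffered_write /data runs the demo in /data. The exact numbers depend on other write activity on the system and on when the flusher threads happen to run.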
Modification Method
The system provides the following parameters to adjust the policy:
- /proc/sys/vm/dirty_expire_centisecs: This parameter specifies how long dirty data can stay in the page cache before it must be written back to the drive. The value is in hundredths of a second, and the default is 3000 (30s = 3000 x 0.01s). If service data is written continuously, set a smaller value so that dirty data is written back in smaller, more frequent batches instead of large bursts that cause long I/O waits. You can run the echo command to change the value, for example to 20s (2000):
# echo 2000 > /proc/sys/vm/dirty_expire_centisecs
- /proc/sys/vm/dirty_background_ratio: This parameter specifies the percentage of memory (calculated as MemFree + Cached - Mapped) that dirty pages can occupy before the kernel starts writing them back to the drive in the background, using its flusher threads (the pdflush process in older kernels). Increasing the value allows more memory to be used as a write buffer, which can improve drive write performance. For write-intensive services, however, set a smaller value so that dirty data does not pile up and create a performance bottleneck. To identify such a bottleneck, observe how widely the await value (for example, as reported by iostat -x) fluctuates, and take the service characteristics into account. The default value is 10. You can run the echo command to change the value:
# echo 8 > /proc/sys/vm/dirty_background_ratio
- /proc/sys/vm/dirty_ratio: This parameter specifies the maximum percentage of total memory that dirty pages can occupy. When this limit is exceeded, no more dirty pages are allowed to accumulate: processes that write files are blocked until enough dirty data has been written back to the drive, so file writes effectively become synchronous. The application then blocks longer on each write, which slows down the service. The default value of this parameter is 40; because the default varies with the kernel version, check the current value on your system. For write-intensive services, you can increase this parameter so that the drive does not enter the synchronous write state too early.
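The three parameters above can be applied together with a small helper. The following bash sketch uses hypothetical function names (set_vm_param, apply_dirty_policy) and takes the target directory as an optional argument, so it can be tried against a scratch directory before touching /proc/sys/vm. It assumes that background writeback should always start before writers are blocked, so it rejects a dirty_background_ratio that is not lower than dirty_ratio. This check is a design choice of the sketch, not a kernel rule.

```bash
#!/bin/bash
# set_vm_param NAME VALUE [DIR]: write VALUE to DIR/NAME (DIR defaults to
# /proc/sys/vm), then read it back to confirm it was accepted. Writing to
# /proc/sys/vm needs root, and the change does not survive a reboot.
set_vm_param() {
    local name=$1 value=$2 dir=${3:-/proc/sys/vm}
    echo "$value" > "$dir/$name" || return 1
    [ "$(cat "$dir/$name")" = "$value" ]
}

# apply_dirty_policy EXPIRE_SECONDS BACKGROUND_RATIO RATIO [DIR]
# Hypothetical helper: converts seconds to centiseconds for
# dirty_expire_centisecs and rejects a background ratio that is not lower
# than dirty_ratio, so background writeback starts before writers block.
apply_dirty_policy() {
    local expire_s=$1 bg=$2 ratio=$3 dir=${4:-/proc/sys/vm}
    if [ "$bg" -ge "$ratio" ]; then
        echo "dirty_background_ratio ($bg) must be lower than dirty_ratio ($ratio)" >&2
        return 1
    fi
    set_vm_param dirty_expire_centisecs "$((expire_s * 100))" "$dir" &&
        set_vm_param dirty_background_ratio "$bg" "$dir" &&
        set_vm_param dirty_ratio "$ratio" "$dir"
}
```

Running apply_dirty_policy 20 8 40 as root applies the example values from this section. The same settings can be made with sysctl -w vm.dirty_expire_centisecs=2000 vm.dirty_background_ratio=8 vm.dirty_ratio=40. To keep them across reboots, put lines such as vm.dirty_ratio = 40 in a file under /etc/sysctl.d/ and load them with sysctl --system.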
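Increasing the amount of dirty data that can be cached, or how long it can stay in the cache, increases the risk of data loss if the power fails unexpectedly. For data that must be stored on the drive immediately, use O_DIRECT mode to prevent loss of key data. Note that O_DIRECT bypasses the page cache but does not by itself guarantee that the data is durable; when the data must survive a power failure, combine it with O_SYNC or O_DSYNC, or call fsync().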
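O_DIRECT is normally chosen in application code through the open() flag. For scripts and quick checks, GNU dd exposes similar choices. The bash sketch below shows two hypothetical helpers: write_durably keeps buffered writes but forces them to stable storage before returning (an fsync-based alternative, not O_DIRECT), and write_direct uses O_DIRECT together with synchronous data writes. Both assume GNU coreutils dd.

```bash
#!/bin/bash
# write_durably SRC DST: buffered copy through the page cache; conv=fsync
# makes GNU dd call fsync() on DST before exiting, so data and metadata are
# on stable storage when the function returns.
write_durably() {
    dd if="$1" of="$2" bs=4096 conv=fsync status=none
}

# write_direct SRC DST: O_DIRECT variant. oflag=direct bypasses the page
# cache, and oflag=dsync waits for each block's data to reach stable storage.
# The file system must support O_DIRECT (not every file system does), and
# I/O should be aligned to the device's logical block size.
write_direct() {
    dd if="$1" of="$2" bs=4096 oflag=direct,dsync status=none
}
```

The fsync-based helper works on any file system and still benefits from the page cache for batching; the O_DIRECT helper avoids polluting the page cache but depends on file system support and alignment.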