iostat
Introduction
iostat is the most frequently used tool for investigating drive I/O problems. It summarizes statistics for all online drives and provides metrics for workload characterization, utilization, and saturation. Any user can run it, and because its statistics come directly from the kernel, its overhead is negligible.
Installation Method
Generally, iostat is installed with the operating system as part of the sysstat package. If it is not installed, run the following command to install it (CentOS is used as an example):
```
# yum -y install sysstat
```
How to Use
Command format: iostat [options] [interval [count]]. For example:
```
# iostat -d -k -x 1 100
```
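Common options are as follows:

| Option | Description |
|---|---|
| -c | Displays the CPU utilization report. |
| -d | Displays the device (drive) utilization report. |
| -k | Displays statistics in kilobytes per second. |
| -m | Displays statistics in megabytes per second. |
| -p | Displays statistics for drives and all of their partitions (can be limited to specific drives, for example -p sda). |
| -t | Prints the time for each report. |
| -x | Displays extended statistics. |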
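You can append the sampling interval (in seconds) and the number of reports to the end of the command. In the preceding example, iostat prints a report every 1 second, 100 times, so sampling lasts about 100 seconds. The first report shows statistics accumulated since system boot; each subsequent report covers only the time since the previous report.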
Output format:
```
Device:      rrqm/s   wrqm/s      r/s      w/s    rkB/s    wkB/s avgrq-sz avgqu-sz    await    svctm    %util
sda            0.02     7.25     0.04     1.90     0.74    35.47    37.15     0.04    19.13     5.58     1.09
dm-0           0.00     0.00     0.04     3.05     0.28    12.18     8.07     0.65   209.01     1.11     0.34
dm-1           0.00     0.00     0.02     5.82     0.46    23.26     8.13     0.43    74.33     1.30     0.76
dm-2           0.00     0.00     0.00     0.01     0.00     0.02     8.00     0.00     5.41     3.28     0.00
```
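Column names and positions vary between sysstat versions (for example, newer releases report r_await and w_await separately, and older ones print the header as "Device:" rather than "Device"), so scripts that post-process iostat output are more robust when they locate a column by its header name instead of by position. The following is a minimal sketch under that assumption; iostat_over is a hypothetical helper name, not part of sysstat. It reads iostat -d -x output on standard input and prints each device whose value in the named column exceeds a threshold.

```shell
# Print "device value" for every device whose value in the named iostat
# column exceeds a threshold. iostat_over is a hypothetical helper.
# Usage: iostat -d -x 1 5 | iostat_over COLUMN THRESHOLD
iostat_over() {
    awk -v col="$1" -v limit="$2" '
        NF == 0 { idx = 0; next }                # a blank line ends a report
        $1 ~ /^Device:?$/ {                      # header line: find the column by name
            idx = 0
            for (i = 2; i <= NF; i++) if ($i == col) idx = i
            next
        }
        # Data line: compare the selected column numerically with the threshold.
        idx > 0 && NF >= idx && ($idx + 0) > (limit + 0) { print $1, $idx }
    '
}
```

For example, iostat -d -x 1 5 | iostat_over '%util' 80 lists the drives that were more than 80% busy in any report. Remember that the first report covers the time since boot.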
The output columns are described as follows:

| Column | Description |
|---|---|
| rrqm/s | Number of read requests merged per second that were queued to the drive. |
| wrqm/s | Number of write requests merged per second that were queued to the drive. |
| r/s | Number of read requests completed by the drive per second (after merging). |
| w/s | Number of write requests completed by the drive per second (after merging). |
| rkB/s | Kilobytes read from the drive per second. |
| wkB/s | Kilobytes written to the drive per second. |
| avgrq-sz | Average size of the requests issued to the drive, in sectors (512 bytes per sector). |
| avgqu-sz | Average queue length of the requests issued to the drive. |
| await | Average time (ms) for I/O requests issued to the drive to be served, including both the time spent waiting in the queue and the time spent being serviced. |
| svctm | Average service time (ms) for I/O requests issued to the drive. This value is derived rather than measured, and newer sysstat manual pages warn that it should not be trusted. |
| %util | Percentage of elapsed time during which I/O requests were being issued to the drive, that is, the drive utilization. |
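iostat derives these columns from the cumulative per-device counters in /proc/diskstats by taking two samples and dividing the differences by the interval. The sketch below reproduces that arithmetic for one device so the relationships between the columns are explicit. It is an illustration under stated assumptions, not sysstat's actual source: diskstats_delta is a hypothetical helper, the field positions follow the kernel's iostats documentation, and /proc/diskstats always counts sectors in 512-byte units.

```shell
# Compute iostat -x style metrics for one device from two /proc/diskstats rows
# taken INTERVAL seconds apart. diskstats_delta is a hypothetical helper.
# Fields: $3 device, $4 reads completed, $5 reads merged, $6 sectors read,
#   $7 ms reading, $8 writes completed, $9 writes merged, $10 sectors written,
#   $11 ms writing, $13 ms doing I/O, $14 weighted ms doing I/O
# Usage on a live system (sda is an assumed device name):
#   before=$(grep ' sda ' /proc/diskstats); sleep 1
#   after=$(grep ' sda ' /proc/diskstats)
#   diskstats_delta 1 "$before" "$after"
diskstats_delta() {
    printf '%s\n%s\n' "$2" "$3" | awk -v t="$1" '
        NR == 1 { split($0, a); next }                     # counters from the first sample
        NR == 2 {
            for (i = 4; i <= 14; i++) d[i] = $i - a[i]     # per-interval deltas
            ios = d[4] + d[8]                              # reads + writes completed
            printf "%s rrqm/s=%.2f wrqm/s=%.2f r/s=%.2f w/s=%.2f", $3, d[5]/t, d[9]/t, d[4]/t, d[8]/t
            printf " rkB/s=%.2f wkB/s=%.2f", d[6]/2/t, d[10]/2/t     # 512-byte sectors to KB
            printf " avgrq-sz=%.2f", (ios ? (d[6] + d[10]) / ios : 0) # sectors per request
            printf " avgqu-sz=%.2f", d[14] / (t * 1000)               # weighted ms per elapsed ms
            printf " await=%.2f", (ios ? (d[7] + d[11]) / ios : 0)    # total ms per request
            printf " svctm=%.2f", (ios ? d[13] / ios : 0)             # busy ms per request
            printf " %%util=%.2f\n", d[13] / (t * 1000) * 100         # busy ms per elapsed ms
        }'
}
```

The formulas show why avgqu-sz is approximately (r/s + w/s) × await / 1000 (Little's law). For the sda row in the example output, (0.04 + 1.90) × 19.13 / 1000 ≈ 0.04, which matches its avgqu-sz.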
Key columns are interpreted as follows:
- rrqm/s and wrqm/s indicate the number of read and write requests merged per second. A nonzero value means the OS found logically adjacent or overlapping I/O requests and merged them, reducing the number of I/O operations sent to the drive. Merging usually indicates sequential access at the logical layer, which helps improve HDD throughput (fewer seeks) or reduce SSD write amplification.
- If %util is close to 100%, the drive was busy with I/O requests for nearly the entire sampling interval. The I/O system is fully loaded, await increases, and the CPU I/O wait percentage rises (shown as wa in the output of the top command). The drive has become a bottleneck that affects the entire system. In this case, replace the drive with a higher-performance one or optimize the software to reduce its dependency on the drive. Note that for devices that serve requests in parallel, such as RAID arrays and SSDs, a %util near 100% does not necessarily mean the device has reached its performance limit.
- Interpret await together with svctm. svctm approximates the time the drive itself spends servicing a request and reflects its raw performance, whereas await also includes the time a request waits in the I/O queue, so await is roughly svctm plus the queue wait time. Generally, svctm is less than await. If svctm is close to await, requests spend almost no time waiting in the queue. If await is much greater than svctm, requests spend most of their time queued, which means the drive cannot keep up with the I/O load. In this case, replace the drive with a faster one or optimize the application to reduce its I/O.
- The average queue length (avgqu-sz) can also be used to measure the system I/O load. Observe it across several sampling intervals rather than a single report, because one sample may capture only a transient peak.
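The rules of thumb above can be combined into a quick triage pass over iostat -d -x output. This is a minimal sketch, not a diagnostic standard: iostat_triage is a hypothetical helper, it assumes a sysstat version that still prints await, svctm, and %util, and the thresholds UTIL_MAX (default 90) and RATIO (default 5) are illustrative assumptions to tune for your hardware. It labels each device as saturated, queueing, or ok and prints the estimated queue wait (await minus svctm).

```shell
# Triage each device in iostat -d -x output. iostat_triage is a hypothetical
# helper; UTIL_MAX and RATIO are illustrative thresholds, not sysstat values.
# Usage: iostat -d -x 5 6 | iostat_triage
iostat_triage() {
    awk -v util_max="${UTIL_MAX:-90}" -v ratio="${RATIO:-5}" '
        NF == 0 { hdr = 0; next }                       # a blank line ends a report
        $1 ~ /^Device:?$/ {                             # map column names to positions
            for (i = 2; i <= NF; i++) col[$i] = i
            hdr = ("await" in col) && ("svctm" in col) && ("%util" in col)
            next
        }
        hdr {
            await = $(col["await"]); svctm = $(col["svctm"]); util = $(col["%util"])
            if (util + 0 >= util_max + 0)                  verdict = "saturated"
            else if (svctm > 0 && await > ratio * svctm)   verdict = "queueing"
            else                                           verdict = "ok"
            printf "%s %s queue_wait=%.2f\n", $1, verdict, await - svctm
        }'
}
```

Run it over several reports and look for devices that stay flagged across intervals, since a single report may reflect only a transient peak.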