Tuning Process Flow

  1. To maximize disk performance, the total number of partitions must be greater than the number of disks, so that each disk holds at least one partition. Otherwise, some disks may go unused.
  2. An important feature of Kafka is that all received messages are stored on disk rather than only in memory. However, slow disk reads and writes would hurt the real-time performance of data. Kafka therefore writes data to the memory cache first and uses asynchronous threads to flush it to disk. This greatly improves Kafka's data persistence speed and preserves real-time performance. Kafka is primarily an I/O-bound component. When compression is not enabled, CPU usage is low. When compression is enabled, CPU usage is high, especially in consumption scenarios. In return, because the compression algorithm decompresses quickly, consumption throughput can be greatly improved.
  3. Because Kafka is deployed as a cluster and keeps multiple replicas, data is transmitted over the network before it is written to each node. Kafka therefore places high demands on the network, and the required network bandwidth scales with the Kafka throughput. Although Kafka stores all data on disk, it depends less on disk speed than that suggests. Kafka performance is measured by latency and throughput. Latency is ensured by the system cache: during production, Kafka only writes data to the cache, and asynchronous threads synchronize the data from the cache to the hard disk. Kafka therefore needs a certain total disk read and write bandwidth (which can be provided by increasing the number of disks), but it does not require high read and write speeds from each individual disk.
  4. During the production test, you can start multiple kafka-producer-perf-test.sh processes on each client, all producing messages to the same topic, and let them run for 180 seconds. After 180 seconds, stop all producer processes.
  5. During the consumption test, you can start multiple kafka-consumer-perf-test.sh processes on each client to consume the messages produced during the production test. All processes use the same consumer group. Consuming the messages takes about as long as producing them, that is, roughly 180 seconds.
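
The production and consumption tests in steps 4 and 5 could be driven from each client with a small wrapper like the one below. This is only a sketch under stated assumptions: the broker address, topic name, group name, process count, record size, and record counts are hypothetical placeholders, `timeout` (GNU coreutils) is one possible way to stop the producers after 180 seconds, and the consumer's `--bootstrap-server` option applies to recent Kafka releases (older releases use `--broker-list` instead). By default the script only prints the commands it would run; set `DRY_RUN=0` to launch them.

```shell
#!/usr/bin/env bash
# Sketch of the 180-second production test followed by the consumption test.
# Every value below is an illustrative assumption, not taken from the guide.

BOOTSTRAP="${BOOTSTRAP:-broker1:9092}"   # hypothetical broker address
TOPIC="${TOPIC:-perf-test}"              # hypothetical topic name
GROUP="${GROUP:-perf-group}"             # all consumers share this group
PROCS="${PROCS:-4}"                      # perf-test processes per client
DURATION="${DURATION:-180}"              # seconds of production
MESSAGES="${MESSAGES:-50000000}"         # hypothetical per-consumer message cap
DRY_RUN="${DRY_RUN:-1}"                  # 1 = only print the commands

# Print the command in dry-run mode; otherwise start it in the background.
launch() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@" &
  fi
}

# Step 4: several producers write to the same topic for DURATION seconds.
produce() {
  local i
  for i in $(seq 1 "$PROCS"); do
    # --throughput -1 disables throttling; the large --num-records value is
    # only an upper bound, because timeout ends the process after DURATION.
    launch timeout "$DURATION" kafka-producer-perf-test.sh \
      --topic "$TOPIC" --num-records 100000000 --record-size 1024 \
      --throughput -1 --producer-props "bootstrap.servers=$BOOTSTRAP"
  done
  wait   # block until all background producers have exited
}

# Step 5: several consumers in one group read back what was produced.
consume() {
  local i
  for i in $(seq 1 "$PROCS"); do
    launch kafka-consumer-perf-test.sh \
      --bootstrap-server "$BOOTSTRAP" --topic "$TOPIC" \
      --group "$GROUP" --messages "$MESSAGES"
  done
  wait   # block until all background consumers have exited
}

produce
consume
```

Because every consumer joins the same group, the topic's partitions are divided among the processes, so it helps to keep the number of consumer processes no larger than the number of partitions.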