Tuning Guidelines
- The yahoo-streaming-benchmark test of Flink involves Kafka, Flink, and Redis. Testing Flink requires adjusting some Kafka parameters, but because Flink is the test subject, Kafka does not need to be tuned to its optimal state. You are advised to allocate most of the CPU computing power to Flink.
- If the throughput stops increasing while the network, disks, and other resources used by Flink are not yet saturated, the throughput itself has reached its bottleneck. Below that point, the throughput is proportional to the test data volume. You can run the test with the maximum data volume to estimate the throughput bottleneck in the current environment, and then adjust the data volume to obtain good performance results.
- Main parameters for tuning:
  - IP addresses and port numbers of the nodes where Kafka, Redis, and ZooKeeper are located.
  - Topic: indicates the Kafka topic name. Use a different name each time you run the command.
  - PARTITIONS: indicates the number of Kafka partitions. This value should match the Flink parallelism, so modify it whenever you tune Flink performance.
  - LOAD: indicates the data volume. The value of this parameter directly determines the throughput. Adjust it to measure the maximum throughput that can be reached within a given latency.
  - TEST_TIME: indicates the test duration. A typical value is 240 seconds.
  - -p: indicates the parallelism, which is passed as an option when the Flink job is run. The value must be smaller than or equal to that of PARTITIONS. Generally, the values of -p and PARTITIONS are the same.
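The LOAD adjustment described above can be sketched as a simple sweep: run the benchmark at increasing LOAD values and keep the largest one whose latency stays within the target. This is a minimal sketch under stated assumptions, not part of the benchmark itself. The names `max_load_within_latency`, `run_benchmark`, and `fake_run` are hypothetical, and `fake_run` is a placeholder model standing in for a real test run that reports the measured latency in milliseconds.

```python
from typing import Callable, Iterable, Optional


def max_load_within_latency(
    loads: Iterable[int],
    run_benchmark: Callable[[int], float],
    latency_limit_ms: float,
) -> Optional[int]:
    """Return the highest LOAD whose measured latency stays within the limit.

    Loads are tried in ascending order and the sweep stops at the first
    value that exceeds the limit, assuming latency does not improve as
    the load grows. Returns None if even the smallest load is too slow.
    """
    best = None
    for load in sorted(loads):
        latency_ms = run_benchmark(load)
        if latency_ms > latency_limit_ms:
            break
        best = load
    return best


def fake_run(load: int) -> float:
    # Placeholder model (assumption): latency is flat up to a load of
    # 60000, then grows linearly. Replace with a real benchmark run.
    if load <= 60000:
        return 200.0
    return 200.0 + (load - 60000) * 0.05


if __name__ == "__main__":
    candidates = [20000, 40000, 60000, 80000, 100000]
    print(max_load_within_latency(candidates, fake_run, latency_limit_ms=1000.0))
```

In a real run, each call to `run_benchmark` would take at least TEST_TIME, so a coarse list of candidate loads is usually a reasonable starting point before narrowing the range.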
After the configuration is complete, most of these parameters are written into the local_conf.yaml configuration file for Flink to read.
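Before the parameters are written out, it can help to check that they are consistent with the rules listed above. The following is a minimal sketch, assuming the parameters are collected in a small Python structure first. `BenchParams`, `unique_topic`, and `check_params` are hypothetical helper names, and the `ad-events` prefix is an illustrative default; none of them are part of the benchmark scripts.

```python
import time
from dataclasses import dataclass
from typing import List


@dataclass
class BenchParams:
    topic: str
    partitions: int   # PARTITIONS
    load: int         # LOAD
    test_time_s: int  # TEST_TIME, in seconds
    parallelism: int  # value passed to Flink with -p


def unique_topic(prefix: str = "ad-events") -> str:
    """Append a timestamp so that each run uses a different topic name.

    Two calls within the same second return the same name.
    """
    return f"{prefix}-{int(time.time())}"


def check_params(p: BenchParams) -> List[str]:
    """Return a list of problems; an empty list means the set is consistent."""
    problems = []
    if p.partitions < 1:
        problems.append("PARTITIONS must be at least 1")
    if p.parallelism < 1:
        problems.append("-p must be at least 1")
    if p.parallelism > p.partitions:
        problems.append("-p must be smaller than or equal to PARTITIONS")
    if p.load <= 0:
        problems.append("LOAD must be positive")
    if p.test_time_s <= 0:
        problems.append("TEST_TIME must be positive")
    return problems


if __name__ == "__main__":
    params = BenchParams(unique_topic(), partitions=8, load=50000,
                         test_time_s=240, parallelism=8)
    print(check_params(params))
```

Returning a list of messages instead of raising on the first error lets you see every inconsistent setting in one pass.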