
Modifying Flink Task Parameters

Purpose

When you use the yahoo-streaming-benchmark tool for testing, you need to modify the Flink configuration in the test script. The Flink configuration is stored in stream-bench-hdp.sh. Adjust the parameters based on your site requirements.

Procedure

Search for the following parameters in the stream-bench-hdp.sh file and modify them:

Parameter

Recommended Value

Description

Partitions

Two to three times the number of disks on the Kafka service node

This parameter sets the number of Kafka partitions. If you change it, you must also change the two parallelism values (the -p option) used to start the Flink jobs. The first Flink job randomly generates data and writes it to Kafka, so its parallelism must be less than or equal to the number of Kafka partitions; otherwise, data cannot be written. The second Flink job consumes the data from Kafka. Although its parallelism is not limited by the number of partitions, you are advised to keep the parallelism of the two Flink jobs the same for testing. When Flink submits jobs to YARN, the TaskManagers, slots, and memory have already been allocated, so the resources available to Flink are limited. The value after -p specifies the parallelism, that is, the number of slots a job occupies. Because the two jobs must run at the same time, the value of Partitions cannot exceed half of the total number of slots, that is, Partitions <= -n (number of TaskManagers) x -s (slots per TaskManager) / 2. Parallelism has a great impact on Flink performance, and there is no simple rule for choosing it. Generally, as the number of partitions increases, latency first decreases, then remains steady, and finally increases.
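The sizing rules for Partitions can be checked with a little arithmetic. The following is a minimal sketch, not part of stream-bench-hdp.sh: the function names are hypothetical, and it assumes you know the number of disks on the Kafka service node and the -n (TaskManagers) and -s (slots per TaskManager) values used when submitting to YARN. It combines the recommended range (two to three times the disk count) with the upper bound Partitions <= n x s / 2.

```python
def max_partitions(task_managers: int, slots_per_tm: int) -> int:
    """Upper bound on Partitions: the two Flink jobs run at the same time
    and share the allocated slots, so each may use at most half of them.
    Floor division, because Partitions must be a whole number."""
    return (task_managers * slots_per_tm) // 2


def recommended_partitions(kafka_disks: int) -> range:
    """Recommended Partitions: two to three times the number of disks
    on the Kafka service node (inclusive range)."""
    return range(2 * kafka_disks, 3 * kafka_disks + 1)


def check_partitions(partitions: int, task_managers: int, slots_per_tm: int) -> None:
    """Raise ValueError if Partitions violates Partitions <= n x s / 2."""
    limit = max_partitions(task_managers, slots_per_tm)
    if partitions > limit:
        raise ValueError(
            f"Partitions={partitions} exceeds n x s / 2 = {limit}; "
            "reduce Partitions or allocate more TaskManagers/slots"
        )


if __name__ == "__main__":
    # Hypothetical cluster: 6 disks on the Kafka node, 4 TaskManagers x 8 slots.
    candidates = recommended_partitions(6)  # 12..18
    limit = max_partitions(4, 8)            # 16
    usable = [p for p in candidates if p <= limit]
    print(f"recommended: {candidates.start}-{candidates.stop - 1}, "
          f"limit: {limit}, usable: {usable}")
```

In this hypothetical cluster, only 12 to 16 partitions satisfy both rules. Whichever value you pick would also be used as the -p value of both Flink jobs, as described above.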

Load

Adjust the value based on the actual scenario.

This parameter sets the data load, which alone determines the throughput per unit of time. With TEST_TIME fixed, setting Load effectively sets the throughput of the test case, provided that Flink has enough processing capacity to keep up. You can therefore evaluate Flink performance in either of two ways:

  • Fix the Load and measure the latency.
  • Fix the latency and find the maximum Load that meets it by adjusting the Load. This is currently the more common method.
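The second method (fix the latency, find the maximum Load) can be automated as a search over Load values. The sketch below uses binary search, which is our choice rather than anything the benchmark tool provides. It assumes a hypothetical `run_benchmark(load)` callback that runs one test of TEST_TIME seconds at the given Load and returns the measured latency in milliseconds, and it assumes latency does not decrease as Load rises.

```python
from typing import Callable, Optional


def max_load_within_latency(
    run_benchmark: Callable[[int], float],
    latency_target_ms: float,
    low: int,
    high: int,
) -> Optional[int]:
    """Return the largest Load in [low, high] whose measured latency is
    <= latency_target_ms, or None if even `low` misses the target.

    Assumes latency is non-decreasing in Load, so binary search applies.
    Each probe is one full benchmark run (TEST_TIME seconds).
    """
    best: Optional[int] = None
    while low <= high:
        mid = (low + high) // 2
        if run_benchmark(mid) <= latency_target_ms:
            best = mid      # target met: remember it and try a higher Load
            low = mid + 1
        else:
            high = mid - 1  # target missed: try a lower Load
    return best


if __name__ == "__main__":
    # Stand-in for a real benchmark run: latency grows linearly with Load.
    def fake_benchmark(load: int) -> float:
        return 50 + load / 1000

    print(max_load_within_latency(fake_benchmark, 200, 1_000, 500_000))
```

Because every probe costs a full test run, in practice you would probably search over coarse Load steps (for example, multiples of 10,000) rather than every integer.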

TOPIC

ad-event{$partitions}

This parameter sets the Kafka topic name. Use a different topic for each number of partitions. Generally, the topic name takes the form ad-event{$partitions} so that topics created for different partition counts do not collide.
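A minimal sketch of the naming convention, assuming that {$partitions} in ad-event{$partitions} is replaced by the partition count with no separator (for example, ad-event12). The helper `topic_name` is hypothetical and not part of stream-bench-hdp.sh.

```python
def topic_name(partitions: int, prefix: str = "ad-event") -> str:
    """Build a per-partition-count Kafka topic name, e.g. 'ad-event12'.

    Assumption: the partition count is appended directly to the prefix.
    """
    if partitions <= 0:
        raise ValueError("partitions must be a positive integer")
    return f"{prefix}{partitions}"


if __name__ == "__main__":
    # Each partition count gets its own topic, so runs do not reuse a topic
    # that was created with a different number of partitions.
    for p in (12, 16, 24):
        print(p, topic_name(p))
```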

TEST_TIME

240

This parameter sets the test duration, in seconds. Generally, the value is 240.