Using on ClickHouse
OmniMV has five application scenarios on ClickHouse. You can parse SQL statements to obtain recommended views. Read the usage example that matches your service requirements.
OmniMV can be used as follows on the ClickHouse engine in five scenarios:
Scenario 1: Obtain the SQL statement from the ClickHouse execution log and directly parse the SQL statement to obtain the projection. Parsing recommendations does not occupy resources on the ClickHouse server.
Configuration file in the distributed table scenario:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 | database = ssb # Target database work_path = ./clickhouse # Path for storing working files such as OmniMV SQL files, view files, and model files ck_host = 0.0.0.0 # ClickHouse service IP address ck_port = 9000 # ClickHouse service port cluster_name = mycluster # Cluster name projection_wait_time = 3600 # Timeout duration of creating a projection is_distributed_table = True # Type of the ClickHouse table. False indicates that the table is a local table, and True indicates that the table is a distributed table. clickhouse_tables = lineorder_flat_all # Target table. Use commas (,) to separate multiple tables. The tables must be in the target database. query_log_table = system.query_log # Log table from which SQL statements are extracted sql_running_time_threshold = 100 # SQL execution time threshold, in milliseconds. SQL statements whose execution time is less than the threshold will not be processed. query_limit_each_table = 15 # Maximum number of SQL statements that can be analyzed in each target table projection_limit_each_table = 3 # Number of projections created for each table projection_with_conditional_functions = False # Indicates whether to recommend projections for aggregate functions that contain conditions. allows_materialize_projection = True # Indicates whether to materialize historical data of a new projection. |
Configuration file in the local table scenario:
1 2 3 4 5 6 7 8 9 10 11 12 13 | database = ssb # Target database work_path = ./clickhouse # Path for storing working files such as OmniMV SQL files, view files, and model files ck_host = 0.0.0.0 # ClickHouse service IP address ck_port = 9000 # ClickHouse service port projection_wait_time = 3600 # Timeout duration of creating a projection is_distributed_table = False # Type of the ClickHouse table. False indicates that the table is a local table, and True indicates that the table is a distributed table. clickhouse_tables = lineorder_flat # Target table. Use commas (,) to separate multiple tables. The tables must be in the target database. query_log_table = system.query_log # Log table from which SQL statements are extracted sql_running_time_threshold = 100 # SQL execution time threshold, in milliseconds. SQL statements whose execution time is less than the threshold will not be processed. query_limit_each_table = 15 # Maximum number of SQL statements that can be analyzed in each target table projection_limit_each_table = 3 # Number of projections created for each table projection_with_conditional_functions = False # Indicates whether to recommend projections for aggregate functions that contain conditions. allows_materialize_projection = True # Indicates whether to materialize historical data of a new projection. |
Procedure:
- Go to the OmniMV installation directory.
- Initialize the database information.
1python main.pyc clickhouse init_database
- Extract SQL statements that contain the target table query from query_log.
1python main.pyc clickhouse extract_sqls
- Parse the extracted SQL statements to obtain the recommended projection.
1python main.pyc clickhouse parse_sql_text
- Create a projection as the recommended one in the target table.
1python main.pyc clickhouse create_projections
Scenario 2: Obtain the SQL statement from the ClickHouse execution log and parse the SQL statement to obtain the projection. You need to connect to the database to query the compression ratio. The greedy algorithm is recommended. Parsing recommendations occupies a lot of resources on the ClickHouse server.
Configuration file in the distributed table scenario:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 | database = ssb # Target database work_path = ./clickhouse # Path for storing working files such as OmniMV SQL files, view files, and model files ck_host = 0.0.0.0 # ClickHouse service IP address ck_port = 9000 # ClickHouse service port cluster_name = mycluster # Cluster name projection_wait_time = 3600 # Timeout duration of creating a projection projection_compress_ratio = 0.8 # Compression ratio threshold. Projects whose data compression ratio is greater than the threshold are not recommended. is_distributed_table = True # Type of the ClickHouse table. False indicates that the table is a local table, and True indicates that the table is a distributed table. clickhouse_tables = lineorder_flat_all # Target table. Use commas (,) to separate multiple tables. The tables must be in the target database. query_log_table = system.query_log # Log table from which SQL statements are extracted sql_running_time_threshold = 100 # SQL execution time threshold, in milliseconds. SQL statements whose execution time is less than the threshold will not be processed. query_limit_each_table = 15 # Maximum number of SQL statements that can be analyzed in each target table projection_limit_each_table = 3 # Number of projections created for each table projection_with_conditional_functions = False # Indicates whether to recommend projections for aggregate functions that contain conditions. allows_materialize_projection = True # Indicates whether to materialize historical data of a new projection. candidate_strategy = compress_ratio # Value compress_ratio indicates that data is sorted based on the projection compression ratio; SQL indicates that data is sorted based on the number of SQL statements that can be matched by projections. |
Configuration file in the local table scenario:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | database = ssb # Target database work_path = ./clickhouse # Path for storing working files such as OmniMV SQL files, view files, and model files ck_host = 0.0.0.0 # ClickHouse service IP address ck_port = 9000 # ClickHouse service port projection_wait_time = 3600 # Timeout duration of creating a projection projection_compress_ratio = 0.8 # Compression ratio threshold. Projects whose data compression ratio is greater than the threshold are not recommended. is_distributed_table = False # Type of the ClickHouse table. False indicates that the table is a local table, and True indicates that the table is a distributed table. clickhouse_tables = lineorder_flat # Target table. Use commas (,) to separate multiple tables. The tables must be in the target database. query_log_table = system.query_log # Log table from which SQL statements are extracted sql_running_time_threshold = 100 # SQL execution time threshold, in milliseconds. SQL statements whose execution time is less than the threshold will not be processed. query_limit_each_table = 15 # Maximum number of SQL statements that can be analyzed in each target table projection_limit_each_table = 3 # Number of projections created for each table projection_with_conditional_functions = False # Indicates whether to recommend projections for aggregate functions that contain conditions. allows_materialize_projection = True # Indicates whether to materialize historical data of a new projection. candidate_strategy = compress_ratio # Value compress_ratio indicates that data is sorted based on the projection compression ratio; SQL indicates that data is sorted based on the number of SQL statements that can be matched by projections. |
- Go to the OmniMV installation directory.
- Initialize the database information.
1python main.pyc clickhouse init_database
- Extract SQL statements that contain the target table query from query_log.
1python main.pyc clickhouse extract_sqls
- Parse the extracted SQL statements to obtain the recommended projection.
1python main.pyc clickhouse generate_views
- Create a projection as the recommended one in the target table.
1python main.pyc clickhouse create_projections
Scenario 3: Obtain the SQL statement from the ClickHouse execution log and parse the SQL statement to obtain the projection. The recommended algorithms use the AI model whose training occupies plenty of resources on the ClickHouse server.
Configuration file in the distributed table scenario:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 | database = ssb work_path = ./clickhouse ck_host = 0.0.0.0 ck_port = 19000 cluster_name = mycluster num_round = 5 projection_wait_time = 3600 projection_compress_ratio = 0.8 sql_running_time_threshold = 100 projection_limit_each_table = 5 is_distributed_table = True clickhouse_tables = lineorder_flat_all query_log_table = system.query_log query_limit_each_table = 15 projection_with_conditional_functions = False allows_materialize_projection = True max_data_count = 20 epochs = 100 estimators = 50 max_depth = 5 learning_rate = 0.05 candidate_strategy = compress_ratio |
Configuration file in the local table scenario:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 | database = ssb work_path = ./clickhouse ck_host = 0.0.0.0 ck_port = 19000 num_round = 5 projection_wait_time = 3600 projection_compress_ratio = 0.8 sql_running_time_threshold = 100 projection_limit_each_table = 5 is_distributed_table = False clickhouse_tables = lineorder_flat query_log_table = system.query_log query_limit_each_table = 15 projection_with_conditional_functions = False allows_materialize_projection = True max_data_count = 20 epochs = 100 estimators = 50 max_depth = 5 learning_rate = 0.05 candidate_strategy = compress_ratio |
- Go to the OmniMV installation directory.
- Initialize the database information.
1python main.pyc clickhouse init_database
- Extract SQL statements that contain the target table query from query_log.
1python main.pyc clickhouse extract_sqls
- Parse the extracted SQL statements to obtain the recommended projection.
1python main.pyc clickhouse generate_views
- Train the cost estimation model.
1python main.pyc clickhouse cost_estimation
- Train the recommendation model.
1python main.pyc clickhouse recommend
- Create a projection as the recommended one in the target table.
1python main.pyc clickhouse create_projections_model
Scenario 4: Obtain the new SQL statement from the ClickHouse execution log again, parse the SQL statement to obtain the Projection, and use the trained AI model to recommend projections. Parsing recommendations does not occupy resources on the ClickHouse server.
Configuration file in the distributed table scenario:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | database = ssb work_path = ./clickhouse ck_host = 0.0.0.0 ck_port = 19000 cluster_name = mycluster projection_wait_time = 3600 sql_running_time_threshold = 100 projection_limit_each_table = 5 is_distributed_table = True clickhouse_tables = lineorder_flat_all query_log_table = system.query_log query_limit_each_table = 15 projection_with_conditional_functions = False allows_materialize_projection = True candidate_strategy = compress_ratio |
Configuration file in the local table scenario:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 | database = ssb work_path = ./clickhouse ck_host = 0.0.0.0 ck_port = 19000 projection_wait_time = 3600 sql_running_time_threshold = 100 projection_limit_each_table = 5 is_distributed_table = False clickhouse_tables = lineorder_flat query_log_table = system.query_log query_limit_each_table = 15 projection_with_conditional_functions = False allows_materialize_projection = True candidate_strategy = compress_ratio |
- Go to the OmniMV installation directory.
- Extract new SQL statements.
1python main.pyc clickhouse extract_sqls_incremental
- Parse the new SQL statements to obtain the projection view.
1python main.pyc clickhouse parse_sql_text_incremental
- Use the trained model for direct recommendation.
1python main.pyc clickhouse recommend
- Optional: Use new SQL statements to perform incremental training on the AI model. This will occupy a large number of resources on the ClickHouse server.
1python main.pyc clickhouse cost_estimation_incremental
Scenario 5: For SQL statements for multi-table query, materialized views are directly obtained after parsing. No recommendation policy is available.
Configuration file in the multi-table scenario:
1 2 3 4 5 6 7 | database = imdbload work_path = ./clickhouse ck_host = 0.0.0.0 ck_port = 19000 clickhouse_tables = all query_log_table = system.query_log query_limit_mutil_table = 15 |
- Go to the OmniMV installation directory.
- Initialize the database information.
1python main.pyc clickhouse init_database
- Obtain the multi-table query SQL statement and generate materialized views after parsing.
1python main.pyc clickhouse generate_materialize_view