鲲鹏社区首页
中文
注册
大数据OmniMV ClickHouse 性能测试指南

大数据OmniMV ClickHouse 性能测试指南

大数据ClickHouseOmniMV

发表于 2025/09/22

0

作者 | 朱勇

1 环境要求

软件环境要求:

软件

版本

获取方法

毕昇JDK

jdk8u342

获取链接

ZooKeeper

3.4.6

获取链接

ClickHouse

22.3.6.5

  • 服务端安装包

  • 客户端安装包

  • 公共安装包

节点规划(3个节点, 3个分片, 无副本):

主机名

IP地址

分片

agent1

192.168.xx.1

shard1

agent2

192.168.xx.2

shard2

agent3

192.168.xx.3

shard3

本文档基于agent1~3的IP地址编写,使用时注意IP地址的更换。

2 配置ClickHouse部署环境

1、依次登录节点。使用以下命令时需要将节点主机名分别修改为agent1、agent2、agent3。

hostnamectl set-hostname 主机名 --static

2、添加集群所有节点的映射关系。
1) 打开“/etc/hosts”文件。

vim /etc/hosts

2) 在hosts文件中添加以下内容。

IPaddress1 agent1

IPaddress2 agent2

IPaddress3 agent3

3) 按“Esc”键,输入:wq!,按“Enter”保存并退出编辑。
3、登录所有节点关闭防火墙。

systemctl stop firewalld.service

systemctl disable firewalld.service

4、登录所有节点配置SSH免密登录。
1) 生成密钥,遇到提示按“Enter”键。

ssh-keygen -t rsa

2) 在每台机器配置SSH免密登录(包括配置自身的免密)。

ssh-copy-id -i ~/.ssh/id_rsa.pub root@节点IP

5、安装JDK。
1) 获取毕昇JDK。

wget https://mirror.iscas.ac.cn/kunpeng/archive/compiler/bisheng_jdk/bisheng-jre-8u342-linux-aarch64.tar.gz

tar -zxf bisheng-jre-8u342-linux-aarch64.tar.gz -C /usr/local

cd /usr/local

ln -s bisheng-jdk1.8.0_342 jdk8us42

2) 配置环境变量。
打开“/etc/profile”文件。

vim /etc/profile

添加以下内容。

export JAVA_HOME=/usr/local/jdk8u342 export PATH=$JAVA_HOME/bin:$PATH

使环境变量生效。

source /etc/profile

3) 验证。

java -version

3 部署ZooKeeper

1、下载并解压ZooKeeper。
登录agent1节点,进入“/usr/local”目录,下载zookeeper-3.8.0.tar.gz,并解压。

cd /usr/local

wget http://archive.apache.org/dist/zookeeper/zookeeper-3.8.0/apachezookeeper-3.8.0-bin.tar.gz

tar -zxvf apache-zookeeper-3.8.0-bin.tar.gz

2、建立软链接,便于后期版本更换。

ln -s apache-zookeeper-3.8.0-bin zookeeper

3、添加ZooKeeper到环境变量。
1) 登录agent1节点,打开配置文件。

vim /etc/profile

2) 添加ZooKeeper到环境变量。

export ZOOKEEPER_HOME=/usr/local/zookeeper export PATH=$ZOOKEEPER_HOME/bin:$PATH

3) 使环境变量生效。

source /etc/profile

4) 在agent2,agent3节点重复执行以上步骤,配置ZooKeeper的环境变量。

4、修改Zookeeper配置文件。
1) 登录agent1节点,进入ZooKeeper所在目录。

cd /usr/local/zookeeper/conf

2) 拷贝配置文件。

cp zoo_sample.cfg zoo.cfg

3) 修改配置文件。

vim zoo.cfg

dataDir=/usr/local/zookeeper/tmp

4) 在最后添加如下代码,其中server.1-3是部署ZooKeeper的节点。

server.1=agent1:2888:3888 server.2=agent2:2888:3888 server.3=agent3:2888:3888

5) 创建“tmp”目录作数据目录。

mkdir /usr/local/zookeeper/tmp

6) 在“tmp”目录中创建一个空文件,并向该文件写入ID。

touch /usr/local/zookeeper/tmp/myid

echo 1 > /usr/local/zookeeper/tmp/myid

5、同步配置到其他节点。
1) 登录agent1节点,将配置好的ZooKeeper拷贝到其它节点。

scp -r /usr/local/zookeeper root@agent2:/usr/local

scp -r /usr/local/zookeeper root@agent3:/usr/local

2) 登录agent2、agent3,创建软链接并修改myid内容。
agent2:

cd /usr/local

echo 2 > /usr/local/zookeeper/tmp/myid

agent3:

cd /usr/local

echo 3 > /usr/local/zookeeper/tmp/myid

6、运行验证。
1) 分别在agent2,agent3上启动ZooKeeper。

cd /usr/local/zookeeper/bin

./zkServer.sh start

2) 查看ZooKeeper状态。

./zkServer.sh status

4 部署ClickHouse集群

1、获取ClickHouse相关安装包,在agent1~3节点都执行。

mkdir -p /usr/local/clickhouse

cd /usr/local/clickhouse

wget https://packages.clickhouse.com/tgz/lts/clickhouse-server-22.3.6.5arm64.tgz --no-check-certificate
wget https://packages.clickhouse.com/tgz/lts/clickhouse-client-22.3.6.5arm64.tgz --no-check-certificate
wget https://packages.clickhouse.com/tgz/lts/clickhouse-commonstatic-22.3.6.5-arm64.tgz --no-check-certificate

2、解压后执行安装。

tar -zxvf clickhouse-common-static-22.3.6.5-arm64.tg

sudo ./clickhouse-common-static-22.3.6.5/install/doinst.sh

tar -zxvf clickhouse-server-22.3.6.5-arm64.tgz

sudo ./clickhouse-server-22.3.6.5/install/doinst.sh

tar -zxvf clickhouse-client-22.3.6.5-arm64.tgz

sudo ./clickhouse-client-22.3.6.5/install/doinst.sh

3、查看盘挂载情况。

lsblk

回显信息显示如下,agent1~3都已经把盘挂载了“/data/data1-12”等12个数据目录,后续将利用这12个数据目录存储ClickHouse的数据。

 

4、修改“/etc/clickhouse-server/config.xml”配置文件(文件是只读文件,需要使用:wq!保存)。

需要配置以下信息:

storage_configuration

<storage_configuration>

<disks>

<bak_disk1> <!-- disk name -->

<path>/data/data1/clickhouse/data/</path>

<keep_free_space_bytes>10485760</keep_free_space_bytes>

</bak_disk1>

<bak_disk2>

<path>/data/data2/clickhouse/data/</path>

<keep_free_space_bytes>10485760</keep_free_space_bytes>

</bak_disk2>

<bak_disk3> <!-- disk name -->

<path>/data/data3/clickhouse/data/</path>

<keep_free_space_bytes>10485760</keep_free_space_bytes>

</bak_disk3>

<bak_disk4>

<path>/data/data4/clickhouse/data/</path>

<keep_free_space_bytes>10485760</keep_free_space_bytes>

</bak_disk4>

<bak_disk5> <!-- disk name -->

<path>/data/data5/clickhouse/data/</path>

<keep_free_space_bytes>10485760</keep_free_space_bytes>

</bak_disk5>

<bak_disk6>

<path>/data/data6/clickhouse/data/</path>

<keep_free_space_bytes>10485760</keep_free_space_bytes>

</bak_disk6>

<bak_disk7> <!-- disk name -->

<path>/data/data7/clickhouse/data/</path>

<keep_free_space_bytes>10485760</keep_free_space_bytes>

</bak_disk7>

<bak_disk8>

<path>/data/data8/clickhouse/data/</path>

<keep_free_space_bytes>10485760</keep_free_space_bytes>

</bak_disk8>

<bak_disk9> <!-- disk name -->

<path>/data/data9/clickhouse/data/</path>

<keep_free_space_bytes>10485760</keep_free_space_bytes>

</bak_disk9>

<bak_disk10> <!-- disk name -->

<path>/data/data10/clickhouse/data/</path>

<keep_free_space_bytes>10485760</keep_free_space_bytes>

</bak_disk10>

<bak_disk11> <!-- disk name -->

<path>/data/data11/clickhouse/data/</path>

<keep_free_space_bytes>10485760</keep_free_space_bytes>

</bak_disk11>

<bak_disk12>

<path>/data/data12/clickhouse/data/</path>

<keep_free_space_bytes>10485760</keep_free_space_bytes>

</bak_disk12>

</disks>

<policies>

<default>

<volumes>

<single>

<disk>bak_disk1</disk>

<disk>bak_disk2</disk>

<disk>bak_disk3</disk>

<disk>bak_disk4</disk>

<disk>bak_disk5</disk>

<disk>bak_disk6</disk>

<disk>bak_disk7</disk>

<disk>bak_disk8</disk>

<disk>bak_disk9</disk>

<disk>bak_disk10</disk>

<disk>bak_disk11</disk>

<disk>bak_disk12</disk>

</single>

</volumes>

</default>

</policies>

</storage_configuration>

remote server

<remote_servers>

<mycluster>

<shard>

<internal_replication>true</internal_replication>

<replica>

<host>192.168.xx.1</host>

<port>19000</port>

</replica>

</shard>

<shard>

<internal_replication>true</internal_replication>

<replica>

<host>192.168.xx.2</host>

<port>19000</port>

</replica>

</shard>

<shard>

<internal_replication>true</internal_replication>

<replica>

<host>192.168.xx.3</host>

<port>19000</port>

</replica>

</shard>

</mycluster>

</remote_servers>

ZooKeeper

<zookeeper>

<node index="1">

<host>192.168.xx.1</host>

<port>2181</port>

</node>

<node index="2">

<host>192.168.xx.2</host>

<port>2181</port>

</node>

<node index="3">

<host>192.168.xx.3</host>

<port>2181</port>

</node>

</zookeeper>

macros,macros信息需要根据每个节点的情况进行适配,和节点规划对应起来。

<macros>

<cluster>mycluster</cluster>

<shard>01</shard>

<replica>01</replica>

</macros>

在agent2和agent3<macros></macros>标签与上面不同,分别填写:

<macros>

<cluster>mycluster</cluster>

<shard>02</shard>

<replica>01</replica>

</macros>

<macros>

<cluster>mycluster</cluster>

<shard>03</shard>

<replica>01</replica> </macros>

5、创建存储元数据和表数据的目录。

mkdir -p /data/data{1,2,3,4,5,6,7,8,9,10,11,12}/clickhouse/data

mkdir /var/lib/clickhouse

“/data/data{1,2,3,4,5,6,7,8,9,10,11,12}/clickhouse/data”用于存储表数据,建表时指定了storage_policy。

“/var/lib/clickhouse”用于存储元数据,建表时不指定storage_policy时存储表数据。

如果提示缺少server.crt、server.key、dhparam.pem等文件,请使用config.xml中<openssl>标签内的两条生成命令生成相应文件。

openssl req -subj "/CN=localhost" -new -newkey rsa:2048 -days 365 -nodes -x509 -keyout /etc/clickhouse-server/server.key -out /etc/clickhouse-server/server.crt openssl dhparam -out /etc/clickhouse-server/dhparam.pem 4096

如果8123端口冲突,则将<listen_host>只保留一个:<listen_host>127.0.0.1</listen_host>。

如果启动过程中发现9000端口有冲突,可以将9000端口的信息全部改成19000端口,换一个不冲突的端口(vim进入后,使用替换命令 :%s/9000/19000/g)。

6、集群验证。

1)在每个节点上启动ClickHouse Server节点。

cllickhouse start

2)进入ClickHouse Client节点。

clickhouse client -h 192.168.58.168 --port 19000 -n -m

如启动不成功或进入不了数据库,可通过以下命令启动。

/usr/bin/clickhouse-server --configfile /etc/clickhouse-server/config.xml --pid-file /var/run/clickhouse-server/clickhouse-server.pid

3)启动后关闭远程连接窗口。重新连接服务器再尝试进入数据库。

4)查看cluster信息。

select * from system.clusters where cluster='mycluster';

5)建on cluster表。

CREATE TABLE customer on cluster mycluster

(

C_CUSTKEY UInt32,

C_NAME String,

C_ADDRESS String,

C_CITY LowCardinality(String),

C_NATION LowCardinality(String),

C_REGION LowCardinality(String),

C_PHONE String,

C_MKTSEGMENT LowCardinality(String)

)

ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/ default.customer','{replica}') ORDER BY (C_CUSTKEY);

6)建分布式表。

CREATE TABLE default.customer_all on cluster mycluster as default.cusomer

ENGINE = Distributed(mycluster, default, customer, rand());

7)查询分布式表,会利用所有的计算资源。

8)查询本地表,数据只会在同一个shard同步,然后计算资源也是只用到了本机的计算资源。

5 SSB数据集测试

5.1 准备测试数据

使用开源数据集star schema benchmark测试

1、下载编译dbgen工具,用于生成数据。建议进入挂载在数据盘上的目录执行,在系统盘生成数据比较占用存储空间。

cd / data/data1

yum -y install gcc gcc-c++ make cmake git

git clone https://github.com/vadimtk/ssb-dbgen.git
cd ssb-dbgen
make

2、生成数据。

./dbgen -s 500 -T c

./dbgen -s 500 -T l

./dbgen -s 500 -T p

./dbgen -s 500 -T s

./dbgen -s 500 -T d

3、在ClickHouse中创建数据表。

1)通过ClickHouse Client进入,创建一个database,例如:

clickhouse client -h 192.168.xx.1 --port 19000 -m -n server1 :) create database ssb_data_source;

use ssb_data_source;

2)执行以下建表语句:

CREATE TABLE customer

(

C_CUSTKEY UInt32,

C_NAME String,

C_ADDRESS String,

C_CITY LowCardinality(String),

C_NATION LowCardinality(String), C_REGION LowCardinality(String),

C_PHONE String,

C_MKTSEGMENT LowCardinality(String)

)

ENGINE = MergeTree ORDER BY (C_CUSTKEY) SETTINGS storage_policy = 'default';

CREATE TABLE part

(

P_PARTKEY UInt32,

P_NAME String,

P_MFGR LowCardinality(String),

P_CATEGORY LowCardinality(String),

P_BRAND LowCardinality(String), P_COLOR LowCardinality(String),

P_TYPE LowCardinality(String),

P_SIZE UInt8,

P_CONTAINER LowCardinality(String)

)

ENGINE = MergeTree ORDER BY P_PARTKEY SETTINGS storage_policy = 'default';

CREATE TABLE supplier

(

S_SUPPKEY UInt32,

S_NAME String,

S_ADDRESS String,

S_CITY LowCardinality(String),

S_NATION LowCardinality(String),

S_REGION LowCardinality(String),

S_PHONE String

)

ENGINE = MergeTree ORDER BY S_SUPPKEY

SETTINGS storage_policy = 'default';

4、在“ssb-dbgen”目录下导入数据。

clickhouse client -h 192.168.xx.1 --port 19000 -m -n -d ssb_data_source --query "INSERT INTO customer FORMAT CSV" < customer.tbl

clickhouse client -h 192.168.xx.1 --port 19000 -m -n -d ssb_data_source --query "INSERT INTO part FORMAT CSV" < part.tbl

clickhouse client -h 192.168.xx.1 --port 19000 -m -n -d ssb_data_source --query "INSERT INTO supplier FORMAT CSV" < supplier.tbl

clickhouse client -h 192.168.xx.1 --port 19000 -m -n -d ssb_data_source --query "INSERT INTO lineorder FORMAT CSV" < lineorder.tbl

5、将star schema转成成flat schema,宽表lineorder_flat表有30亿行数据。

在数据库中执行以下命令。

CREATE TABLE lineorder_flat

ENGINE = MergeTree ORDER BY (LO_ORDERDATE, LO_ORDERKEY)

SETTINGS storage_policy = 'default'

AS SELECT

l.LO_ORDERKEY AS LO_ORDERKEY,

l.LO_LINENUMBER AS LO_LINENUMBER,

l.LO_CUSTKEY AS LO_CUSTKEY,

l.LO_PARTKEY AS LO_PARTKEY,

l.LO_SUPPKEY AS LO_SUPPKEY,

l.LO_ORDERDATE AS LO_ORDERDATE,

l.LO_ORDERPRIORITY AS LO_ORDERPRIORITY,

l.LO_SHIPPRIORITY AS LO_SHIPPRIORITY,

l.LO_QUANTITY AS LO_QUANTITY,

l.LO_EXTENDEDPRICE AS LO_EXTENDEDPRICE,

l.LO_ORDTOTALPRICE AS LO_ORDTOTALPRICE,

l.LO_DISCOUNT AS LO_DISCOUNT,

l.LO_REVENUE AS LO_REVENUE,

l.LO_SUPPLYCOST AS LO_SUPPLYCOST,

l.LO_TAX AS LO_TAX,

l.LO_COMMITDATE AS LO_COMMITDATE,

l.LO_SHIPMODE AS LO_SHIPMODE,

c.C_NAME AS C_NAME,

c.C_ADDRESS AS C_ADDRESS,

c.C_CITY AS C_CITY,

c.C_NATION AS C_NATION,

c.C_REGION AS C_REGION,

c.C_PHONE AS C_PHONE,

c.C_MKTSEGMENT AS C_MKTSEGMENT,

s.S_NAME AS S_NAME,

s.S_ADDRESS AS S_ADDRESS,

s.S_CITY AS S_CITY,

s.S_NATION AS S_NATION,

s.S_REGION AS S_REGION,

s.S_PHONE AS S_PHONE,

p.P_NAME AS P_NAME,

p.P_MFGR AS P_MFGR,

p.P_CATEGORY AS P_CATEGORY,

p.P_BRAND AS P_BRAND,

p.P_COLOR AS P_COLOR,

p.P_TYPE AS P_TYPE,

p.P_SIZE AS P_SIZE,

p.P_CONTAINER AS P_CONTAINER

FROM lineorder AS l

INNER JOIN customer AS c ON c.C_CUSTKEY = l.LO_CUSTKEY INNER JOIN supplier AS s ON s.S_SUPPKEY = l.LO_SUPPKEY

INNER JOIN part AS p ON p.P_PARTKEY = l.LO_PARTKEY;

数据库ssb_data_source的lineorder_flat产生了30亿条数据作为数据源,在后续的测试步骤中,需要数据从此表导入即可。

5.2 分布式性能提升测试

5.2.1 构建分布式库表

1、进入ClickHouse交互式界面。

clickhouse client -h 192.168.xx.1 --port 19000 -m -n

如有密码则使用:

clickhouse client -h 192.168.xx.1 -u username --password password -m -n

2、查看集群名称。

select cluster,host_address,is_local from system.clusters;

cluster列为集群名称,之后步骤需要多次使用。is_local列为1代表当前所连接节点。

3、在所有节点创建数据库“distributed_test”。mycluster为集群名称,需要替换。

create database distributed_test on cluster mycluster;

4、在所有节点创建本地表“lineorder_flat”。

CREATE TABLE distributed_test.lineorder_flat on cluster mycluster (

`LO_ORDERKEY` UInt32,

`LO_LINENUMBER` UInt8,

`LO_CUSTKEY` UInt32,

`LO_PARTKEY` UInt32,

`LO_SUPPKEY` UInt32,

`LO_ORDERDATE` Date,

`LO_ORDERPRIORITY` LowCardinality(String),

`LO_SHIPPRIORITY` UInt8,

`LO_QUANTITY` UInt8,

`LO_EXTENDEDPRICE` UInt32, `LO_ORDTOTALPRICE` UInt32,

`LO_DISCOUNT` UInt8, `LO_REVENUE` UInt32,

`LO_SUPPLYCOST` UInt32,

`LO_TAX` UInt8,

`LO_COMMITDATE` Date,

`LO_SHIPMODE` LowCardinality(String),

`C_NAME` String,

`C_ADDRESS` String,

`C_CITY` LowCardinality(String),

`C_NATION` LowCardinality(String),

`C_REGION` LowCardinality(String),

`C_PHONE` String,

`C_MKTSEGMENT` LowCardinality(String),

`S_NAME` String,

`S_ADDRESS` String,

`S_CITY` LowCardinality(String),

`S_NATION` LowCardinality(String),

`S_REGION` LowCardinality(String),

`S_PHONE` String,

`P_NAME` String,

`P_MFGR` LowCardinality(String),

`P_CATEGORY` LowCardinality(String),

`P_BRAND` LowCardinality(String),

`P_COLOR` LowCardinality(String),

`P_TYPE` LowCardinality(String),

`P_SIZE` UInt8,

`P_CONTAINER` LowCardinality(String)

)

ENGINE = ReplicatedMergeTree('/clickhouse/tables/ finebiUserOper_cluster_23shards_2replicas/{shard}/ distributed.lineorder_flat__sagahgagag', '{replica}')

ORDER BY (LO_ORDERDATE, LO_ORDERKEY)

SETTINGS index_granularity = 8192, use_minimalistic_part_header_in_zookeeper =1, storage_policy = 'default';

5、在所有节点创建分布式表。

CREATE TABLE IF NOT EXISTS distributed_test.lineorder_flat_all ON CLUSTER mycluster AS distributed_test.lineorder_flat

ENGINE = Distributed(mycluster, distributed_test, lineorder_flat, rand());

6、导入数据:从步骤2创建的数据库里向lineorder_flat_all表并导入足量数据。

insert into distributed_test.lineorder_flat_all select * from ssb_data_source.lineorder_flat limit 3000000000;

本次测试导入30亿条数据。

5.2.2 执行OmniMV运行流程

1、清空system.query_log中distributed_test数据库的查询记录。

alter table system.query_log delete where databases=['distributed_test'];

2、执行ssb的sql。
官网上提供了13条SSB数据库的查询SQL。三条没有group by子句,10条有group by子句。需要将其改写为查询分布式表lineorder_flat_all,改写后SQL如下:

SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue FROM lineorder_flat_all WHERE toYear(LO_ORDERDATE) = 1993 AND LO_DISCOUNT BETWEEN 1 AND 3 AND LO_QUANTITY < 25 limit 100;

SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue FROM lineorder_flat_all WHERE toYYYYMM(LO_ORDERDATE) = 199401 AND LO_DISCOUNT BETWEEN 4 AND 6 AND LO_QUANTITY BETWEEN 26 AND 35;

SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue FROM lineorder_flat_all WHERE toISOWeek(LO_ORDERDATE) = 6 AND toYear(LO_ORDERDATE) = 1994 AND LO_DISCOUNT BETWEEN 5 AND 7 AND LO_QUANTITY BETWEEN 26 AND 35;

SELECT sum(LO_REVENUE), toYear(LO_ORDERDATE) AS year, P_BRAND FROM lineorder_flat_all WHERE P_CATEGORY = 'MFGR#12' AND S_REGION = 'AMERICA' GROUP BY year, P_BRAND ORDER BY year, P_BRAND;

SELECT sum(LO_REVENUE), toYear(LO_ORDERDATE) AS year, P_BRAND FROM lineorder_flat_all WHERE P_BRAND >= 'MFGR#2221' AND P_BRAND <= 'MFGR#2228' AND S_REGION = 'ASIA' GROUP BY year, P_BRAND ORDER BY year, P_BRAND;

SELECT sum(LO_REVENUE), toYear(LO_ORDERDATE) AS year, P_BRAND FROM lineorder_flat_all WHERE P_BRAND = 'MFGR#2239' AND S_REGION = 'EUROPE' GROUP BY year, P_BRAND ORDER BY year, P_BRAND;

SELECT C_NATION, S_NATION, toYear(LO_ORDERDATE) AS year, sum(LO_REVENUE) AS revenue FROM lineorder_flat_all WHERE C_REGION = 'ASIA' AND S_REGION = 'ASIA' AND year >= 1992 AND year <= 1997 GROUP BY C_NATION, S_NATION,year ORDER BY year ASC,revenue DESC;

SELECT C_CITY, S_CITY, toYear(LO_ORDERDATE) AS year, sum(LO_REVENUE) AS revenue FROM lineorder_flat_all WHERE C_NATION = 'UNITED STATES' AND S_NATION = 'UNITED STATES' AND year >= 1992 AND year <= 1997 GROUP BY C_CITY, S_CITY, year ORDER BY year ASC,revenue DESC;

SELECT C_CITY, S_CITY, toYear(LO_ORDERDATE) AS year, sum(LO_REVENUE) AS revenue FROM lineorder_flat_all WHERE (C_CITY = 'UNITED KI1' OR C_CITY = 'UNITED KI5') AND (S_CITY = 'UNITED KI1' OR S_CITY = 'UNITED KI5') AND year >= 1992 AND year <= 1997 GROUP BY C_CITY, S_CITY, year ORDER BY year ASC, revenue DESC;

SELECT C_CITY, S_CITY, toYear(LO_ORDERDATE) AS year, sum(LO_REVENUE) AS revenue FROM lineorder_flat_all WHERE (C_CITY = 'UNITED KI1' OR C_CITY = 'UNITED KI5') AND (S_CITY = 'UNITED KI1' OR S_CITY = 'UNITED KI5') AND toYYYYMM(LO_ORDERDATE) = 199712 GROUP BY C_CITY, S_CITY, year ORDER BY year ASC,revenue DESC;

SELECT toYear(LO_ORDERDATE) AS year, C_NATION, sum(LO_REVENUE - LO_SUPPLYCOST) AS profit FROM lineorder_flat_all WHERE C_REGION = 'AMERICA' AND S_REGION = 'AMERICA' AND (P_MFGR = 'MFGR#1' OR P_MFGR = 'MFGR#2') GROUP BY year, C_NATION ORDER BY year ASC, C_NATION ASC;

SELECT toYear(LO_ORDERDATE) AS year, S_NATION, P_CATEGORY, sum(LO_REVENUE - LO_SUPPLYCOST) AS profit FROM lineorder_flat_all WHERE C_REGION = 'AMERICA' AND S_REGION = 'AMERICA' AND (year = 1997 OR year = 1998) AND (P_MFGR = 'MFGR#1' OR P_MFGR = 'MFGR#2') GROUP BY year, S_NATION,P_CATEGORY ORDER BY year ASC, S_NATION ASC, P_CATEGORY ASC;

SELECT toYear(LO_ORDERDATE) AS year, S_CITY, P_BRAND, sum(LO_REVENUE - LO_SUPPLYCOST) AS profit FROM lineorder_flat_all WHERE S_NATION = 'UNITED STATES' AND (year = 1997 OR year = 1998) AND P_CATEGORY = 'MFGR#14' GROUP BY year, S_CITY, P_BRAND ORDER BY year ASC, S_CITY ASC, P_BRAND ASC;

3、配置OmniMV配置文件,参考配置如下。

[raw_file]

# 数据库名称

database = distributed_test work_path = ./clickhouse

[clickhouse]

ck_host = 192.168.58.168 ck_port = 19000

# 集群名称(只查询本地表可不配置) cluster_name = mycluster num_round = 5

# 是否启用clickhouse projection功能[True or False] enable_projection = False

# 创建和删除projection的超时时间,单位(s)

projection_wait_time = 3600

# 数据压缩比: 视图产生的数据行数/sql查询需要处理的行数read_rows,数据压缩比越小越好 projection_compress_ratio = 0.8

# SQL运行的阈值,小于这个时间的sql不考虑推荐视图,单位(ms) sql_running_time_threshold = 10

# 每张表推荐出来的projection数量的上限

projection_limit_each_table = 10

# 推荐Projection时配置表工作模式,为True,则【clickhouse_tables】只能配置分布式表,为False,则

【clickhouse_tables】只能配置本地表 is_distributed_table = True

# 列举一些重点的表来进行推荐,表之间用逗号分隔。在多表场景下,需要填写all, e.g. clickhouse_tables = all clickhouse_tables = lineorder_flat_all

# 指定从哪个系统表中查询query相关信息,一般本地表会从system.query_log中查询信息,分布式表可能从别的表中获取信息

query_log_table = system.query_log

# (初次提取query)从query_log中对每张表提取query信息,提取到的query需要是指定数据库,指定数据表,运行时间超过配置文件中阈值的query数量

# (增量提取query),增量提取的query数量也是该参数指定

query_limit_each_table = 15

# 在多表场景下,配置提取的query总数量

query_limit_mutil_table = 15

# 是否为包含条件的聚合函数sql推荐projection (例如select sum(multiif..)) [True or False]

# 如果为True:则去除条件表达式后推荐,为False:这条sql不推荐projection projection_with_conditional_functions = False

# 是否需要对新建的projection物化历史的数据,默认不物化历史数据 [True or False] allows_materialize_projection = True

# 指定要删除的Projection名称,使用 "table_name.projection_name"进行指定,table_name必须是本地表表名。多个projection之间使用(英文)逗号分割。

# 如果所有节点都存在此"table_name.projection_name",且都需要删除,需要将【is_distributed_table】配置为True

# 如果不需要删除Projection,请将其置为None,以防止误操作 drop_projections = None

[train]

# 通过对出现过的聚合列进行排列组合,以此来扩大训练数据的数量,这里可以限制 大的扩充的训练sql的数量 max_data_count = 20 epochs = 100 estimators = 50 max_depth = 5 learning_rate = 0.05

[recommend]

# recommend strategy

# options: ['sql', 'compress_ratio']

# 选择compress_ratio表示按照projection压缩比来排序,选择sql,表示按照projection可以匹配sql的数量来排序 candidate_strategy = compress_ratio

4、执行数据库初始化命令。

python main.pyc clickhouse init_database

5、执行提取SQL命令。

python main.pyc clickhouse extract_sqls

虽然配置了每张表提取15条SQL,但清空执行记录后只执行了13条SQL,因此只提取到了13条SQL。

 

6、执行解析projection命令。

python main.pyc clickhouse generate_views

产生6条projections。

7、创建projections。

python main.pyc clickhouse create_projections

在各个节点的本地表lineorder_flat上均创建了视图。

8、关闭projection,测试13条SQL运行时间。

1)将以下内容写入measure_performance.sh脚本中。在使用时,需要将脚本中的IP地址替换成实际的。

#!/bin/bash

# 获取要执行的SQL语句目录
sql_dir=$1 set_allow=$2
# 判断目录是否存在
if [ ! -d $sql_dir ]; then
echo "The specified directory does not exist." exit 1
fi
# 定义clickhouse命令行工具路径
clickhouse_client="/usr/bin/clickhouse-client -h 192.168.xx.1 --port 19000 -m -n -d distributed_test -allow_experimental_projection_optimization $set_allow"
# 获取目录下所有的.sql文件
sql_files=$(ls $sql_dir/*.sql)
# 循环执行每个SQL文件
for sql_file in $sql_files do
# 清理所有节点服务器缓存
ssh agent1 "echo 3 > /proc/sys/vm/drop_caches" ssh agent2 "echo 3 > /proc/sys/vm/drop_caches" ssh agent3 "echo 3 > /proc/sys/vm/drop_caches" echo "Executing SQL file: $sql_file"
start=$(date +%s.%N) # 记录开始时间 # 使用clickhouse命令执行SQL文件, 执行5次 for i in {1..5} do
echo "执行:$sql_file"
$clickhouse_client < $sql_file >> /dev/null done
end=$(date +%s.%N) # 记录结束时间
avg_time=`awk -v x=$end -v y=$start 'BEGIN{printf "%.4f\n",(x-y)/5}'` echo "$sql_file五次平均执行时间为:$avg_time 秒" done
echo "All SQL files executed."

2)执行脚本。

sh measure_performance.sh /tmp/pycharm_project_430/clickhouse 0

“/tmp/pycharm_project_430/clickhouse”参数为SQL文件夹路径,第二个参数表示是否启用projection,“0”表示禁用,“1”表示启用。SQL文件夹路径在配置文件中的work_path下的“/q/sql”中。
执行脚本输出为:

lineorder_flat_all_0.sql五次平均执行时间为:0.4203 秒

lineorder_flat_all_10.sql五次平均执行时间为:20.9677 秒

lineorder_flat_all_11.sql五次平均执行时间为:7.4551 秒

lineorder_flat_all_12.sql五次平均执行时间为:7.0748 秒

lineorder_flat_all_1.sql五次平均执行时间为:13.5604 秒

lineorder_flat_all_2.sql五次平均执行时间为:17.4962 秒

lineorder_flat_all_3.sql五次平均执行时间为:16.0823 秒

lineorder_flat_all_4.sql五次平均执行时间为:0.2616 秒

lineorder_flat_all_5.sql五次平均执行时间为:0.3689 秒

lineorder_flat_all_6.sql五次平均执行时间为:2.5517 秒

lineorder_flat_all_7.sql五次平均执行时间为:13.1876 秒

lineorder_flat_all_8.sql五次平均执行时间为:13.3727 秒

lineorder_flat_all_9.sql五次平均执行时间为:15.5257 秒

3)开启projection,测试13条SQL运行时间。

sh measure_performance.sh /tmp/pycharm_project_430/clickhouse 1

脚本输出为:

lineorder_flat_all_0.sql五次平均执行时间为:0.1874 秒

lineorder_flat_all_10.sql五次平均执行时间为:0.2092 秒

lineorder_flat_all_11.sql五次平均执行时间为:8.1524 秒

lineorder_flat_all_12.sql五次平均执行时间为:1.0646 秒

lineorder_flat_all_1.sql五次平均执行时间为:2.3531 秒

lineorder_flat_all_2.sql五次平均执行时间为:6.4758 秒

lineorder_flat_all_3.sql五次平均执行时间为:2.4112 秒

lineorder_flat_all_4.sql五次平均执行时间为:0.2446 秒

lineorder_flat_all_5.sql五次平均执行时间为:0.3683 秒

lineorder_flat_all_6.sql五次平均执行时间为:2.5895 秒

lineorder_flat_all_7.sql五次平均执行时间为:0.1433 秒

lineorder_flat_all_8.sql五次平均执行时间为:0.1555 秒

lineorder_flat_all_9.sql五次平均执行时间为:0.7408 秒

5.2.3 数据分析

SQL

无projection执行时间(秒)

启用projection执行时间(秒)

启用Projection,匹配视图号

性能提升

lineorder_flat_al l_0.sql

0.4203

0.1874

lineorder_flat_a ll_cube5

124.27

96%

lineorder_flat_al l_10.sql

20.9677

0.2092

lineorder_flat_a ll_cube2

9922.8

011%

lineorder_flat_al l_11.sql

7.4551

8.1524

-

-8.5533

%

lineorder_flat_al l_12.sql

7.0748

1.0646

lineorder_flat_a ll_cube0

564.55

01%

lineorder_flat_al l_1.sql

13.5604

2.3531

lineorder_flat_a ll_cube5

476.27

81%

lineorder_flat_al l_2.sql

17.4962

6.4758

lineorder_flat_a ll_cube5

170.17

82%

lineorder_flat_al l_3.sql

16.0823

2.4112

lineorder_flat_a ll_cube4

566.98

32%

lineorder_flat_al l_4.sql

0.2616

0.2446

-

6.9501

%

lineorder_flat_al l_5.sql

0.3689

0.3683

-

0.1629

%

lineorder_flat_al l_6.sql

2.5517

2.5895

-

-1.4597

%

lineorder_flat_al l_7.sql

13.1876

0.1433

lineorder_flat_a ll_cube3

9102.7

913%

lineorder_flat_al l_8.sql

13.3727

0.1555

lineorder_flat_a ll_cube3

8499.8

071%

lineorder_flat_al l_9.sql

15.5257

0.7408

lineorder_flat_a ll_cube3

1995.8

018%

总计

128.325

25.0957

-

411.34

26%

  • lineorder_flat_all_4.sql、lineorder_flat_all_5.sql、lineorder_flat_all_6.sql三条SQL没有group by子句,未推荐projection。

  • lineorder_flat_all_11.sql未查询超过一半的part,ClickHouse不为其匹配projection。

OmnicCache推荐的视图顺序:排在前面优先级高

["lineorder_flat_all_cube2.sql", "lineorder_flat_all_cube4.sql",

"lineorder_flat_all_cube0.sql", "lineorder_flat_all_cube3.sql",

"lineorder_flat_all_cube5.sql", "lineorder_flat_all_cube1.sql"]

视图额外存储空间:

视图号

server1存储空间(MiB)

agent1存储空间(MiB)

agent2存储空间(MiB)

总计

占比

lineorder_flat_ all_cube0

151.09

135.17

148.42

434.68

0.1

3%

lineorder_flat_ all_cube1

3921.92

3819.52

3911.68

11653.

12

3.4

2%

lineorder_flat_ all_cube2

32.76

28.47

31.95

93.18

0.0

3%

lineorder_flat_ all_cube3

229.84

206.72

226.25

662.81

0.1

9%

lineorder_flat_ all_cube4

33.04

28.75

32.22

94.01

0.0

3%

lineorder_flat_ all_cube5

1761.28

1638.4

1740.8

5140.4

8

1.5

1%

5.3 单机性能提升测试

1、将以下SQL语句(查询本地表lineorder_flat)分别写入.sql文件中,并放置于“/tmp/sqls”目录下。

SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue FROM lineorder_flat WHERE toYear(LO_ORDERDATE) = 1993 AND LO_DISCOUNT BETWEEN 1 AND 3 AND LO_QUANTITY < 25 limit

100;--q1

SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue FROM lineorder_flat WHERE toYYYYMM(LO_ORDERDATE) = 199401 AND LO_DISCOUNT BETWEEN 4 AND 6 AND LO_QUANTITY BETWEEN 26 AND 35;--q2

SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue FROM lineorder_flat WHERE toISOWeek(LO_ORDERDATE) = 6 AND toYear(LO_ORDERDATE) = 1994 AND LO_DISCOUNT BETWEEN 5 AND 7 AND LO_QUANTITY BETWEEN 26 AND 35;--q3

SELECT sum(LO_REVENUE), toYear(LO_ORDERDATE) AS year, P_BRAND FROM lineorder_flat WHERE P_CATEGORY = 'MFGR#12' AND S_REGION = 'AMERICA' GROUP BY year, P_BRAND ORDER BY year,

P_BRAND;--q4

SELECT sum(LO_REVENUE), toYear(LO_ORDERDATE) AS year, P_BRAND FROM lineorder_flat WHERE P_BRAND >= 'MFGR#2221' AND P_BRAND <= 'MFGR#2228' AND S_REGION = 'ASIA' GROUP BY year, P_BRAND ORDER BY year, P_BRAND;--q5

SELECT sum(LO_REVENUE), toYear(LO_ORDERDATE) AS year, P_BRAND FROM lineorder_flat WHERE P_BRAND = 'MFGR#2239' AND S_REGION = 'EUROPE' GROUP BY year, P_BRAND ORDER BY year,

P_BRAND;--q6

SELECT C_NATION, S_NATION, toYear(LO_ORDERDATE) AS year, sum(LO_REVENUE) AS revenue FROM lineorder_flat WHERE C_REGION = 'ASIA' AND S_REGION = 'ASIA' AND year >= 1992 AND year <= 1997 GROUP BY C_NATION, S_NATION,year ORDER BY year ASC,revenue DESC;--q7

SELECT C_CITY, S_CITY, toYear(LO_ORDERDATE) AS year, sum(LO_REVENUE) AS revenue FROM lineorder_flat WHERE C_NATION = 'UNITED STATES' AND S_NATION = 'UNITED STATES' AND year >= 1992 AND year <= 1997 GROUP BY C_CITY, S_CITY, year ORDER BY year ASC,revenue DESC;--q8

SELECT C_CITY, S_CITY, toYear(LO_ORDERDATE) AS year, sum(LO_REVENUE) AS revenue FROM lineorder_flat WHERE (C_CITY = 'UNITED KI1' OR C_CITY = 'UNITED KI5') AND (S_CITY = 'UNITED KI1' OR S_CITY = 'UNITED KI5') AND year >= 1992 AND year <= 1997 GROUP BY C_CITY, S_CITY, year ORDER BY year ASC, revenue DESC;--q9

SELECT C_CITY, S_CITY, toYear(LO_ORDERDATE) AS year, sum(LO_REVENUE) AS revenue FROM lineorder_flat WHERE (C_CITY = 'UNITED KI1' OR C_CITY = 'UNITED KI5') AND (S_CITY = 'UNITED KI1' OR S_CITY = 'UNITED KI5') AND toYYYYMM(LO_ORDERDATE) = 199712 GROUP BY C_CITY, S_CITY, year ORDER BY year ASC,revenue DESC;--q10

SELECT toYear(LO_ORDERDATE) AS year, C_NATION, sum(LO_REVENUE - LO_SUPPLYCOST) AS profit FROM lineorder_flat WHERE C_REGION = 'AMERICA' AND S_REGION = 'AMERICA' AND (P_MFGR =

'MFGR#1' OR P_MFGR = 'MFGR#2') GROUP BY year, C_NATION ORDER BY year ASC, C_NATION ASC;--q11

SELECT toYear(LO_ORDERDATE) AS year, S_NATION, P_CATEGORY, sum(LO_REVENUE - LO_SUPPLYCOST) AS profit FROM lineorder_flat WHERE C_REGION = 'AMERICA' AND S_REGION = 'AMERICA' AND (year = 1997 OR year = 1998) AND (P_MFGR = 'MFGR#1' OR P_MFGR = 'MFGR#2') GROUP BY year, S_NATION,P_CATEGORY ORDER BY year ASC, S_NATION ASC, P_CATEGORY ASC;--q12

SELECT toYear(LO_ORDERDATE) AS year, S_CITY, P_BRAND, sum(LO_REVENUE - LO_SUPPLYCOST) AS profit FROM lineorder_flat WHERE S_NATION = 'UNITED STATES' AND (year = 1997 OR year = 1998) AND P_CATEGORY = 'MFGR#14' GROUP BY year, S_CITY, P_BRAND ORDER BY year ASC, S_CITY ASC, P_BRAND ASC;--q13

如下图所示:

 

因为分布式性能测试中的每个节点的distributed_test.lineorder_flat都有10亿行数据,且构建了projection。这里直接使用一台节点进行单机性能提升测试。

2、测试关闭projection的执行时间。

sh measure_performance.sh /tmp/sqls 0

脚本输出为:

/tmp/sqls/lineorder_flat_10.sql五次平均执行时间为:0.4130 秒

/tmp/sqls/lineorder_flat_11.sql五次平均执行时间为:21.5681 秒

/tmp/sqls/lineorder_flat_12.sql五次平均执行时间为:6.8846 秒

/tmp/sqls/lineorder_flat_13.sql五次平均执行时间为:7.2954 秒

/tmp/sqls/lineorder_flat_1.sql五次平均执行时间为:2.4973 秒

/tmp/sqls/lineorder_flat_2.sql五次平均执行时间为:0.3634 秒

/tmp/sqls/lineorder_flat_3.sql五次平均执行时间为:0.1863 秒

/tmp/sqls/lineorder_flat_4.sql五次平均执行时间为:15.3781 秒

/tmp/sqls/lineorder_flat_5.sql五次平均执行时间为:13.3549 秒

/tmp/sqls/lineorder_flat_6.sql五次平均执行时间为:13.7589 秒

/tmp/sqls/lineorder_flat_7.sql五次平均执行时间为:15.3541 秒

/tmp/sqls/lineorder_flat_8.sql五次平均执行时间为:16.7528 秒

/tmp/sqls/lineorder_flat_9.sql五次平均执行时间为:12.5798 秒

3、测试开启projection的执行时间。

sh measure_performance.sh /tmp/sqls 1

脚本输出为:

/tmp/sqls/lineorder_flat_10.sql五次平均执行时间为:0.1653 秒

/tmp/sqls/lineorder_flat_11.sql五次平均执行时间为:0.1974 秒

/tmp/sqls/lineorder_flat_12.sql五次平均执行时间为:0.7566 秒

/tmp/sqls/lineorder_flat_13.sql五次平均执行时间为:7.6826 秒

/tmp/sqls/lineorder_flat_1.sql五次平均执行时间为:2.4996 秒

/tmp/sqls/lineorder_flat_2.sql五次平均执行时间为:0.3270 秒

/tmp/sqls/lineorder_flat_3.sql五次平均执行时间为:0.1747 秒

/tmp/sqls/lineorder_flat_4.sql五次平均执行时间为:0.6892 秒

/tmp/sqls/lineorder_flat_5.sql五次平均执行时间为:0.1150 秒

/tmp/sqls/lineorder_flat_6.sql五次平均执行时间为:0.1025 秒

/tmp/sqls/lineorder_flat_7.sql五次平均执行时间为:1.1884 秒

/tmp/sqls/lineorder_flat_8.sql五次平均执行时间为:5.1712 秒

/tmp/sqls/lineorder_flat_9.sql五次平均执行时间为:1.2678 秒

数据分析:

SQL

无projection执行时间(秒)

启用projection执行时间(秒)

启用Projection,匹配视图号

性能提升

lineorder_flat_1.sql

2.4973

2.4996

-

-0.0920

%

lineorder_flat_2.sql

0.3634

0.327

-

11.131

5%

lineorder_flat_3.sql

0.1863

0.1747

-

6.6400

%

lineorder_flat_4.sql

15.3781

0.6892

lineorder_flat_a ll_cube3

2131.2

972%

lineorder_flat_5.sql

13.3549

0.115

lineorder_flat_a ll_cube3

11512. 9565%

lineorder_flat_6.sql

13.7589

0.1025

lineorder_flat_a ll_cube3

13323. 3171%

lineorder_flat_7.sql

15.3541

1.1884

lineorder_flat_a ll_cube4

1191.9

976%

lineorder_flat_8.sql

16.7528

5.1712

lineorder_flat_a ll_cube5

223.96

35%

lineorder_flat_9.sql

12.5798

1.2678

lineorder_flat_a ll_cube5

892.25

43%

lineorder_flat_10.sql

0.413

0.1653

lineorder_flat_a ll_cube5

149.84

88%

lineorder_flat_11.sql

21.5681

0.1974

lineorder_flat_a ll_cube2

10826. 0892%

lineorder_flat_12.sql

6.8846

0.7566

lineorder_flat_a ll_cube0

809.93

92%

lineorder_flat_13.sql

7.2954

7.6826

-

-5.0400

%

总计

126.3867

20.3373

-

521.45

27%

lineorder_flat_1.sql、lineorder_flat_2.sql、lineorder_flat_3.sql三条SQL没有group by子句,未推荐projection。lineorder_flat_13.sql未查询超过一半的part,ClickHouse不为其匹配projection。

6 CBG数据集测试

6.1 准备测试数据

请使用以下脚本生成30亿行数据,其中变量w控制生成行数,单位为万行。

#!/usr/bin/env python

# -*- coding: utf-8 -*

import random

import time

import csv

import io

from multiprocessing import Process

def writelog(fileName,count,num):

tuparr=[]

while count < num:

with io.open(fileName,mode='a+') as f:

dateTime = time.strftime('%Y-%m-%d',time.localtime(random.randint(1635523200, 1635609600)))

appNameNew = []

appNameNew.append('维修受理完成时间'+str(random.randint(0,100)))

appNameNew.append('激活日期'+str(random.randint(0,1000)))

appNameNew.append('上市月份'+str(random.randint(0,1000)))

appNameNew.append('设备品牌'+str(random.randint(0,100)))

appNameNew.append('产品线'+str(random.randint(0,100)))

appNameNew.append('产品线设备系列名称'+str(random.randint(0,100)))

appNameNew.append('产品线new'+str(random.randint(0,100)))

appNameNew.append('SPDT'+str(random.randint(0,100000)))

appNameNew.append('标准设备系列名称'+str(random.randint(0,1000)))

appNameNew.append(random.choice(['HUAWEI MATE 10','HUAWEI MATE 10 PRO','HUAWEI MATE 10 保时捷设计','HUAWEI MATE 20','HUAWEI MATE 20 PRO','HUAWEI MATE 20 PRO (UD','HUAWEI MATE 20 RS 保时捷设计','HUAWEI MATE 20 X','HUAWEI MATE 20 X (5G','HUAWEI MATE 30','HUAWEI MATE 30 5G','HUAWEI MATE 30 PRO','HUAWEI MATE 30 PRO 5G','HUAWEI MATE 30 RS 保时捷设计','HUAWEI MATE 40','HUAWEI MATE 40 PRO','HUAWEI MATE 40 PRO+','HUAWEI MATE 40 PRO 4G','HUAWEI MATE 40 RS 保时捷设计','HUAWEI MATE 40 RS 保时捷设计 典藏版','HUAWEI NOVA 4','HUAWEI NOVA 4E','HUAWEI NOVA 5','HUAWEI NOVA 5I','HUAWEI NOVA 5I PRO','HUAWEI NOVA 5 PRO','HUAWEI NOVA 5Z','HUAWEI NOVA 6','HUAWEI NOVA 6 (5G','HUAWEI NOVA 6 SE','HUAWEI NOVA 7 5G','HUAWEI NOVA 7 PRO 5G','HUAWEI NOVA 7 PRO 5G 活力版','HUAWEI NOVA 7 SE 5G','HUAWEI NOVA 7 SE 5G 乐活版','HUAWEI NOVA 7 SE 5G 活力版','HUAWEI NOVA 8','HUAWEI NOVA 8 PRO','HUAWEI NOVA 8 PRO 4G','HUAWEI NOVA 8 SE','HUAWEI NOVA 8 SE 活力版','HUAWEI NOVA 9','HUAWEI NOVA 9 PRO','HUAWEI P20','HUAWEI P20 PRO','HUAWEI P30','HUAWEI P30 PRO','HUAWEI P40 4G','HUAWEI P40 5G','HUAWEI P40 PRO','HUAWEI P40 PRO+','HUAWEI P50','HUAWEI P50 PRO 4G']))

appNameNew.append('设备EMUI版本'+str(random.randint(0,100)))

appNameNew.append('设备价格范围'+str(random.randint(0,100)))

appNameNew.append(random.choice(["32GB","64GB","128GB","256GB"]))

appNameNew.append(random.choice(["32GB","64GB","128GB","256GB"]))

appNameNew.append(random.choice(["红","绿","蓝","黑"]))

appNameNew.append('屏幕尺寸'+str(random.randint(0,10)))

appNameNew.append('分辨率'+str(random.randint(0,10)))

appNameNew.append('海外机型标志'+str(random.randint(0,100)))

appNameNew.append('老旧机型标志'+str(random.randint(0,100)))

appNameNew.append(random.choice(["男","女"]))

appNameNew.append('年龄范围'+str(random.randint(0,100)))

appNameNew.append('职业'+str(random.randint(0,1000)))

appNameNew.append('常驻省份'+str(random.randint(0,100)))

appNameNew.append('常驻城市'+str(random.randint(0,1000)))

appNameNew.append('常驻城市dengji'+str(random.randint(0,1000)))

appNameNew.append(random.choice(["1个月","2个月","3个月","4个月","5个月","6个月","7个月","8个月","9个月","10个月","11个月","12个月","13个月","14个月","15个月","16个月","17个月","18个月","19个月","20 个月","21个月","22个月","23个月","24个月","25个月","26个月","27个月","28个月","29个月","30个月","31个月","32个月","33个月","34个月","35个月","36个月","37个月","38个月","39个月","40个月","41个月","42个月","43个月","44个月","45个月","46个月","47个月","48个月"]))

appNameNew.append(random.choice(["1个月","2个月","3个月","4个月","5个月","6个月","7个月","8个月","9个月","10个月","11个月","12个月","13个月","14个月","15个月","16个月","17个月","18个月","19个月","20 个月","21个月","22个月","23个月","24个月","25个月","26个月","27个月","28个月","29个月","30个月","31个月","32个月","33个月","34个月","35个月","36个月","37个月","38个月","39个月","40个月","41个月","42个月","43个月","44个月","45个月","46个月","47个月","48个月"]))

appNameNew.append('剩余RAM分段'+str(random.randint(0,50)))

appNameNew.append('剩余ROM分段'+str(random.randint(0,50)))

appNameNew.append('卡顿程度'+str(random.randint(0,10)))

appNameNew.append('容量'+str(random.randint(100,120)))

appNameNew.append('衰减表只'+str(random.randint(0,3)))

appNameNew.append('老化标志'+str(random.randint(0,10)))

appNameNew.append('充电圈数区间'+str(random.randint(50,70)))

appNameNew.append('电池损耗区间'+str(random.randint(50,70)))

appNameNew.append('故障维修次数'+str(random.randint(0,10)))

appNameNew.append('故障维修方法'+str(random.randint(0,10)))

appNameNew.append('服务'+str(random.randint(1,10)))

appNameNew.append('插卡'+str(random.randint(1,10)))

appNameNew.append('开通'+str(random.randint(0,100)))

appNameNew.append(random.randint(0,1000))

appNameNew.append(random.randint(0,1000))

appNameNew.append(random.randint(0,1000))

appNameNew.append(random.randint(0,1000))

appNameNew.append(random.randint(0,1000))

appNameNew.append('年龄来源'+str(random.randint(0,100)))

appNameNew.append('性别来源'+str(random.randint(0,100)))

appNameNew.append('高可信度标志'+str(random.randint(0,100)))

appNameNew.append(dateTime)

tuparr.append(appNameNew)

#print(tuparr)

if(count%10000==0):

writeData = csv.writer(f)

writeData.writerows(tuparr)

tuparr=[]

count+=1

Processlist =[]

w = 300000

num = (2000*w) + 10

s = 0

t1 = Process(target=writelog, args=['dws1.txt', s + 1, s + num])

t2 = Process(target=writelog, args=['dws2.txt', s + 1, s + num])

t3 = Process(target=writelog, args=['dws3.txt', s + 1, s + num])

t4 = Process(target=writelog, args=['dws4.txt', s + 1, s + num])

t5 = Process(target=writelog, args=['dws5.txt', s + 1, s + num])

Processlist.append(t1)

Processlist.append(t2)

Processlist.append(t3)

Processlist.append(t4)

Processlist.append(t5)

if __name__ == '__main__':

for t in Processlist:

t.start()

6.2 分布式性能提升测试

6.2.1 构建分布式库表

1、在distributed_test中创建本地表dws_useroper_device_persona_mlb_ds。

CREATE TABLE distributed_test.dws_useroper_device_persona_mlb_ds on cluster mycluster

(

`sr_accept_date` Nullable(String) COMMENT '维修受理完成时间',

`activate_date` Nullable(String) COMMENT '激活日期',

`device_launch_month` Nullable(String) COMMENT '上市月份',

`device_brand` Nullable(String) COMMENT '设备品牌',

`product_line` Nullable(String) COMMENT '产品线',

`product_line_series_name` Nullable(String) COMMENT '产品线设备系列名称',

`new_product_line` Nullable(String) COMMENT '产品线(新)',

`new_product_line_series_name` Nullable(String) COMMENT 'SPDT(新)',

`std_series_name` Nullable(String) COMMENT '标准设备系列名称',

`device_prod_sprdname` Nullable(String) COMMENT '设备产品传播名',

`device_emui_ver` Nullable(String) COMMENT '设备EMUI版本',

`device_price_scope` Nullable(String) COMMENT '设备价格范围',

`rom` Nullable(String) COMMENT 'ROM(GB)',

`ram` Nullable(String) COMMENT 'RAM(GB)',

`device_color` Nullable(String) COMMENT '设备颜色',

`screen_size` Nullable(String) COMMENT '屏幕尺寸',

`resolution` Nullable(String) COMMENT '分辨率',

`ovsea_model_flg` Nullable(String) COMMENT '海外机型标志',

`old_model_flg` Nullable(String) COMMENT '老旧机型标志',

`forecast_gender` Nullable(String) COMMENT '性别',

`forecast_agerange` Nullable(String) COMMENT '年龄范围',

`career` Nullable(String) COMMENT '职业',

`residence_prov` Nullable(String) COMMENT '常驻省份',

`residence_city` Nullable(String) COMMENT '常驻城市',

`residence_city_level` Nullable(String) COMMENT '常驻城市等级',

`device_launch_duration` Nullable(String) COMMENT '设备上市时长(第几个月)',

`usage_duration_segment` Nullable(String) COMMENT '用机时长分段',

`remain_ram_segment` Nullable(String) COMMENT '剩余RAM分段',

`remain_rom_segment` Nullable(String) COMMENT '剩余ROM分段',

`stop_degree_flg` Nullable(String) COMMENT '卡顿程度',

`battery_volume` Nullable(String) COMMENT '电池设计容量(mAh)',

`battery_atnuatn_flg` Nullable(String) COMMENT '电池衰减标志',

`battery_aging_flg` Nullable(String) COMMENT '电池老化标志',

`charge_laps_range` Nullable(String) COMMENT '充电圈数区间',

`battery_loss_range` Nullable(String) COMMENT '电池损耗区间',

`fault_maint_cnt` Nullable(String) COMMENT '故障维修次数',

`maint_method` Nullable(String) COMMENT '故障类维修方法',

`service_solution_name` Nullable(String) COMMENT '服务交付方案名称',

`plug_card_flg` Nullable(String) COMMENT '插卡标志',

`open_user_experi_plan_flg` Nullable(String) COMMENT '开通用户体验计划标志',

`day_new_device_cnt` Nullable(Int64) COMMENT '日增设备数',

`day_active_device_cnt` Nullable(Int64) COMMENT '日活设备数',

`day7_active_device_cnt` Nullable(Int64) COMMENT '近7天活跃设备数',

`day30_active_device_cnt` Nullable(Int64) COMMENT '近30天活跃设备数',

`week_active_device_cnt` Nullable(Int64) COMMENT '当周活跃设备数',

`age_src` Nullable(String) COMMENT '年龄来源',

`gender_src` Nullable(String) COMMENT '性别来源',

`high_credible_flg` Nullable(String) COMMENT '高可信度标志',

`pt_d` Date DEFAULT today() COMMENT '天分区'

)

ENGINE = ReplicatedMergeTree('/clickhouse/tables/finebiUserOper_cluster_23shards_2replicas/{shard}/ UserOper_local.dws_useroper_device_persona_mlb_ds_001', '{replica}') PARTITION BY toYYYYMMDD(pt_d)

ORDER BY tuple()

SETTINGS index_granularity = 8192, use_minimalistic_part_header_in_zookeeper = 1, storage_policy = 'default';

2、在所有节点的distributed_test中创建分布式表dws_useroper_device_persona_mlb_ds_all。

CREATE TABLE IF NOT EXISTS

distributed_test.dws_useroper_device_persona_mlb_ds_all ON CLUSTER mycluster AS distributed_test.dws_useroper_device_persona_mlb_ds

ENGINE = Distributed(mycluster, distributed_test, dws_useroper_device_persona_mlb_ds, rand());

3、导入数据进入测试数据准备目录执行。

cat ./dws[1-5].txt| clickhouse --client -h 192.168.58.168 --port 19000 -format_csv_delimiter="," --query="INSERT INTO distributed_test.dws_useroper_device_persona_mlb_ds_all FORMAT CSV";

6.2.2 执行OmniMV运行流程

1、清空system.query_log中distributed_test数据库的查询记录。

alter table system.query_log delete where databases=['distributed_test'];

2、执行cbg的sql。
CBG提供了25条查询SQL。需要将其改写为查询分布式表。
例如dws_useroper_device_persona_mlb_ds_all,改写后SQL如下附件cbg_sqls_distributed_table.txt。

3、配置OmniMV配置文件参考配置如下。

[raw_file]

# 数据库名称

database = distributed_test work_path = ./clickhouse

[clickhouse]

ck_host = 192.168.58.168 ck_port = 19000

# 集群名称(只查询本地表可不配置)

cluster_name = mycluster num_round = 5

# 是否启用clickhouse projection功能[True or False] enable_projection = False

# 创建和删除projection的超时时间,单位(s)

projection_wait_time = 3600

# 数据压缩比: 视图产生的数据行数/sql查询需要处理的行数read_rows,数据压缩比越小越好 projection_compress_ratio = 0.8

# SQL运行的阈值,小于这个时间的sql不考虑推荐视图,单位(ms) sql_running_time_threshold = 1

# 每张表推荐出来的projection数量的上限

projection_limit_each_table = 10

# 推荐Projection时配置表工作模式,为True,则【clickhouse_tables】只能配置分布式表,为False,则

【clickhouse_tables】只能配置本地表 is_distributed_table = True

# 列举一些重点的表来进行推荐,表之间用逗号分隔。在多表场景下,需要填写all, e.g. clickhouse_tables = all

clickhouse_tables = dws_useroper_device_persona_mlb_ds_all

# 指定从哪个系统表中查询query相关信息,一般本地表会从system.query_log中查询信息,分布式表可能从别的表中获取信息

query_log_table = system.query_log

# (初次提取query)从query_log中对每张表提取query信息,提取到的query需要是指定数据库,指定数据表,运行时间超过配置文件中阈值的query数量

# (增量提取query),增量提取的query数量也是该参数指定 query_limit_each_table = 30

# 在多表场景下,配置提取的query总数量 query_limit_mutil_table = 15

# 是否为包含条件的聚合函数sql推荐projection (例如select sum(multiif..)) [True or False]

# 如果为True:则去除条件表达式后推荐,为False:这条sql不推荐projection projection_with_conditional_functions = False

# 是否需要对新建的projection物化历史的数据,默认不物化历史数据 [True or False] allows_materialize_projection = True

# 指定要删除的Projection名称,使用 "table_name.projection_name"进行指定,table_name只能是本地表。多个projection之间使用(英文)逗号分割。

# 如果所有节点都存在此"table_name.projection_name",且都需要删除,需要将【is_distributed_table】配置为True

# 如果不需要删除Projection,请将其置为None,以防止误操作 drop_projections = None

[train]

# 通过对出现过的聚合列进行排列组合,以此来扩大训练数据的数量,这里可以限制 大的扩充的训练sql的数量 max_data_count = 20 epochs = 100 estimators = 50 max_depth = 5 learning_rate = 0.05 [recommend]

# recommend strategy

# options: ['sql', 'compress_ratio']

# 选择compress_ratio表示按照projection压缩比来排序,选择sql,表示按照projection可以匹配sql的数量来排序 candidate_strategy = compress_ratio

4、执行数据库初始化命令。

python main.pyc clickhouse init_database

5、执行提取SQL命令。

python main.pyc clickhouse extract_sqls

虽然配置了每张表提取30条SQL,但清空执行记录后只执行了25条SQL,因此只提取到了25条SQL。

6、执行解析projection命令。

python main.pyc clickhouse generate_views

产生5条projections:

7、创建projections。

python main.pyc clickhouse create_projections

在各个节点的本地表lineorder_flat上均创建了视图。

8、关闭projection,测试25条SQL运行时间。

sh measure_performance.sh /tmp/pycharm_project_430/clickhouse 0

/“tmp/pycharm_project_430/clickhouse”参数为sql文件夹路径,第二个参数为是否启用projection,“0”表示禁用,“1”表示启用。sql文件夹路径在配置文件中的work_path下的“/q/sql”中。
脚本输出为:

dws_useroper_device_persona_mlb_ds_all_0.sql五次平均执行时间为:46.6810 秒

dws_useroper_device_persona_mlb_ds_all_10.sql五次平均执行时间为:0.7314 秒

dws_useroper_device_persona_mlb_ds_all_11.sql五次平均执行时间为:0.6443 秒

dws_useroper_device_persona_mlb_ds_all_12.sql五次平均执行时间为:0.6369 秒

dws_useroper_device_persona_mlb_ds_all_13.sql五次平均执行时间为:22.6676 秒

dws_useroper_device_persona_mlb_ds_all_14.sql五次平均执行时间为:0.6935 秒

dws_useroper_device_persona_mlb_ds_all_15.sql五次平均执行时间为:0.5505 秒

dws_useroper_device_persona_mlb_ds_all_16.sql五次平均执行时间为:30.9949 秒

dws_useroper_device_persona_mlb_ds_all_17.sql五次平均执行时间为:26.0273 秒

dws_useroper_device_persona_mlb_ds_all_18.sql五次平均执行时间为:28.7653 秒

dws_useroper_device_persona_mlb_ds_all_19.sql五次平均执行时间为:29.7989 秒

dws_useroper_device_persona_mlb_ds_all_1.sql五次平均执行时间为:30.6384 秒

dws_useroper_device_persona_mlb_ds_all_20.sql五次平均执行时间为:0.4865 秒

dws_useroper_device_persona_mlb_ds_all_21.sql五次平均执行时间为:0.3196 秒

dws_useroper_device_persona_mlb_ds_all_22.sql五次平均执行时间为:5.6267 秒

dws_useroper_device_persona_mlb_ds_all_23.sql五次平均执行时间为:0.0663 秒

dws_useroper_device_persona_mlb_ds_all_24.sql五次平均执行时间为:5.6848 秒

dws_useroper_device_persona_mlb_ds_all_2.sql五次平均执行时间为:45.7762 秒

dws_useroper_device_persona_mlb_ds_all_3.sql五次平均执行时间为:31.0191 秒

dws_useroper_device_persona_mlb_ds_all_4.sql五次平均执行时间为:0.6708 秒

dws_useroper_device_persona_mlb_ds_all_5.sql五次平均执行时间为:0.0796 秒

dws_useroper_device_persona_mlb_ds_all_6.sql五次平均执行时间为:0.0824 秒

dws_useroper_device_persona_mlb_ds_all_7.sql五次平均执行时间为:0.6179 秒

dws_useroper_device_persona_mlb_ds_all_8.sql五次平均执行时间为:0.6788 秒

dws_useroper_device_persona_mlb_ds_all_9.sql五次平均执行时间为:0.6545 秒

9、开启projection,测试25条SQL运行时间,

sh measure_performance.sh /tmp/pycharm_project_430/clickhouse 1

脚本输出为:

dws_useroper_device_persona_mlb_ds_all_0.sql五次平均执行时间为:0.2727 秒

dws_useroper_device_persona_mlb_ds_all_10.sql五次平均执行时间为:0.6687 秒

dws_useroper_device_persona_mlb_ds_all_11.sql五次平均执行时间为:0.6619 秒

dws_useroper_device_persona_mlb_ds_all_12.sql五次平均执行时间为:0.6504 秒

dws_useroper_device_persona_mlb_ds_all_13.sql五次平均执行时间为:22.5307 秒

dws_useroper_device_persona_mlb_ds_all_14.sql五次平均执行时间为:0.7521 秒

dws_useroper_device_persona_mlb_ds_all_15.sql五次平均执行时间为:0.5833 秒

dws_useroper_device_persona_mlb_ds_all_16.sql五次平均执行时间为:0.2670 秒

dws_useroper_device_persona_mlb_ds_all_17.sql五次平均执行时间为:0.2830 秒

dws_useroper_device_persona_mlb_ds_all_18.sql五次平均执行时间为:0.2477 秒

dws_useroper_device_persona_mlb_ds_all_19.sql五次平均执行时间为:0.2553 秒

dws_useroper_device_persona_mlb_ds_all_1.sql五次平均执行时间为:0.2320 秒

dws_useroper_device_persona_mlb_ds_all_20.sql五次平均执行时间为:0.5371 秒

dws_useroper_device_persona_mlb_ds_all_21.sql五次平均执行时间为:0.1113 秒

dws_useroper_device_persona_mlb_ds_all_22.sql五次平均执行时间为:0.1003 秒

dws_useroper_device_persona_mlb_ds_all_23.sql五次平均执行时间为:0.0716 秒

dws_useroper_device_persona_mlb_ds_all_24.sql五次平均执行时间为:0.0876 秒

dws_useroper_device_persona_mlb_ds_all_2.sql五次平均执行时间为:0.2383 秒

dws_useroper_device_persona_mlb_ds_all_3.sql五次平均执行时间为:0.2250 秒

dws_useroper_device_persona_mlb_ds_all_4.sql五次平均执行时间为:0.6925 秒

dws_useroper_device_persona_mlb_ds_all_5.sql五次平均执行时间为:0.0815 秒

dws_useroper_device_persona_mlb_ds_all_6.sql五次平均执行时间为:0.0865 秒

dws_useroper_device_persona_mlb_ds_all_7.sql五次平均执行时间为:0.7086 秒

dws_useroper_device_persona_mlb_ds_all_8.sql五次平均执行时间为:0.8154 秒

dws_useroper_device_persona_mlb_ds_all_9.sql五次平均执行时间为:0.7459 秒

6.2.3 数据分析

SQL

无projection执行时间(秒)

启用projection执行时间(秒)

匹配视图号

性能提升

dws_useroper_device_persona_mlb_ds_all_0

46.681

0.2727

dws_useroper_device_ persona_mlb_ds_all_c ube1

1701

8.07

85%

dws_useroper_device_persona_mlb_ds_a ll_10

0.7314

0.6687

9.37

64%

dws_useroper_device_persona_mlb_ds_a ll_11

0.6443

0.6619

-2.65

90%

dws_useroper_device_persona_mlb_ds_a ll_12

0.6369

0.6504

-2.07

56%

dws_useroper_device_persona_mlb_ds_a ll_13

22.6676

22.5307

0.60

76%

dws_useroper_device_persona_mlb_ds_a ll_14

0.6935

0.7521

-7.79

15%

dws_useroper_device_persona_mlb_ds_a ll_15

0.5505

0.5833

-5.62

32%

dws_useroper_device_persona_mlb_ds_a ll_16

30.9949

0.267

dws_useroper_device_ persona_mlb_ds_all_c ube2

1150

8.57

68%

dws_useroper_device_persona_mlb_ds_a ll_17

26.0273

0.283

dws_useroper_device_ persona_mlb_ds_all_c ube1

9096

.925

8%

dws_useroper_device_persona_mlb_ds_a ll_18

28.7653

0.2477

dws_useroper_device_ persona_mlb_ds_all_c ube0

1151

2.95

92%

dws_useroper_device_persona_mlb_ds_a ll_19

29.7989

0.2553

dws_useroper_device_ persona_mlb_ds_all_c ube4

1157

2.11

12%

dws_useroper_device_persona_mlb_ds_all_1

30.6384

0.232

dws_useroper_device_ persona_mlb_ds_all_c ube1

1310

6.20

69%

dws_useroper_device_persona_mlb_ds_a ll_20

0.4865

0.5371

-9.42

10%

dws_useroper_device_persona_mlb_ds_a ll_21

0.3196

0.1113

dws_useroper_device_ persona_mlb_ds_all_c ube0

187.

1518

%

dws_useroper_device_persona_mlb_ds_a ll_22

5.6267

0.1003

dws_useroper_device_ persona_mlb_ds_all_c ube1

5509

.870

4%

dws_useroper_device_persona_mlb_ds_a ll_23

0.0663

0.0716

-7.40

22%

dws_useroper_device_persona_mlb_ds_a ll_24

5.6848

0.0876

dws_useroper_device_ persona_mlb_ds_all_c ube1

6389

.497

7%

dws_useroper_device_persona_mlb_ds_all_2

45.7762

0.2383

dws_useroper_device_ persona_mlb_ds_all_c ube3

1910

9.48

38%

dws_useroper_device_persona_mlb_ds_all_3

31.0191

0.225

dws_useroper_device_ persona_mlb_ds_all_c ube3

1368

6.26

67%

dws_useroper_device_persona_mlb_ds_all_4

0.6708

0.6925

-3.13

36%

dws_useroper_device_persona_mlb_ds_all_5

0.0796

0.0815

-2.33

13%

dws_useroper_device_persona_mlb_ds_all_6

0.0824

0.0865

-4.73

99%

dws_useroper_device_persona_mlb_ds_all_7

0.6179

0.7086

-12.7

999

%

dws_useroper_device_persona_mlb_ds_all_8

0.6788

0.8154

-16.7

525

%

dws_useroper_device_persona_mlb_ds_all_9

0.6545

0.7459

-12.2

537

%

总计

310.5932

31.9064

873.

4511

%

6.3 单机性能提升测试

1、将以下SQL语句(查询本地表dws_useroper_device_persona_mlb_ds)分别输入.sql文件中,并放置于“/tmp/cbg_sqls”目录下。
如下图所示:

因为分布式性能测试中的每个节点的distributed_test.dws_useroper_device_persona_mlb_ds都有10亿行数据,且构建了projection。这里直接使用一台节点进行单机性能提升测试。

2、测试关闭projection的执行时间。

sh measure_performance.sh /tmp/cbg_sqls 0

脚本输出为:

dws_useroper_device_persona_mlb_ds_0.sql五次平均执行时间为:0.5102 秒

dws_useroper_device_persona_mlb_ds_10.sql五次平均执行时间为:0.4733 秒

dws_useroper_device_persona_mlb_ds_11.sql五次平均执行时间为:20.4022 秒

dws_useroper_device_persona_mlb_ds_12.sql五次平均执行时间为:0.4961 秒

dws_useroper_device_persona_mlb_ds_13.sql五次平均执行时间为:0.0726 秒

dws_useroper_device_persona_mlb_ds_14.sql五次平均执行时间为:1.8206 秒

dws_useroper_device_persona_mlb_ds_15.sql五次平均执行时间为:0.5058 秒

dws_useroper_device_persona_mlb_ds_16.sql五次平均执行时间为:1.7877 秒

dws_useroper_device_persona_mlb_ds_17.sql五次平均执行时间为:20.6862 秒

dws_useroper_device_persona_mlb_ds_18.sql五次平均执行时间为:0.2373 秒

dws_useroper_device_persona_mlb_ds_19.sql五次平均执行时间为:0.4287 秒

dws_useroper_device_persona_mlb_ds_1.sql五次平均执行时间为:20.7353 秒

dws_useroper_device_persona_mlb_ds_20.sql五次平均执行时间为:0.4021 秒

dws_useroper_device_persona_mlb_ds_21.sql五次平均执行时间为:17.8188 秒

dws_useroper_device_persona_mlb_ds_22.sql五次平均执行时间为:20.6722 秒

dws_useroper_device_persona_mlb_ds_23.sql五次平均执行时间为:0.5312 秒

dws_useroper_device_persona_mlb_ds_24.sql五次平均执行时间为:34.8628 秒

dws_useroper_device_persona_mlb_ds_2.sql五次平均执行时间为:0.5608 秒

dws_useroper_device_persona_mlb_ds_3.sql五次平均执行时间为:34.9922 秒

dws_useroper_device_persona_mlb_ds_4.sql五次平均执行时间为:0.4801 秒

dws_useroper_device_persona_mlb_ds_5.sql五次平均执行时间为:20.6449 秒

dws_useroper_device_persona_mlb_ds_6.sql五次平均执行时间为:0.0573 秒

dws_useroper_device_persona_mlb_ds_7.sql五次平均执行时间为:16.3198 秒

dws_useroper_device_persona_mlb_ds_8.sql五次平均执行时间为:0.5600 秒

dws_useroper_device_persona_mlb_ds_9.sql五次平均执行时间为:0.0723 秒

3、测试开启projection性能。

sh measure_performance.sh /tmp/cbg_sqls 1

脚本输出为:

dws_useroper_device_persona_mlb_ds_0.sql五次平均执行时间为:0.5162 秒

dws_useroper_device_persona_mlb_ds_10.sql五次平均执行时间为:0.4820 秒

dws_useroper_device_persona_mlb_ds_11.sql五次平均执行时间为:0.1564 秒

dws_useroper_device_persona_mlb_ds_12.sql五次平均执行时间为:0.4887 秒

dws_useroper_device_persona_mlb_ds_13.sql五次平均执行时间为:0.0701 秒

dws_useroper_device_persona_mlb_ds_14.sql五次平均执行时间为:0.0860 秒

dws_useroper_device_persona_mlb_ds_15.sql五次平均执行时间为:0.4759 秒

dws_useroper_device_persona_mlb_ds_16.sql五次平均执行时间为:0.0749 秒

dws_useroper_device_persona_mlb_ds_17.sql五次平均执行时间为:0.1364 秒

dws_useroper_device_persona_mlb_ds_18.sql五次平均执行时间为:0.0741 秒

dws_useroper_device_persona_mlb_ds_19.sql五次平均执行时间为:0.4211 秒

dws_useroper_device_persona_mlb_ds_1.sql五次平均执行时间为:0.1462 秒

dws_useroper_device_persona_mlb_ds_20.sql五次平均执行时间为:0.4020 秒

dws_useroper_device_persona_mlb_ds_21.sql五次平均执行时间为:0.1543 秒

dws_useroper_device_persona_mlb_ds_22.sql五次平均执行时间为:0.1579 秒

dws_useroper_device_persona_mlb_ds_23.sql五次平均执行时间为:0.5089 秒

dws_useroper_device_persona_mlb_ds_24.sql五次平均执行时间为:0.1411 秒

dws_useroper_device_persona_mlb_ds_2.sql五次平均执行时间为:0.5367 秒

dws_useroper_device_persona_mlb_ds_3.sql五次平均执行时间为:0.1512 秒

dws_useroper_device_persona_mlb_ds_4.sql五次平均执行时间为:0.4890 秒

dws_useroper_device_persona_mlb_ds_5.sql五次平均执行时间为:0.1532 秒

dws_useroper_device_persona_mlb_ds_6.sql五次平均执行时间为:0.0587 秒

dws_useroper_device_persona_mlb_ds_7.sql五次平均执行时间为:16.2735 秒

dws_useroper_device_persona_mlb_ds_8.sql五次平均执行时间为:0.5896 秒

dws_useroper_device_persona_mlb_ds_9.sql五次平均执行时间为:0.0739 秒

数据分析:

SQL

无projection执行时间(秒)

启用projection执行时间(秒)

匹配视图号

性能提升

dws_useroper_device_persona_mlb_ds_all_0

0.5102

0.5162

-1.16

23%

dws_useroper_device_persona_mlb_ds_a ll_10

0.4733

0.482

-1.80

50%

dws_useroper_device_persona_mlb_ds_a ll_11

20.4022

0.1564

dws_useroper_device_ persona_mlb_ds_all_c ube2

1294

4.88

49%

dws_useroper_device_persona_mlb_ds_a ll_12

0.4961

0.4887

1.51

42%

dws_useroper_device_persona_mlb_ds_a ll_13

0.0726

0.0701

3.56

63%

dws_useroper_device_persona_mlb_ds_a ll_cube1

1.8206

0.086

dws_useroper_device_ persona_mlb_ds_all_c ube1

2016

.976

7%

dws_useroper_device_persona_mlb_ds_a ll_15

0.5058

0.4759

6.28

28%

dws_useroper_device_persona_mlb_ds_a ll_16

1.7877

0.0749

dws_useroper_device_ persona_mlb_ds_all_c ube1

2286

.782

4%

dws_useroper_device_persona_mlb_ds_a ll_17

20.6862

0.1364

dws_useroper_device_ persona_mlb_ds_all_c ube1

1506

5.83

58%

dws_useroper_device_persona_mlb_ds_a ll_18

0.2373

0.0741

dws_useroper_device_ persona_mlb_ds_all_c ube0

220.

2429

%

dws_useroper_device_persona_mlb_ds_a ll_19

0.4287

0.4211

1.80

48%

dws_useroper_device_persona_mlb_ds_all_1

20.7353

0.1462

dws_useroper_device_ persona_mlb_ds_all_c ube3

1408

2.83

17%

dws_useroper_device_persona_mlb_ds_a ll_20

0.4021

0.402

0.02

49%

dws_useroper_device_persona_mlb_ds_a ll_21

17.8188

0.1543

dws_useroper_device_ persona_mlb_ds_all_c ube1

1144

8.15

29%

dws_useroper_device_persona_mlb_ds_a ll_22

20.6722

0.1579

dws_useroper_device_ persona_mlb_ds_all_c ube4

1299

1.95

69%

dws_useroper_device_persona_mlb_ds_a ll_23

0.5312

0.5089

4.38

20%

dws_useroper_device_persona_mlb_ds_a ll_24

34.8628

0.1411

dws_useroper_device_ persona_mlb_ds_all_c ube3

2460

7.86

68%

dws_useroper_device_persona_mlb_ds_all_2

0.5608

0.5367

4.49

04%

dws_useroper_device_persona_mlb_ds_all_3

34.9922

0.1512

dws_useroper_device_ persona_mlb_ds_all_c ube1

2304

2.98

94%

dws_useroper_device_persona_mlb_ds_all_4

0.4801

0.489

-1.82

00%

dws_useroper_device_persona_mlb_ds_all_5

20.6449

0.1532

dws_useroper_device_ persona_mlb_ds_all_c ube0

1337

5.78

33%

dws_useroper_device_persona_mlb_ds_all_6

0.0573

0.0587

-2.38

50%

dws_useroper_device_persona_mlb_ds_all_7

16.3198

16.2735

0.28

45%

dws_useroper_device_persona_mlb_ds_all_8

0.56

0.5896

-5.02

04%

dws_useroper_device_persona_mlb_ds_all_9

0.0723

0.0739

-2.16

51%

总计

216.1305

22.818

847.

1930

%

本页内容