ClickHouse 元组
在 ClickHouse 中,提供 元组(Tuple) 数据类型用于将不同数据类型的 列 聚类存放,主要用于计算中途保存临时数据。[1]
DDL 中不建议使用元组
元组 存储与磁盘中并不会将内部列打散单独存储,因此如果只查询 元组 中某一列数据,ClickHouse 将从磁盘中读取所有 元组 包含的列,然后执行 tupleElement
函数 获取所需的列。
因此,只建议将一定同时使用的列创建成元组使用,否则会拖慢查询性能。
证明一
建表如下:
CREATE TABLE Xtuple (x Int64, tf Tuple(Float64, Float64), f1 Float64, f2 Float64)
ENGINE = MergeTree ORDER BY x;
insert into Xtuple
select number, tuple(number/4554334 as x1, number/121142 as x2), x1, x2 from numbers (100);
optimize table Xtuple final;
set allow_experimental_analyzer = 1;
查询 元组 其中一列:
EXPLAIN PLAN header=1 SELECT avg(tf.1) FROM Xtuple;
Expression ((Project names + Projection))
Header: avg(tupleElement(tf, 1)) Float64
Aggregating
Header: avg(tupleElement(tf_0, 1_UInt8)) Float64
Expression ((Before GROUP BY + Change column names to column identifiers))
Header: tupleElement(tf_0, 1_UInt8) Float64
ReadFromMergeTree (default.Xtuple)
Header: tf Tuple(Float64, Float64)
可以看到从磁盘中读取整个 元组 。
查询单独列:
EXPLAIN PLAN header=1 SELECT avg(f1)FROM Xtuple;
Expression ((Project names + Projection))
Header: avg(f1) Float64
Aggregating
Header: avg(f1_0) Float64
Expression ((Before GROUP BY + Change column names to column identifiers))
Header: f1_0 Float64
ReadFromMergeTree (default.Xtuple)
Header: f1 Float64
只读取所需的列。
证明二
查询耗时证明,这里贴上 Den Crane 的证明 [2]:
CREATE TABLE dw.Xtuple (x Int64, tf Tuple(Float64, Float64), f1 Float64, f2 Float64)
ENGINE = MergeTree ORDER BY x;
insert into Xtuple
select number, tuple(number/4554334 as x1, number/121142 as x2), x1, x2 from numbers (1000000000);
optimize table Xtuple final;
SELECT
avg(tf.1),
avg(tf.2)
FROM Xtuple;
┌─avg(tupleElement(tf, 1))─┬─avg(tupleElement(tf, 2))─┐
│ 109.7855360410545 │ 4127.387689653463 │
└──────────────────────────┴──────────────────────────┘
1 rows in set. Elapsed: 3.334 sec. Processed 1.00 billion rows, 16.00 GB (299.95 million rows/s., 4.80 GB/s.)
SELECT
avg(f1),
avg(f2)
FROM Xtuple;
┌───────────avg(f1)─┬───────────avg(f2)─┐
│ 109.7855360410545 │ 4127.387689653469 │
└───────────────────┴───────────────────┘
1 rows in set. Elapsed: 2.852 sec. Processed 1.00 billion rows, 16.00 GB (350.63 million rows/s., 5.61 GB/s.)
set max_threads=1;
SELECT
avg(tf.1),
avg(tf.2)
FROM Xtuple
┌─avg(tupleElement(tf, 1))─┬─avg(tupleElement(tf, 2))─┐
│ 109.78553604105454 │ 4127.387689653466 │
└──────────────────────────┴──────────────────────────┘
1 rows in set. Elapsed: 18.380 sec. Processed 1.00 billion rows, 16.00 GB (54.41 million rows/s., 870.52 MB/s.)
SELECT
avg(f1),
avg(f2)
FROM Xtuple
┌────────────avg(f1)─┬───────────avg(f2)─┐
│ 109.78553604105454 │ 4127.387689653466 │
└────────────────────┴───────────────────┘
1 rows in set. Elapsed: 18.750 sec. Processed 1.00 billion rows, 16.00 GB (53.33 million rows/s., 853.33 MB/s.)
SELECT avg(tf.1)
FROM Xtuple
┌─avg(tupleElement(tf, 1))─┐
│ 109.78553604105454 │
└──────────────────────────┘
1 rows in set. Elapsed: 17.679 sec. Processed 1.00 billion rows, 16.00 GB (56.57 million rows/s., 905.04 MB/s.)
SELECT avg(f1)
FROM Xtuple
┌────────────avg(f1)─┐
│ 109.78553604105454 │
└────────────────────┘
1 rows in set. Elapsed: 9.273 sec. Processed 1.00 billion rows, 8.00 GB (107.84 million rows/s., 862.73 MB/s.)