Search CTRL + K

ClickHouse 元组

在 ClickHouse 中,提供 元组(Tuple) 数据类型用于将不同数据类型的 聚类存放,主要用于计算中途保存临时数据。[1]

DDL 中不建议使用元组

元组 存储与磁盘中并不会将内部列打散单独存储,因此如果只查询 元组 中某一列数据,ClickHouse 将从磁盘中读取所有 元组 包含的列,然后执行 tupleElement 函数 获取所需的列。

因此,只建议将一定同时使用的列创建成元组使用,否则会拖慢查询性能。

证明一

建表如下:

CREATE TABLE Xtuple (x Int64, tf Tuple(Float64, Float64), f1 Float64, f2 Float64) 
ENGINE = MergeTree ORDER BY x;

insert into Xtuple 
select number, tuple(number/4554334 as x1, number/121142 as x2), x1, x2 from numbers (100);

optimize table Xtuple final;

set allow_experimental_analyzer = 1;

查询 元组 其中一列:

EXPLAIN PLAN header=1 SELECT avg(tf.1) FROM Xtuple;

Expression ((Project names + Projection))
Header: avg(tupleElement(tf, 1)) Float64
  Aggregating
  Header: avg(tupleElement(tf_0, 1_UInt8)) Float64
    Expression ((Before GROUP BY + Change column names to column identifiers))
    Header: tupleElement(tf_0, 1_UInt8) Float64
      ReadFromMergeTree (default.Xtuple)
      Header: tf Tuple(Float64, Float64)

可以看到从磁盘中读取整个 元组

查询单独列:

EXPLAIN PLAN header=1 SELECT avg(f1)FROM Xtuple;

Expression ((Project names + Projection))
Header: avg(f1) Float64
  Aggregating
  Header: avg(f1_0) Float64
    Expression ((Before GROUP BY + Change column names to column identifiers))
    Header: f1_0 Float64
      ReadFromMergeTree (default.Xtuple)
      Header: f1 Float64

只读取所需的列。

证明二

查询耗时证明,这里贴上 Den Crane 的证明 [2]

CREATE TABLE dw.Xtuple (x Int64, tf Tuple(Float64, Float64), f1 Float64, f2 Float64) 
ENGINE = MergeTree ORDER BY x;

insert into Xtuple 
select number, tuple(number/4554334 as x1, number/121142 as x2), x1, x2 from numbers (1000000000);

optimize table Xtuple final;


SELECT
    avg(tf.1),
    avg(tf.2)
FROM Xtuple;

┌─avg(tupleElement(tf, 1))─┬─avg(tupleElement(tf, 2))─┐
│        109.7855360410545 │        4127.387689653463 │
└──────────────────────────┴──────────────────────────┘
1 rows in set. Elapsed: 3.334 sec. Processed 1.00 billion rows, 16.00 GB (299.95 million rows/s., 4.80 GB/s.)


SELECT
    avg(f1),
    avg(f2)
FROM Xtuple;

┌───────────avg(f1)─┬───────────avg(f2)─┐
│ 109.7855360410545 │ 4127.387689653469 │
└───────────────────┴───────────────────┘
1 rows in set. Elapsed: 2.852 sec. Processed 1.00 billion rows, 16.00 GB (350.63 million rows/s., 5.61 GB/s.)



set max_threads=1;

SELECT
    avg(tf.1),
    avg(tf.2)
FROM Xtuple
┌─avg(tupleElement(tf, 1))─┬─avg(tupleElement(tf, 2))─┐
│       109.78553604105454 │        4127.387689653466 │
└──────────────────────────┴──────────────────────────┘
1 rows in set. Elapsed: 18.380 sec. Processed 1.00 billion rows, 16.00 GB (54.41 million rows/s., 870.52 MB/s.)

SELECT
    avg(f1),
    avg(f2)
FROM Xtuple
┌────────────avg(f1)─┬───────────avg(f2)─┐
│ 109.78553604105454 │ 4127.387689653466 │
└────────────────────┴───────────────────┘
1 rows in set. Elapsed: 18.750 sec. Processed 1.00 billion rows, 16.00 GB (53.33 million rows/s., 853.33 MB/s.)



SELECT avg(tf.1)
FROM Xtuple
┌─avg(tupleElement(tf, 1))─┐
│       109.78553604105454 │
└──────────────────────────┘
1 rows in set. Elapsed: 17.679 sec. Processed 1.00 billion rows, 16.00 GB (56.57 million rows/s., 905.04 MB/s.)

SELECT avg(f1)
FROM Xtuple
┌────────────avg(f1)─┐
│ 109.78553604105454 │
└────────────────────┘

1 rows in set. Elapsed: 9.273 sec. Processed 1.00 billion rows, 8.00 GB (107.84 million rows/s., 862.73 MB/s.)

  1. https://clickhouse.com/docs/en/sql-reference/data-types/tuple ↩︎

  2. https://gist.github.com/den-crane/f20a2dce94a2926a1e7cfec7cdd12f6d ↩︎