Search CTRL + K

ClickHouse 作用于主键判断词的单调函数优化

ClickHouse 会通过主键剪枝不要的数据块(granules),但如果主键被其他函数包裹:

例子

Github Issue

作者通过 toString 调用主键,却无法命中主键索引,详见 链接

下面贴出最小例子,可以 ClickHouse Playground 简单验证。

单调函数

CREATE TABLE users (uid Int16, name String, age Int16) ENGINE=MergeTree() ORDER BY (uid);

INSERT INTO users VALUES (1, 'John', 33);
INSERT INTO users VALUES (2, 'Ksenia', 48);
INSERT INTO users VALUES (3, 'Alice', 50);

EXPLAIN indexes=1 SELECT count(1) FROM users WHERE (uid + 2) > 3;

Condition 成功剪枝:

Expression ((Projection + Before ORDER BY))
  Aggregating
    Expression (Before GROUP BY)
      Filter (WHERE)
        ReadFromMergeTree (default.users)
        Indexes:
          PrimaryKey
            Keys: 
              uid
            Condition: (plus(uid, 2) in [4, +Inf))
            Parts: 2/3
            Granules: 2/3

非单调函数

CREATE TABLE users (uid Int16, name String, age Int16) ENGINE=MergeTree() ORDER BY (uid);

INSERT INTO users VALUES (1, 'John', 33);
INSERT INTO users VALUES (2, 'Ksenia', 48);
INSERT INTO users VALUES (3, 'Alice', 50);

EXPLAIN indexes=1 SELECT count(1) FROM users WHERE (uid * 2) > 3;

Condition 无法优化,全表扫描:

Expression ((Projection + Before ORDER BY))
  Aggregating
    Expression (Before GROUP BY)
      Filter (WHERE)
        ReadFromMergeTree (default.users)
        Indexes:
          PrimaryKey
            Condition: true
            Parts: 3/3
            Granules: 3/3

源码

ClickHouse 版本

本节源码采用 ClickHouse 版本 23.4,时间 2023-06-01。

主键判断词优化主要在 src/Storages/MergeTree/KeyCondition.cppsrc/Storages/MergeTree/KeyCondition.h 中。

KeyCondition::extractAtomFromTree 开始开起。