
Connecting ClickHouse to Multiple Kerberos-aware Kafka Clusters

ClickHouse can pull data from Kafka through the Kafka table engine, which is declared in a DDL statement: [1]
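For reference, a minimal such DDL might look like the sketch below; the broker, topic, group, and column names are illustrative placeholders, not values from this article:

CREATE TABLE kafka_source
(
    -- placeholder column; use your actual message schema
    message String
) ENGINE = Kafka
SETTINGS
  kafka_broker_list = 'host:9092',
  kafka_topic_list = 'topic1',
  kafka_group_name = 'clickhouse_group',
  kafka_format = 'JSONEachRow';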

However, Kafka services on public clouds often require Kerberos for stronger security, and on the ClickHouse side these settings do not go into the DDL; they have to be placed in the configuration file.

Configuring Kerberos when ClickHouse accesses a single Kafka cluster

If the ClickHouse cluster only accesses one Kerberos-enabled Kafka cluster, it is enough to add the following to the configuration file: [2][3]

<clickhouse>
  <kafka>
    <sasl_username>username</sasl_username>
    <sasl_password>password</sasl_password>
    <security_protocol>sasl_ssl</security_protocol>
    <sasl_mechanisms>PLAIN</sasl_mechanisms>
  </kafka>
</clickhouse>

For the full list of configurable parameters, see the configuration reference of librdkafka, the underlying Kafka library that ClickHouse uses.
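Keep in mind that inside the <kafka> section a librdkafka property is written with its dots replaced by underscores. A small sketch (the property values here are only illustrative, not recommendations):

<clickhouse>
  <kafka>
    <!-- librdkafka property check.crcs -->
    <check_crcs>true</check_crcs>
    <!-- librdkafka property auto.offset.reset -->
    <auto_offset_reset>earliest</auto_offset_reset>
  </kafka>
</clickhouse>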

It is also worth mentioning that every setting from the DDL can be written in the configuration file instead, for example: [4]

<clickhouse>
  <kafka_broker_list>host:port</kafka_broker_list>
  <kafka_topic_list>topic1,topic2,...</kafka_topic_list>
  <kafka>
    <sasl_username>username</sasl_username>
    <sasl_password>password</sasl_password>
    <security_protocol>sasl_ssl</security_protocol>
    <sasl_mechanisms>PLAIN</sasl_mechanisms>
  </kafka>
</clickhouse>
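If those global defaults are picked up as described in [4], the table DDL can presumably omit the broker and topic settings; a sketch under that assumption (the group name and column are placeholders):

CREATE TABLE kafka_from_defaults
(
    -- placeholder column
    message String
) ENGINE = Kafka
SETTINGS
  kafka_group_name = 'group_name',
  kafka_format = 'JSONEachRow';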

Configuring Kerberos when ClickHouse accesses multiple Kafka clusters

So how should things be configured when ClickHouse needs to access several different Kafka clusters, each with Kerberos enabled?

This is where ClickHouse's named collections come in. In short, you bundle the settings that need to be overridden into a named collection; when that collection is referenced in SQL, its contents override the corresponding default configuration (this requires enabling allow_named_collection_override_by_default).
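That flag can be switched on in the server configuration, roughly like this (a sketch; check the named collections documentation for the exact form):

<clickhouse>
  <allow_named_collection_override_by_default>1</allow_named_collection_override_by_default>
</clickhouse>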

So we can configure it like this: [5]

<clickhouse>
  <named_collections>
    <the_first_kafka>
      <kafka>
        <sasl_username>username</sasl_username>
        <sasl_password>password</sasl_password>
        <security_protocol>sasl_ssl</security_protocol>
        <sasl_mechanisms>PLAIN</sasl_mechanisms>
      </kafka>
    </the_first_kafka>
    <the_second_kafka>
      <kafka_broker_list>host:port</kafka_broker_list>
      <kafka_topic_list>topic1,topic2,...</kafka_topic_list>
      <kafka_group_name>group_name</kafka_group_name>
      <kafka>
        <sasl_username>username</sasl_username>
        <sasl_password>password</sasl_password>
        <security_protocol>sasl_ssl</security_protocol>
        <sasl_mechanisms>PLAIN</sasl_mechanisms>
      </kafka>
    </the_second_kafka>
  </named_collections>
</clickhouse>

Then reference the named collection when creating the Kafka engine table in DDL:

CREATE TABLE kafka_test
(
    ...
) ENGINE = Kafka(the_second_kafka)
SETTINGS
  kafka_format = 'JSON';
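Since the_first_kafka only carries the SASL overrides, a table that uses it would still have to provide the broker, topic, and group in its DDL; a sketch with placeholder values:

CREATE TABLE kafka_test_first
(
    -- placeholder column
    message String
) ENGINE = Kafka(the_first_kafka)
SETTINGS
  kafka_broker_list = 'host:port',
  kafka_topic_list = 'topic1',
  kafka_group_name = 'group_name',
  kafka_format = 'JSONEachRow';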

1. https://clickhouse.com/docs/en/engines/table-engines/integrations/kafka
2. https://clickhouse.com/docs/en/integrations/kafka/kafka-table-engine#2-configure-clickhouse
3. https://clickhouse.com/docs/en/engines/table-engines/integrations/kafka#kafka-kerberos-support
4. https://github.com/ClickHouse/ClickHouse/issues/28703#issuecomment-1241852550
5. https://kb.altinity.com/altinity-kb-integrations/altinity-kb-kafka/altinity-kb-adjusting-librdkafka-settings/#different-configurations-for-different-tables