Storage Manager

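The storage manager is responsible for managing database files. It organizes a collection of pages into files. Some DBMSs use a file hierarchy; others use a single file (e.g., SQLite).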

Non-file storage also exists

In the early 1980s, DBMSs ran their own file systems on raw block devices. Some enterprise DBMSs still support this today.[1]

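The DBMS usually does not manage replicas of pages

Data replication is usually implemented in a layer above or below the storage manager.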

File Storage

Different DBMSs use different file storage layouts to organize pages.

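Storage Engine Trade-offs

Beyond managing the data structures that store data, a storage engine adds higher-level features such as caching, recovery, and transactions. Storage engines keep evolving, with trade-offs along three main dimensions: buffering, mutability, and ordering.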

ClickHouse's MergeTree Storage Engine

The ClickHouse MergeTree storage engine is based on the log-structured merge-tree, a data structure optimized for write-sensitive workloads. Its trade-offs are:

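  • Buffering: MergeTree has no buffer of its own, but it can be combined with the Buffer table engine (https://clickhouse.com/docs/en/engines/table-engines/special/buffer) to batch inserts actively, or with Asynchronous Inserts to batch them passively.
  • Mutability: ClickHouse supports modifying data that has already been written, but doing so generates a large volume of writes, so heavy use is discouraged [2]. It also provides Lightweight Delete: a delete only marks the row as deleted, queries filter marked rows out, and the physical removal happens during merges.
  • Ordering: data must be sorted by the primary key (in practice the ORDER BY expression; PRIMARY KEY is used when ORDER BY is absent), but the order only holds within each individual part.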
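The mutability and ordering points above can be illustrated with a toy in-memory model. This is only a sketch of the idea, not how ClickHouse implements it: the `Part` and `ToyMergeTree` classes and all of their methods are hypothetical names made up for this note.

```python
import heapq


class Part:
    """An immutable, sorted run of (key, value) rows plus a delete mask."""

    def __init__(self, rows):
        # Ordering holds inside one part only.
        self.rows = sorted(rows, key=lambda r: r[0])
        self.deleted = [False] * len(self.rows)


class ToyMergeTree:
    """Hypothetical toy model; not ClickHouse's actual implementation."""

    def __init__(self):
        self.parts = []

    def insert(self, rows):
        # Each insert batch becomes a new part; existing parts are untouched.
        self.parts.append(Part(rows))

    def lightweight_delete(self, key):
        # Only mark matching rows as deleted; no row data is rewritten.
        for part in self.parts:
            for i, (k, _) in enumerate(part.rows):
                if k == key:
                    part.deleted[i] = True

    def select(self):
        # Queries filter out marked rows. The result is ordered per part,
        # not globally.
        return [row
                for part in self.parts
                for row, dead in zip(part.rows, part.deleted)
                if not dead]

    def merge(self):
        # Merging physically drops marked rows and yields one sorted part.
        live = [[r for r, dead in zip(p.rows, p.deleted) if not dead]
                for p in self.parts]
        merged = heapq.merge(*live, key=lambda r: r[0])
        self.parts = [Part(list(merged))]
```

In this model, inserting `[(3, "c"), (1, "a")]` and then `[(2, "b"), (0, "z")]` creates two parts, each sorted on its own. After `lightweight_delete(3)` the row is still stored but no longer returned by `select()`, and it only disappears from storage once `merge()` runs.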
File Storage

In its most basic form, a DBMS stores a database as files on disk. Some may use a file hierarchy, others may use a single file (e.g., SQLite).[3]

What is the responsibility of the storage manager?

The DBMS’s storage manager is responsible for managing a database’s files. It represents the files as a collection of pages. It also keeps track of what data has been read and written to pages as well as how much free space there is in these pages.[3]
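A minimal sketch of that responsibility, assuming a single database file split into fixed-size pages. The 4096-byte page size, the `StorageManager` class, and its method names are assumptions for illustration and are not taken from the course notes. Free-space tracking is kept in memory only, which a real system would not do.

```python
import os

PAGE_SIZE = 4096  # assumed page size in bytes


class StorageManager:
    """Toy storage manager: one file viewed as an array of fixed-size pages."""

    def __init__(self, path):
        # Create the file if needed, then open it for in-place reads/writes.
        if not os.path.exists(path):
            open(path, "wb").close()
        self.file = open(path, "r+b")
        # page_id -> bytes used (in memory only in this sketch)
        self.used = {}

    def allocate_page(self):
        """Append a zero-filled page and return its page id."""
        self.file.seek(0, os.SEEK_END)
        page_id = self.file.tell() // PAGE_SIZE
        self.file.write(bytes(PAGE_SIZE))
        self.used[page_id] = 0
        return page_id

    def write_page(self, page_id, data):
        """Overwrite a page with `data`, zero-padded to PAGE_SIZE."""
        if len(data) > PAGE_SIZE:
            raise ValueError("data does not fit in one page")
        self.file.seek(page_id * PAGE_SIZE)
        self.file.write(data.ljust(PAGE_SIZE, b"\x00"))
        self.used[page_id] = len(data)

    def read_page(self, page_id):
        """Return the PAGE_SIZE bytes stored at `page_id`."""
        self.file.seek(page_id * PAGE_SIZE)
        return self.file.read(PAGE_SIZE)

    def free_space(self, page_id):
        """Bytes still unused in the page, per the in-memory bookkeeping."""
        return PAGE_SIZE - self.used[page_id]

    def close(self):
        self.file.close()
```

A page id maps to a file offset by simple multiplication (`page_id * PAGE_SIZE`), which is what lets the rest of a system refer to pages without knowing about the file layout.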

Buffering, Immutability, and Ordering

A storage engine is based on some data structure. However, these structures do not describe the semantics of caching, recovery, transactionality, and other things that storage engines add on top of them.
...
Storage structures have three common variables: they use buffering (or avoid using it), use immutable (or mutable) files, and store values in order (or out of order). Most of the distinctions and optimizations in storage structures discussed in this book are related to one of these three concepts.[4]

  • Buffering: This defines whether or not the storage structure chooses to collect a certain amount of data in memory before putting it on disk.
  • Mutability (or immutability): This defines whether or not the storage structure reads parts of the file, updates them, and writes the updated results at the same location in the file.
  • Ordering: This is defined as whether or not the data records are stored in the key order in the pages on disk.
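To make the three variables concrete, here is a small sketch of one possible combination: buffered, immutable, and ordered. It is an illustration under assumptions, not code from the book. The `BufferedSortedWriter` class, the `buffer_limit` threshold, and the `run-NNNN.txt` file naming are all hypothetical.

```python
import os


class BufferedSortedWriter:
    """Buffers records in memory, then flushes them as a sorted, write-once file."""

    def __init__(self, directory, buffer_limit=3):
        self.directory = directory
        self.buffer_limit = buffer_limit  # hypothetical threshold, in records
        self.buffer = {}                  # key -> value, newest write wins
        self.flushed = []                 # paths of files written so far

    def put(self, key, value):
        # Buffering: collect data in memory before putting it on disk.
        self.buffer[key] = value
        if len(self.buffer) >= self.buffer_limit:
            self.flush()

    def flush(self):
        if not self.buffer:
            return
        name = f"run-{len(self.flushed):04d}.txt"
        path = os.path.join(self.directory, name)
        # Immutability: mode "x" creates a new file and fails if it exists,
        # so earlier files are never updated in place.
        with open(path, "x") as f:
            # Ordering: records are written in key order.
            for key in sorted(self.buffer):
                f.write(f"{key}\t{self.buffer[key]}\n")
        self.flushed.append(path)
        self.buffer.clear()
```

Flipping any one choice gives a different design: writing each `put` straight to disk removes buffering, seeking into an existing file and overwriting a record makes the file mutable, and writing records in arrival order drops ordering.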

  1. https://15445.courses.cs.cmu.edu/fall2023/slides/03-storage1.pdf

  2. https://clickhouse.com/docs/en/optimize/avoid-mutations

  3. https://15445.courses.cs.cmu.edu/fall2023/notes/03-storage1.pdf

  4. Petrov, Alex. Database Internals. 1st ed., 2019.