Storage Manager
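The storage manager is responsible for managing database files. It organizes a set of pages into files. Some database management systems (DBMSs) use a file hierarchy, while others use a single file (for example, SQLite).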
File Storage
- Heap File Organization
- Tree File Organization
- Sequential/Sorted File Organization (ISAM)
- Hashing File Organization
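Storage Engine Trade-offs
Besides managing the data structures used to store data, a storage engine adds higher-level features such as caching, recovery, and transactions. Storage engines are still evolving, and their designs trade off along three main dimensions:
- Buffering: whether data is collected in memory first so that it can be inserted in batches
- Mutability (or immutability): whether data stored on disk can be modified in place
- Ordering: whether data is stored in the order of a specific key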
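The ClickHouse MergeTree storage engine is based on the log-structured merge-tree, a data structure optimized for write-sensitive workloads. Its trade-offs are:
- Buffering: MergeTree has no buffer of its own, but it can be combined with the [Buffer](https://clickhouse.com/docs/en/engines/table-engines/special/buffer) table engine to batch inserts actively, or with Asynchronous Inserts to batch them passively
- Mutability: ClickHouse supports changing data that has already been written, but doing so causes a large number of write requests, so heavy use is discouraged [2]. It also provides Lightweight Delete: a delete only marks the row as deleted, queries filter the marked rows out, and the physical deletion happens during merges
- Ordering: data must be sorted by the Primary Key (strictly speaking by Order By, falling back to Primary Key when Order By is not specified), but only within each individual part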
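The part-level ordering and the mark-then-merge deletion described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: `Part`, `Table`, `lightweight_delete`, and `merge` are hypothetical names chosen for illustration, not ClickHouse APIs, and real parts live on disk in a columnar layout.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class Part:
    """Hypothetical immutable part: rows sorted by key, plus a delete mask."""
    rows: list                                  # (key, value) tuples, sorted by key
    deleted: set = field(default_factory=set)   # indexes of rows marked as deleted

    def live_rows(self):
        # Queries skip rows that are only marked as deleted.
        return [row for i, row in enumerate(self.rows) if i not in self.deleted]


class Table:
    """Toy table made of independently sorted parts (an assumption, not ClickHouse code)."""

    def __init__(self):
        self.parts = []

    def insert(self, batch):
        # Every insert creates a new part that is sorted only within itself.
        self.parts.append(Part(sorted(batch)))

    def lightweight_delete(self, predicate):
        # Mark matching rows; the stored rows are not rewritten yet.
        for part in self.parts:
            for i, row in enumerate(part.rows):
                if predicate(row):
                    part.deleted.add(i)

    def select(self):
        # A plain scan is ordered per part, not across the whole table.
        return [row for part in self.parts for row in part.live_rows()]

    def merge(self):
        # Merging combines the sorted parts and physically drops marked rows.
        merged = list(heapq.merge(*(p.live_rows() for p in self.parts)))
        self.parts = [Part(merged)]


if __name__ == "__main__":
    table = Table()
    table.insert([(3, "c"), (1, "a")])
    table.insert([(2, "b"), (0, "z")])
    table.lightweight_delete(lambda row: row[0] == 2)
    print(table.select())   # ordered within each part only
    table.merge()
    print(table.select())   # one globally sorted part, deleted row gone
```

Before the merge, a scan returns rows ordered within each part only, and the deleted row is hidden by the mask. After the merge there is a single sorted part and the deleted row no longer exists.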
In its most basic form, a DBMS stores a database as files on disk. Some may use a file hierarchy, others may use a single file (e.g., SQLite).[3]
The DBMS’s storage manager is responsible for managing a database’s files. It represents the files as a collection of pages. It also keeps track of what data has been read and written to pages as well as how much free space there is in these pages.[3:1]
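To make the idea of "a file as a collection of pages with free-space tracking" concrete, here is a minimal Python sketch. The class name `HeapFile`, the 4096-byte `PAGE_SIZE`, and the append-only record layout are assumptions made for illustration; a real storage manager would also persist the free-space map and use a proper page header.

```python
import os
import tempfile

PAGE_SIZE = 4096  # assumed page size in bytes


class HeapFile:
    """Toy storage manager: one file treated as an array of fixed-size pages."""

    def __init__(self, path):
        self.path = path
        self.free_space = []          # free bytes per page, indexed by page id
        open(path, "wb").close()      # start from an empty file

    def allocate_page(self):
        # Append one zero-filled page to the file and register its free space.
        page_id = len(self.free_space)
        with open(self.path, "r+b") as f:
            f.seek(page_id * PAGE_SIZE)
            f.write(bytes(PAGE_SIZE))
        self.free_space.append(PAGE_SIZE)
        return page_id

    def read_page(self, page_id):
        with open(self.path, "rb") as f:
            f.seek(page_id * PAGE_SIZE)
            return bytearray(f.read(PAGE_SIZE))

    def write_page(self, page_id, data):
        if len(data) != PAGE_SIZE:
            raise ValueError("a page must be exactly PAGE_SIZE bytes")
        with open(self.path, "r+b") as f:
            f.seek(page_id * PAGE_SIZE)
            f.write(data)

    def insert(self, record):
        # Use the first page with enough free space, else allocate a new one.
        need = len(record)
        if need > PAGE_SIZE:
            raise ValueError("record does not fit in a single page")
        page_id = next(
            (i for i, free in enumerate(self.free_space) if free >= need), None
        )
        if page_id is None:
            page_id = self.allocate_page()
        page = self.read_page(page_id)
        offset = PAGE_SIZE - self.free_space[page_id]   # records are appended
        page[offset:offset + need] = record
        self.write_page(page_id, bytes(page))
        self.free_space[page_id] -= need
        return page_id, offset


if __name__ == "__main__":
    heap = HeapFile(os.path.join(tempfile.mkdtemp(), "db.heap"))
    print(heap.insert(b"a" * 3000))   # goes to a freshly allocated page
    print(heap.insert(b"b" * 3000))   # does not fit in page 0, new page
    print(heap.insert(b"c" * 1000))   # fits in the space left in page 0
    print(heap.free_space)
```

The in-memory `free_space` list plays the role of the bookkeeping the excerpt mentions: it lets an insert find a page with room without scanning page contents.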
A storage engine is based on some data structure. However, these structures do not describe the semantics of caching, recovery, transactionality, and other things that storage engines add on top of them.
...
Storage structures have three common variables: they use buffering (or avoid using it), use immutable (or mutable) files, and store values in order (or out of order). Most of the distinctions and optimizations in storage structures discussed in this book are related to one of these three concepts.[4]
- Buffering: This defines whether or not the storage structure chooses to collect a certain amount of data in memory before putting it on disk.
- Mutability (or immutability): This defines whether or not the storage structure reads parts of the file, updates them, and writes the updated results at the same location in the file.
- Ordering: This is defined as whether or not the data records are stored in the key order in the pages on disk.
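The three variables above can be seen together in one small Python sketch of a store that buffers writes in memory, flushes them as immutable runs, and keeps each run in key order. The class `BufferedSortedStore` and its `buffer_limit` parameter are hypothetical names used for illustration; the runs are kept in memory here, standing in for files on disk.

```python
import bisect


class BufferedSortedStore:
    """Toy store: buffered writes, immutable flushed runs, key-ordered runs."""

    def __init__(self, buffer_limit=3):
        self.buffer_limit = buffer_limit
        self.buffer = {}   # buffering: recent writes collected in memory
        self.runs = []     # immutable sorted runs, oldest first

    def put(self, key, value):
        self.buffer[key] = value
        if len(self.buffer) >= self.buffer_limit:
            self.flush()

    def flush(self):
        if not self.buffer:
            return
        # Ordering: sort by key. Immutability: a tuple is never updated in place.
        self.runs.append(tuple(sorted(self.buffer.items())))
        self.buffer = {}

    def get(self, key):
        if key in self.buffer:
            return self.buffer[key]
        # Newer runs shadow older ones, so search from newest to oldest.
        for run in reversed(self.runs):
            i = bisect.bisect_left(run, (key,))   # binary search on the sorted run
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None


if __name__ == "__main__":
    store = BufferedSortedStore(buffer_limit=2)
    store.put("b", 1)
    store.put("a", 2)       # buffer is full, so it is flushed as a sorted run
    store.put("a", 3)       # the update goes to the buffer, not into the old run
    print(store.get("a"), store.get("b"), store.get("z"))
```

An update never touches an existing run; the newer value simply shadows the older one at read time. A mutable, in-place design such as a B-tree would make the opposite choice on the second variable.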
https://15445.courses.cs.cmu.edu/fall2023/slides/03-storage1.pdf
https://15445.courses.cs.cmu.edu/fall2023/notes/03-storage1.pdf
Petrov, Alex. Database Internals. 1st ed., 2019.