Database Page
存储管理器 将存储空间切分为一个个等大的块,叫 页(page),多个块组成一个或多个文件。页 可以存储不同类型的数据,比如 元组、索引等,通常不会在一个 页 内混合存储不同数据。
每个 页 包含一个独一无二的标识符,如果数据库是单一文件,那标识符就是文件偏移;如果是多文件, DBMS 会包含一个中间层,将页面 id 转成对应文件路径和偏移量。
老生常谈的问题,用内碎片避免外碎片。
对于一个 DBMS 来说,通常有三种页:
- 硬件页(通常 4KB)
- 操作系统页(通常 4KB)
- 数据库页(1~16KB)
- 4KB:SQLite、DB2、ORACLE、RocksDB
- 8KB:SQLServer、PostgreSQL
- 16KB:MySQL
值得注意的是,存储设备只保证硬件页的原子写入,因此对于数据库页大于硬件页时,存储管理器 还需要采取额外步骤保证数据库页的完整写入。
页结构
每个 页 都包含 header 记录 页 的元信息:
- 页大小
- Checksum
- 数据库版本
- 事物可见性
- self-containment(某些系统要求,比如 Oracle)
列式 数据库
- slotted-pages
- log-structured
Slotted Pages
在 页 的尾部从后往前存储 元组,在 header 后从前往后存储 slot 数组,记录每个 元组 的偏移量。同时,在 header 中记录:
- 用到的 slots
- 最后一个插入位置的偏移量
The DBMS organizes the database across one or more files in fixed-size blocks of data called pages. Pages can contain different kinds of data (tuples, indexes, etc). Most systems will not mix these types within pages. Some systems will require that pages are self-contained, meaning that all the information needed to read each page is on the page itself.[1]
Each page is given a unique identifier. If the database is a single file, then the page id can just be the file offset. Most DBMSs have an indirection layer that maps a page id to a file path and offset. The upper levels of the system will ask for a specific page number. Then, the storage manager will have to turn that page number into a file and an offset to find the page.[1:1]
Most DBMSs uses fixed-size pages to avoid the engineering overhead needed to support variable-sized pages. For example, with variable-size pages, deleting a page could create a hole in files that the DBMS cannot easily fill with new pages.[1:2]
The storage device guarantees an atomic write of the size of the hardware page. If the hardware page is 4 KB and the system tries to write 4 KB to the disk, either all 4 KB will be written, or none of it will. This means that if our database page is larger than our hardware page, the DBMS will have to take extra measures to ensure that the data gets written out safely since the program can get partway through writing a database page to disk when the system crashes.[1:3]
There are two main approaches to laying out data in pages: (1) slotted-pages and (2) log-structured.[1:4]