Search CTRL + K

Database Page

存储管理器 将存储空间切分为一个个等大的块,叫 页(page),多个块组成一个或多个文件。 可以存储不同类型的数据,比如 元组、索引等,通常不会在一个 内混合存储不同数据。

每个 包含一个独一无二的标识符,如果数据库是单一文件,那标识符就是文件偏移;如果是多文件, DBMS 会包含一个中间层,将页面 id 转成对应文件路径和偏移量。

为什么数据库页面是固定大小?

老生常谈的问题,用内碎片避免外碎片。

对于一个 DBMS 来说,通常有三种页:

不同数据库默认页大小

  • 4KB:SQLite、DB2、ORACLE、RocksDB
  • 8KB:SQLServer、PostgreSQL
  • 16KB:MySQL

值得注意的是,存储设备只保证硬件页的原子写入,因此对于数据库页大于硬件页时,存储管理器 还需要采取额外步骤保证数据库页的完整写入。

页结构

每个 都包含 header 记录 的元信息:

列式 数据库

主要有两种方式在数据 中存储 元组

Slotted Pages

的尾部从后往前存储 元组,在 header 后从前往后存储 slot 数组,记录每个 元组 的偏移量。同时,在 header 中记录:

What is database pages?

The DBMS organizes the database across one or more files in fixed-size blocks of data called pages. Pages can contain different kinds of data (tuples, indexes, etc). Most systems will not mix these types within pages. Some systems will require that pages are self-contained, meaning that all the information needed to read each page is on the page itself.[1]

How do databases page come to work?

Each page is given a unique identifier. If the database is a single file, then the page id can just be the file offset. Most DBMSs have an indirection layer that maps a page id to a file path and offset. The upper levels of the system will ask for a specific page number. Then, the storage manager will have to turn that page number into a file and an offset to find the page.[1:1]

Why database pages are fixed-size?

Most DBMSs uses fixed-size pages to avoid the engineering overhead needed to support variable-sized pages. For example, with variable-size pages, deleting a page could create a hole in files that the DBMS cannot easily fill with new pages.[1:2]

Atomic write of storage device

The storage device guarantees an atomic write of the size of the hardware page. If the hardware page is 4 KB and the system tries to write 4 KB to the disk, either all 4 KB will be written, or none of it will. This means that if our database page is larger than our hardware page, the DBMS will have to take extra measures to ensure that the data gets written out safely since the program can get partway through writing a database page to disk when the system crashes.[1:3]

Laying out data in pages

There are two main approaches to laying out data in pages: (1) slotted-pages and (2) log-structured.[1:4]


  1. https://15445.courses.cs.cmu.edu/fall2023/notes/03-storage1.pdf ↩︎ ↩︎ ↩︎ ↩︎ ↩︎