Data Files

#CS/Database

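Data files (also called primary files) are the files that hold a database's data. A data file organizes a set of pages into a file, and there are several ways to do so: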

index-organized tables (IOT)
heap-organized tables (heap files)
hash-organized tables (hashed files)

Heap Files

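A heap file is a data file that stores records in no particular order; usually they are appended in write order. No reorganization is needed on write, but because record placement says nothing about the key, an additional index structure is required to record where each record lives.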
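As a rough illustration of the append-plus-index idea, here is a minimal in-memory sketch. The `HeapFile` class, its `index` dictionary, and the use of `io.BytesIO` in place of a real file on disk are all assumptions made for this example, not something taken from the sources.

```python
import io


class HeapFile:
    """Toy heap file: records are appended in write order, and a separate
    index maps each key to the (offset, length) of its record."""

    def __init__(self):
        self._data = io.BytesIO()  # stands in for the file on disk
        self.index = {}            # key -> (offset, length)

    def append(self, key, record: bytes) -> int:
        # New records always go to the end; existing data is never reorganized.
        offset = self._data.seek(0, io.SEEK_END)
        self._data.write(record)
        self.index[key] = (offset, len(record))
        return offset

    def get(self, key) -> bytes:
        # Without the index we would have to scan the whole file.
        offset, length = self.index[key]
        self._data.seek(offset)
        return self._data.read(length)


heap = HeapFile()
first_offset = heap.append("k1", b"hello")
second_offset = heap.append("k2", b"world!")
```

Writes stay cheap because they only touch the end of the file; the cost is moved to maintaining the index.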

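For a single-file heap file (for example, SQLite), the offset of a page can be computed directly from its page id, so the page can be accessed without any lookup.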
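The offset calculation can be sketched as below, assuming fixed-size pages and 0-based page ids. `PAGE_SIZE`, `page_offset`, and `read_page` are hypothetical names chosen for this example; SQLite itself numbers pages from 1, so the same idea there would use `page_id - 1`.

```python
import io
from typing import BinaryIO

PAGE_SIZE = 4096  # assumed page size in bytes (not taken from the notes)


def page_offset(page_id: int, page_size: int = PAGE_SIZE) -> int:
    """Byte offset of a page in a single-file heap, with 0-based page ids."""
    if page_id < 0:
        raise ValueError("page_id must be non-negative")
    return page_id * page_size


def read_page(f: BinaryIO, page_id: int, page_size: int = PAGE_SIZE) -> bytes:
    """Seek to the page's offset and read exactly one page."""
    f.seek(page_offset(page_id, page_size))
    return f.read(page_size)


# Usage: an in-memory "file" with three pages filled with A, B, and C bytes.
heap = io.BytesIO(b"A" * PAGE_SIZE + b"B" * PAGE_SIZE + b"C" * PAGE_SIZE)
second_page = read_page(heap, 1)
```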

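For a multi-file heap file, the system maintains special directory pages that record where the other data pages are located. The directory information must be kept in sync with the data pages. Directory pages also record metadata about available space:

The amount of free space in each page
The list of available data pages and empty pages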
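The directory described above could look roughly like the following sketch. The `PageDirectory` and `DirectoryEntry` classes, their fields, and the file names are hypothetical; a real DBMS stores this information in on-disk pages rather than in a Python dictionary.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DirectoryEntry:
    path: str        # file that holds the data page
    free_bytes: int  # free space remaining in that page


@dataclass
class PageDirectory:
    page_size: int = 4096  # assumed page size
    entries: Dict[int, DirectoryEntry] = field(default_factory=dict)

    def add_page(self, page_id: int, path: str) -> None:
        # A newly registered page is empty, so all of it is free.
        self.entries[page_id] = DirectoryEntry(path, self.page_size)

    def record_write(self, page_id: int, nbytes: int) -> None:
        # Must be called on every write to keep the directory in sync
        # with the data page.
        entry = self.entries[page_id]
        if nbytes > entry.free_bytes:
            raise ValueError("not enough free space in page")
        entry.free_bytes -= nbytes

    def find_page_with_space(self, needed: int) -> Optional[int]:
        # First page (in registration order) that can hold `needed` bytes.
        for page_id, entry in self.entries.items():
            if entry.free_bytes >= needed:
                return page_id
        return None

    def empty_pages(self) -> List[int]:
        return [pid for pid, e in self.entries.items()
                if e.free_bytes == self.page_size]


directory = PageDirectory(page_size=100)
directory.add_page(0, "data_0.db")
directory.add_page(1, "data_1.db")
directory.record_write(0, 80)
```

After the write, page 0 has 20 bytes free, so a 50-byte record would be routed to page 1.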

Hashed Files

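A hashed file stores records in buckets; the hash value of the inserted key determines which bucket a record goes into. Records within a bucket can be stored in write order, or sorted by key to speed up lookups.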
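A minimal sketch of the bucket scheme, using the sorted-by-key variant. The `HashedFile` class, the fixed bucket count, and the choice of `zlib.crc32` as the hash function are assumptions for illustration only.

```python
import bisect
import zlib


class HashedFile:
    """Toy hashed file: a fixed number of buckets, each kept sorted by key."""

    def __init__(self, num_buckets: int = 4):
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket_index(self, key: str) -> int:
        # crc32 is deterministic across runs, unlike Python's built-in
        # hash() for strings.
        return zlib.crc32(key.encode("utf-8")) % len(self.buckets)

    def insert(self, key: str, value) -> None:
        bucket = self.buckets[self._bucket_index(key)]
        # (key,) sorts just before any (key, value), so bisect_left lands
        # on the existing entry for this key if there is one.
        i = bisect.bisect_left(bucket, (key,))
        if i < len(bucket) and bucket[i][0] == key:
            bucket[i] = (key, value)       # overwrite existing record
        else:
            bucket.insert(i, (key, value))  # keep the bucket sorted

    def get(self, key: str):
        bucket = self.buckets[self._bucket_index(key)]
        i = bisect.bisect_left(bucket, (key,))
        if i < len(bucket) and bucket[i][0] == key:
            return bucket[i][1]
        return None


hashed = HashedFile(num_buckets=4)
hashed.insert("b", 2)
hashed.insert("a", 1)
hashed.insert("b", 3)  # replaces the earlier value for "b"
```

Keeping each bucket sorted lets a lookup use binary search inside the bucket instead of a linear scan, at the cost of slightly more work on insert.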

Index-organized Tables

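Index-organized tables (IOTs) store records directly inside the index. Because records are stored in key order, a range scan in an IOT can be implemented as a sequential scan.

Storing records directly in the index saves at least one disk seek, since no separate file has to be accessed after the key is located.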
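The range-scan property can be sketched with a sorted in-memory structure. Real IOTs typically keep records in a tree-shaped index on disk; the `IndexOrganizedTable` class below uses two parallel sorted lists as a stand-in, and all names in it are hypothetical.

```python
import bisect


class IndexOrganizedTable:
    """Toy IOT: rows live inside the key-ordered structure itself, so there
    is no separate data file to visit after finding a key."""

    def __init__(self):
        self._keys = []  # sorted keys
        self._rows = []  # rows, parallel to _keys

    def insert(self, key, row) -> None:
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._rows[i] = row  # overwrite existing row
        else:
            self._keys.insert(i, key)
            self._rows.insert(i, row)

    def range_scan(self, low, high):
        """Return (key, row) pairs with low <= key <= high, in key order."""
        start = bisect.bisect_left(self._keys, low)
        end = bisect.bisect_right(self._keys, high)
        # One contiguous slice: the range scan is a sequential scan.
        return list(zip(self._keys[start:end], self._rows[start:end]))


table = IndexOrganizedTable()
for key, row in [(5, "e"), (1, "a"), (3, "c"), (9, "i")]:
    table.insert(key, row)
matches = table.range_scan(2, 6)
```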

What is a heap file?

There are a couple of ways to find the location of the page a DBMS wants on the disk, and heap file organization is one of those ways. A heap file is an unordered collection of pages where tuples are stored in random order.^[1]

What is a heap file?

Records in heap files are not required to follow any particular order, and most of the time they are placed in a write order. This way, no additional work or file reorganization is required when new pages are appended. Heap files require additional index structures, pointing to the locations where data records are stored, to make them searchable.^[2]

What are hashed files?

In hashed files, records are stored in buckets, and the hash value of the key determines which bucket a record belongs to. Records in the bucket can be stored in append order or sorted by key to improve lookup speed.^[2]

What are index-organized tables?

Index-organized tables (IOTs) store data records in the index itself. Since records are stored in key order, range scans in IOTs can be implemented by sequentially scanning its contents.^[2]

Advantages of index-organized tables

Storing data records in the index allows us to reduce the number of disk seeks by at least one, since after traversing the index and locating the searched key, we do not have to address a separate file to find the associated data record.^[2]

[1] https://15445.courses.cs.cmu.edu/fall2023/notes/03-storage1.pdf
[2] Petrov, Alex. Database Internals. 1st ed., 2019.