Duplication
在 数据库规范化 中,副本 问题指存储 ID 还是字符串:
- 存储 ID: 有意义的信息只存储在一处,所有引用它的地方使用 ID
- 存储字符串:直接存储文本,有意义的信息会被复制到每处使用的地方
使用 ID 的好处是,ID 对人类而言没有意义,永远不需要改变,即是它标识的信息变化。任何对人类有意义的东西都会变,如果使用字符串方式写入,后续变更需要同时修改所有副本,会导致写入开销、也存在不一致风险。
使用 ID 的坏处是,查询时需要额外的关联,影响查询性能。
Whether you store an ID or a text string is a question of duplication. When you use an ID, the information that is meaningful to humans (such as the word Philanthropy) is stored in only one place, and everything that refers to it uses an ID (which only has meaning within the database). When you store the text directly, you are duplicating the human-meaningful information in every record that uses it.
The advantage of using an ID is that because it has no meaning to humans, it never needs to change: the ID can remain the same, even if the information it identifies changes. Anything that is meaningful to humans may need to change sometime in the future—and if that information is duplicated, all the redundant copies need to be updated. That incurs write overheads, and risks inconsistencies (where some copies of the information are updated but others aren’t).[1]
Martin Kleppmann, Designing Data-Intensive Applications, n.d. p33 ↩︎