The RUM Conjecture
Every data system has to balance read overhead, update overhead, and memory (storage) overhead.[1]
The RUM conjecture holds that these three form an impossible triangle: optimizing any two of them negatively impacts the third.
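To make the triangle concrete, here is a toy comparison. It is a minimal sketch under an assumed, simplified cost model: R counts probes for one lookup, U counts element writes and moves for n inserts, and M counts stored slots. The three structures (`SortedArray`, `AppendLog`, `IndexedLog`) and the `measure` helper are hypothetical illustrations, not taken from the RUM paper, and the hash lookup is idealized as a single probe.

```python
import bisect


class SortedArray:
    """Optimizes R and M: binary-search reads, no extra space, costly inserts."""

    def __init__(self):
        self.keys = []

    def insert(self, key):
        pos = bisect.bisect_left(self.keys, key)
        shifted = len(self.keys) - pos  # elements moved right to make room
        self.keys.insert(pos, key)
        return 1 + shifted  # one write plus every shifted element

    def lookup(self, key):
        lo, hi, probes = 0, len(self.keys), 0
        while lo < hi:  # binary search, counting probes
            mid = (lo + hi) // 2
            probes += 1
            if self.keys[mid] < key:
                lo = mid + 1
            else:
                hi = mid
        return lo < len(self.keys) and self.keys[lo] == key, probes

    def slots(self):
        return len(self.keys)


class AppendLog:
    """Optimizes U and M: O(1) appends, but reads scan the log."""

    def __init__(self):
        self.entries = []

    def insert(self, key):
        self.entries.append(key)
        return 1

    def lookup(self, key):
        for probes, k in enumerate(self.entries, start=1):
            if k == key:
                return True, probes
        return False, len(self.entries)

    def slots(self):
        return len(self.entries)


class IndexedLog(AppendLog):
    """Optimizes R and U: a hash index on top of the log, paid for in M."""

    def __init__(self):
        super().__init__()
        self.index = {}  # key -> position in the log

    def insert(self, key):
        self.index[key] = len(self.entries)
        self.entries.append(key)
        return 2  # one log append plus one index write

    def lookup(self, key):
        return key in self.index, 1  # idealized: one hash probe

    def slots(self):
        return len(self.entries) + len(self.index)


def measure(store, n=1000):
    # Descending inserts are the worst case for the sorted array.
    update = sum(store.insert(k) for k in reversed(range(n)))
    _, read = store.lookup(0)  # key 0 is the last one inserted
    return {"R": read, "U": update, "M": store.slots()}


for cls in (SortedArray, AppendLog, IndexedLog):
    print(cls.__name__, measure(cls()))
```

With n = 1000 this reports R=10, U=500500, M=1000 for the sorted array; R=1000, U=1000, M=1000 for the log; and R=1, U=2000, M=2000 for the indexed log. Each structure is cheap on two axes and pays noticeably on the third.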
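The RUM conjecture is not telling us to stop optimizing, though. It tells us to choose which overheads to optimize according to the workload the data system has to serve.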
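To reduce update cost, a system can use a differential (delta) design such as TSM, which avoids re-sorting data on every write. The price is higher storage overhead and longer reads, because queries must read more data in order to merge in the pending changes.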
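A minimal sketch of the differential idea, assuming a sorted in-memory base plus an append-only buffer of pending changes. This is a generic illustration, not TSM's actual storage format; `DeltaStore` and its methods are hypothetical names. Writes only append to the delta (cheap U), point reads must check the delta before the base and range reads must merge the two (extra R), and both copies occupy space until compaction (extra M).

```python
import bisect


class DeltaStore:
    """Hypothetical sorted base plus an append-only delta of pending changes."""

    def __init__(self, base):
        self.base = sorted(base.items())  # (key, value) pairs sorted by key
        self.base_keys = [k for k, _ in self.base]
        self.delta = []  # pending (key, value); value None is a tombstone

    def put(self, key, value):
        # Cheap update: append to the delta, never re-sort the base.
        self.delta.append((key, value))

    def delete(self, key):
        self.delta.append((key, None))

    def get(self, key):
        # The newest pending change wins, so scan the delta backwards first.
        for k, v in reversed(self.delta):
            if k == key:
                return v  # None means deleted
        i = bisect.bisect_left(self.base_keys, key)
        if i < len(self.base) and self.base_keys[i] == key:
            return self.base[i][1]
        return None

    def scan(self):
        # Range reads must merge base and delta: extra work on every query.
        merged = dict(self.base)
        for k, v in self.delta:
            if v is None:
                merged.pop(k, None)
            else:
                merged[k] = v
        return sorted(merged.items())

    def compact(self):
        # Fold pending changes into the base, paying the sort cost once.
        self.base = self.scan()
        self.base_keys = [k for k, _ in self.base]
        self.delta = []

    def stored_entries(self):
        # Memory overhead: base and delta both hold entries until compaction.
        return len(self.base) + len(self.delta)


store = DeltaStore({"a": 1, "c": 3})
store.put("b", 2)
store.put("a", 10)
store.delete("c")
print(store.get("a"))  # 10
print(store.scan())  # [('a', 10), ('b', 2)]
print(store.stored_entries())  # 5
store.compact()
print(store.stored_entries())  # 2
```

Before compaction the store holds five entries for two live keys; compaction reclaims that space but has to re-sort, which is exactly the cost the delta buffer was deferring.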
The ubiquitous fight between the Read, the Update, and the Memory overhead of access methods for modern data systems.
The fundamental challenges that every researcher, systems architect, or developer faces when designing a new access method are how to minimize, i) read times (R), ii) update cost (U), and iii) memory (or storage) overhead (M). In this project we first conjecture that when optimizing the read-update-memory overheads, optimizing in any two areas negatively impacts the third. Based on the RUM Conjecture, at DASlab, we study the manifestation of the balance of the RUM overheads in state-of-the-art access methods, and we pursue a path toward RUM-aware access methods for future data systems.[1:1]