schema-on-write
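Schema-on-write is the traditional data analysis workflow. Data from different sources is first formatted and structured into a predefined schema, a process known as ETL (extract, transform, load), and is then stored in a relational database.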
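This approach makes storage and querying highly efficient, but it also has several drawbacks:
- The schema must be designed before any data can be loaded, which requires understanding the data, and the entities it represents, in advance.
- It is inflexible and time-consuming when data changes rapidly or is unstructured, because every change to the data's shape requires changing the schema.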
This approach is common in relational databases.
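With the arrival of the big data era, the shortcomings of schema-on-write became increasingly acute, which led to the proposal of schema-on-read.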
Schema-on-Write is a traditional approach where data is first structured and transformed before being loaded into a data storage system. The structure or schema is defined upfront and data must conform to that schema before it can be ingested. This approach ensures data integrity and consistency, but it can be inflexible and time-consuming, especially when dealing with rapidly changing or unstructured data.[1]
- dremio
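The quote says that data must conform to the schema before it can be ingested. The minimal sketch below makes that concrete using Python's built-in `sqlite3` module and an in-memory database.

The `users` table and its columns are hypothetical. Its NOT NULL and CHECK constraints stand in for whatever rules a real schema would declare. The database accepts the conforming row and rejects the others at write time.

```python
import sqlite3

# In-memory database; the table and column names are hypothetical examples.
conn = sqlite3.connect(":memory:")

# The schema is defined upfront, before any data arrives.
conn.execute(
    """
    CREATE TABLE users (
        id   INTEGER PRIMARY KEY,
        name TEXT    NOT NULL,
        age  INTEGER NOT NULL CHECK (typeof(age) = 'integer' AND age >= 0)
    )
    """
)

def try_insert(row):
    """Attempt to write one row; return True if the schema accepted it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO users (id, name, age) VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError as exc:
        print(f"rejected {row!r}: {exc}")
        return False

accepted = try_insert((1, "alice", 30))        # conforms to the schema
rejected_null = try_insert((2, None, 25))      # violates NOT NULL on name
rejected_type = try_insert((3, "bob", "old"))  # 'old' cannot become an integer

row_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(row_count)
```

Here the database itself is the gatekeeper. In a fuller pipeline, the same rules would usually also be checked earlier, during the transform step.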
In a Schema-on-Write approach, data is first transformed and structured according to a predefined schema. This typically involves extracting, cleaning, and transforming the data before storing it in a structured format like a relational database. The schema defines the structure and data types of the columns in the database table, allowing for efficient storage and retrieval.[1:1]
- dremio
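The extract, clean, transform, load sequence described in the quote can be sketched in a few lines of Python. This is an illustrative sketch, not Dremio's implementation. The raw CSV text, the `orders` table and its columns are hypothetical, and `sqlite3` stands in for a relational database.

Every record is cleaned and converted to the declared column types before it is stored. Later queries can therefore rely on those types.

```python
import csv
import io
import sqlite3

# Extract: hypothetical raw data from a source system, as messy text.
RAW = """order_id,customer,amount
 1 , Alice ,19.90
2,BOB, 5
3,  carol,100.5
"""

def extract(text):
    """Read raw rows as dictionaries of strings."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(row):
    """Clean and convert one raw row to match the predefined schema."""
    return (
        int(row["order_id"].strip()),
        row["customer"].strip().title(),  # normalise case, e.g. 'BOB' -> 'Bob'
        float(row["amount"].strip()),
    )

conn = sqlite3.connect(":memory:")
# The schema (column names and types) exists before any data is loaded.
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, customer TEXT NOT NULL, amount REAL NOT NULL)"
)

# Load: only transformed, schema-conforming tuples are written.
rows = [transform(r) for r in extract(RAW)]
with conn:
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)

# Because the data is already structured, querying is a plain SQL statement.
total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
print(rows)
print(total)
```

All of the cleaning cost is paid once, at write time. That up-front cost is the trade-off the quote describes for efficient storage and retrieval.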
But there is an unfortunate problem — we can't upload data until the table is created and we can't create tables until we understand the schema of the data that will be in this table. This is impossible until we understand the entities that this data represents to correctly reflect their relationships in the tables. This also leads to problems with changing the data.[2]
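The ordering problem in the last quote shows up directly in code. A write fails if the table does not exist yet. A record with a new attribute cannot be stored until the schema is changed.

The minimal sketch below uses Python's `sqlite3`. The `events` table, the `source` column and the sample record are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
record = {"id": 1, "kind": "click", "source": "mobile"}  # hypothetical incoming data

# 1. Loading before the schema exists fails.
try:
    conn.execute("INSERT INTO events (id, kind) VALUES (?, ?)", (record["id"], record["kind"]))
    failed_without_table = False
except sqlite3.OperationalError as exc:
    print(f"cannot load yet: {exc}")
    failed_without_table = True

# 2. The schema has to be designed first; here it only knows about id and kind.
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, kind TEXT NOT NULL)")

# 3. A record carrying a field the schema does not describe is rejected.
try:
    conn.execute("INSERT INTO events (id, kind, source) VALUES (:id, :kind, :source)", record)
    failed_unknown_column = False
except sqlite3.OperationalError as exc:
    print(f"schema mismatch: {exc}")
    failed_unknown_column = True

# 4. Storing the new attribute requires changing the schema first.
conn.execute("ALTER TABLE events ADD COLUMN source TEXT")
with conn:
    conn.execute("INSERT INTO events (id, kind, source) VALUES (:id, :kind, :source)", record)

print(conn.execute("SELECT id, kind, source FROM events").fetchall())
```

Step 4 is a cheap change in SQLite. In a production relational database with many tables and downstream consumers, the same kind of schema change is where the inflexibility of schema-on-write is usually felt.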