
Data Lake

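A data lake is a highly scalable, centralized repository that can store raw structured, semi-structured, and unstructured data at large scale without preprocessing it first.[1] The trade-off is that format conversion has to be handled at query time.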


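Data lakes and data warehouses usually complement each other:

  • A data lake stores raw, unprocessed data and processes it at query time.
  • A data warehouse stores processed, structured data that has been loaded through ETL (extract, transform, load).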
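To make the contrast concrete, here is a toy sketch in Python, assuming a "lake" that is nothing more than an in-memory list of raw payloads in mixed formats (JSON, CSV, and a free-text line). The `Sale` schema, the sample records, and the function names (`parse_raw`, `query_lake_total`, `etl_into_warehouse`, `query_warehouse_total`) are hypothetical illustrations, not part of any real data-lake product. The same parsing logic runs either at query time (the lake) or once at load time (the warehouse ETL).

```python
import csv
import io
import json
from dataclasses import dataclass

# Hypothetical "lake": raw payloads kept exactly as they arrived, no preprocessing.
lake: list[tuple[str, str]] = [
    ("json", '{"user": "alice", "amount": "12.50"}'),
    ("csv", "user,amount\nbob,7.25\n"),
    ("text", "carol paid 3.00"),  # unstructured log line
]


@dataclass
class Sale:
    """Hypothetical target schema for structured rows."""
    user: str
    amount: float


def parse_raw(fmt: str, payload: str) -> list[Sale]:
    """Convert one raw payload into structured Sale rows."""
    if fmt == "json":
        rec = json.loads(payload)
        return [Sale(rec["user"], float(rec["amount"]))]
    if fmt == "csv":
        reader = csv.DictReader(io.StringIO(payload))
        return [Sale(r["user"], float(r["amount"])) for r in reader]
    if fmt == "text":
        user, _, amount = payload.split()  # e.g. "carol paid 3.00"
        return [Sale(user, float(amount))]
    raise ValueError(f"unknown format: {fmt}")


def query_lake_total(raw: list[tuple[str, str]]) -> float:
    # Data lake: every query re-parses the raw payloads (schema-on-read).
    return sum(s.amount for fmt, p in raw for s in parse_raw(fmt, p))


def etl_into_warehouse(raw: list[tuple[str, str]]) -> list[Sale]:
    # Warehouse ETL: extract and transform once, then load clean rows.
    return [s for fmt, p in raw for s in parse_raw(fmt, p)]


def query_warehouse_total(rows: list[Sale]) -> float:
    # Warehouse query: rows already conform to the Sale schema.
    return sum(s.amount for s in rows)


warehouse = etl_into_warehouse(lake)
print(query_lake_total(lake))            # 22.75
print(query_warehouse_total(warehouse))  # 22.75
```

Both totals are 22.75; what differs is where the conversion cost is paid. The lake query parses every raw payload on each call, while the warehouse pays that cost once during ETL and then queries rows that are already structured.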

Database System Concepts

The term data lake is used to refer to a repository where data can be stored in multiple formats, including structured records and unstructured file formats. Unlike data warehouses, data lakes do not require up-front effort to preprocess data, but they do require more effort when creating queries.[2]


  1. https://aws.amazon.com/cn/big-data/datalakes-and-analytics/what-is-a-data-lake/

  2. Abraham Silberschatz, Henry F. Korth, and S. Sudarshan, Database System Concepts, Seventh edition (New York, NY: McGraw-Hill, 2020), p. 527.