Data Lake

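A data lake is a highly scalable, centralized store that can hold raw structured, semi-structured, and unstructured data at scale without preprocessing it first. The trade-off is that data format conversion has to be handled at query time.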


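  • A data lake stores raw, unprocessed data and processes it at query time
  • A data warehouse stores structured data that has already been processed through ETL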
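
The contrast in the two bullets above can be sketched in a few lines of Python. This is only a toy illustration, not how any particular product works: the records, the in-memory lists standing in for storage, and the names RAW_LAKE, parse_record, lake_total, etl_load, and warehouse_total are all hypothetical. The lake path keeps the raw records and converts formats while answering the query; the warehouse path converts once up front (the ETL step) and then queries structured rows.

```python
import json

# Hypothetical raw records as they might land in a lake: mixed formats, stored untouched.
RAW_LAKE = [
    '{"user": "alice", "amount": "12.50"}',  # a JSON record
    'bob,7.25',                              # a CSV record
]


def parse_record(raw: str) -> dict:
    """Convert one raw record into a common shape (the format conversion step)."""
    if raw.lstrip().startswith("{"):
        obj = json.loads(raw)
        return {"user": obj["user"], "amount": float(obj["amount"])}
    user, amount = raw.split(",")
    return {"user": user.strip(), "amount": float(amount)}


def lake_total(raw_records: list[str]) -> float:
    """Lake style: data stays raw, so every query pays for the conversion."""
    return sum(parse_record(r)["amount"] for r in raw_records)


def etl_load(raw_records: list[str]) -> list[dict]:
    """Warehouse style: convert once, up front, and keep only structured rows."""
    return [parse_record(r) for r in raw_records]


def warehouse_total(rows: list[dict]) -> float:
    """Queries against the warehouse only ever see structured rows."""
    return sum(row["amount"] for row in rows)


# The ETL step runs at load time, before any query is written.
WAREHOUSE = etl_load(RAW_LAKE)
```

Both lake_total(RAW_LAKE) and warehouse_total(WAREHOUSE) give the same answer; what differs is when the parsing effort is spent, at query time for the lake and at load time for the warehouse.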

Database System Concepts

The term data lake is used to refer to a repository where data can be stored in multiple formats, including structured records and unstructured file formats. Unlike data warehouses, data lakes do not require up-front effort to preprocess data, but they do require more effort when creating queries.[2]

