Big Data
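Big Data refers to data whose scale or complexity is too great for a traditional OLTP database to handle. The number of rows may be extremely large, or different data models may be mixed together, such as plain-text logs and semi-structured data.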
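Characteristics
Big Data differs from the data handled by traditional OLTP databases in three respects:
- Volume of data
- Variety of data
- Processing velocity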
Applications
Common Big Data applications include:
- Predictive analytics
- User behavior analysis
The volume of such data soon grew well beyond the scale that could be handled by traditional database systems, and both storage and processing require a very high degree of parallelism. Furthermore, much of the data were in textual form such as log records, or in other semi-structured forms. Such data, characterized by their size, the speed at which they are generated, and the variety of their formats, are generically called Big Data.[1]
Big Data has been contrasted with traditional relational databases on the following metrics:
- Volume: The amount of data to be stored and processed is much larger than traditional databases, including traditional parallel relational databases, were designed to handle.
- Velocity: The rate of arrival of data is much higher in today’s networked world than in earlier days. Data management systems must be able to ingest and store data at very high rates. Further, many applications need data items to be processed as they arrive, in order to detect and respond quickly to certain events (such systems are referred to as streaming data systems). Thus, processing velocity is very important for many applications today.
- Variety: The relational representation of data, relational query languages, and relational database systems have been very successful over the past several decades, and they form the core of the data representation of most organizations. However, clearly, not all data are relational.
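The velocity point above, processing items as they arrive in order to detect events, can be illustrated with a minimal sketch. It assumes events arrive as timestamps in non-decreasing order; the function name `detect_bursts`, the window length, and the threshold are hypothetical choices for illustration, not part of any particular streaming system.

```python
from collections import deque


def detect_bursts(timestamps, window_seconds=10.0, threshold=3):
    """Yield each timestamp at which at least `threshold` events
    fall inside the trailing window of `window_seconds`.

    Assumes `timestamps` is an iterable in non-decreasing order.
    """
    window = deque()
    for ts in timestamps:
        window.append(ts)
        # Drop events that have fallen out of the trailing window.
        while window and ts - window[0] > window_seconds:
            window.popleft()
        # React immediately, without waiting for the rest of the stream.
        if len(window) >= threshold:
            yield ts


# Hypothetical arrival times in seconds.
bursts = list(detect_bursts([0, 1, 2, 20, 21, 40]))
print(bursts)
```

Because the function is a generator, each event is handled as it is read, and only the events in the current window are kept in memory.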
We shall use the term Big Data in a generic sense, to refer to any data-processing need that requires a high degree of parallelism to handle, regardless of whether the data are relational or otherwise.[1:1]
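As a rough sketch of what "requires a high degree of parallelism" means in practice, the example below splits a set of log lines into partitions, counts each partition independently, and merges the partial results. The names `count_levels` and `parallel_count`, and the assumption that each line starts with a log level, are hypothetical. A thread pool is used only to keep the example self-contained; a real system would spread the partitions across many processes or machines.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_levels(partition):
    """Count the first token of each non-blank line (assumed to be a log level)."""
    return Counter(line.split()[0] for line in partition if line.strip())


def parallel_count(lines, n_partitions=4):
    """Partition the lines, count each partition independently, then merge."""
    # Round-robin split into roughly equal partitions.
    partitions = [lines[i::n_partitions] for i in range(n_partitions)]
    total = Counter()
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        # Each partition is processed without reference to the others.
        for partial in pool.map(count_levels, partitions):
            total.update(partial)
    return total


# Hypothetical log lines.
logs = ["ERROR disk full", "INFO started", "ERROR timeout", "", "WARN slow query"]
print(parallel_count(logs, n_partitions=2))
```

The key property is that the per-partition step needs no shared state, so the same structure applies whether the data are relational rows or text logs.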
Big data primarily refers to data sets that are too large or complex to be dealt with by traditional data-processing application software.[2]