

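A fault is when a single component of the system deviates from its spec while the system as a whole remains partially available. Not to be confused with a failure.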


- Designing Data-Intensive Applications

Faults fall into three categories: hardware faults, software faults, and human errors.


Hardware faults are usually handled by adding redundancy.


Systematic errors in software are hard to anticipate, can lie dormant for a long time, and often cause system failure.




Designing Data-Intensive Applications

A fault is usually defined as one component of the system deviating from its spec.[1]

Designing Data-Intensive Applications

Our first response is usually to add redundancy to the individual hardware components in order to reduce the failure rate of the system.[1]
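A minimal sketch of why redundancy lowers the failure rate. This is an illustration, not something from the book. It assumes that each redundant component fails independently with the same probability, and that the system only goes down when every copy is down at once. The function name `prob_all_fail` and its parameters are hypothetical.

```python
def prob_all_fail(p_single: float, replicas: int) -> float:
    """Probability that every one of `replicas` components is down at once.

    Assumption (not from the source): component faults are independent and
    each component fails with the same probability `p_single`.
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be in [0, 1]")
    if replicas < 1:
        raise ValueError("replicas must be >= 1")
    # Independent events: multiply the individual failure probabilities.
    return p_single ** replicas


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, prob_all_fail(0.1, n))
```

Under these assumptions, a component that fails 10% of the time gives roughly a 1% chance of total outage with two copies and roughly 0.1% with three. The independence assumption is what makes this work.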

Designing Data-Intensive Applications

Such faults are harder to anticipate, and because they are correlated across nodes, they tend to cause many more system failures than uncorrelated hardware faults.

The bugs that cause these kinds of software faults often lie dormant for a long time until they are triggered by an unusual set of circumstances. In those circumstances, it is revealed that the software is making some kind of assumption about its environment—and while that assumption is usually true, it eventually stops being true for some reason[2]
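A rough sketch of why faults that are correlated across nodes cause more system failures than uncorrelated hardware faults. The model is my own simplification, not the book's. It assumes hardware faults are independent across replicas, that a software bug takes down every replica at once when triggered, and that the bug is independent of the hardware faults. The name `prob_system_failure` and its parameters are hypothetical.

```python
def prob_system_failure(p_hw: float, replicas: int, p_bug: float) -> float:
    """Probability that all replicas are down at the same time.

    Assumptions (illustrative only):
    - each replica suffers a hardware fault independently with prob. `p_hw`
    - a shared software bug fires with prob. `p_bug` and hits ALL replicas
    - the bug and the hardware faults are independent of each other
    """
    for name, p in (("p_hw", p_hw), ("p_bug", p_bug)):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} must be in [0, 1]")
    if replicas < 1:
        raise ValueError("replicas must be >= 1")
    p_all_hw = p_hw ** replicas  # shrinks quickly as replicas are added
    # Either the bug fires, or it does not and all hardware fails anyway.
    return p_bug + (1.0 - p_bug) * p_all_hw


if __name__ == "__main__":
    for n in (1, 3, 10):
        print(n, prob_system_failure(p_hw=0.1, replicas=n, p_bug=0.01))
```

In this model the result never drops below `p_bug`, however many replicas are added. With `p_hw=0.1`, `p_bug=0.01`, and three replicas the outage probability is about 0.011, almost all of it from the shared bug.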

  1. Martin Kleppmann, Designing Data-Intensive Applications, n.d., p. 7.

  2. Martin Kleppmann, Designing Data-Intensive Applications, n.d., p. 8.