Performance

#IT/System #IT/SoftwareEngineering

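Performance is the speed at which a system completes a given task at the required quality. Different systems call for different performance metrics: for a batch processing system such as Hadoop, throughput matters most; for an online system, response time matters most.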

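Latency and response time are often used as synonyms, but they differ significantly:

- Response time: the "processing time" as seen by the client. It includes network time, server processing time, queueing time, and so on.
- Latency: the minimum time required to receive any response at all. It covers factors outside the application's control, such as network time.

Application developers usually cannot reduce latency, since it is determined by the hardware, so they should focus on response time.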

- Designing Data-Intensive Applications

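Performance requirements are set out in SLOs and SLAs. Percentiles are recommended over the mean as performance metrics, for example the median (p50), the 95th percentile (p95), the 99th percentile (p99), or even the 99.9th percentile (p999).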
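To illustrate why percentiles are preferred over the mean, here is a minimal sketch that computes both for a set of response times. The `percentile` helper uses the nearest-rank definition, which is only one of several common conventions, and the sample data is hypothetical.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]

# Hypothetical response times in milliseconds: 90 fast requests, 10 slow ones.
response_times_ms = [100] * 90 + [2000] * 10

mean_ms = sum(response_times_ms) / len(response_times_ms)
p50 = percentile(response_times_ms, 50)
p95 = percentile(response_times_ms, 95)
p99 = percentile(response_times_ms, 99)

print(f"mean={mean_ms} ms, p50={p50} ms, p95={p95} ms, p99={p99} ms")
```

In this data set no request actually took the mean value of 290 ms: the typical request took 100 ms (p50), while one request in ten took 2000 ms, which only the high percentiles reveal.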

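Tail latency often determines the user experience. Once a server is saturated, that is, already busy with as many requests as it can process in parallel, further requests incur queueing delay: it takes only a few slow requests to hold up the processing of the requests behind them, which is known as head-of-line blocking.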
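Head-of-line blocking can be sketched with a small first-in, first-out queue simulation. The function name `simulate_fifo` and the workload numbers are hypothetical; the model assumes requests are served strictly in arrival order and ignores network time.

```python
def simulate_fifo(arrivals_ms, service_ms, workers=1):
    """Return the response time (queueing + service) of each request
    when requests are served in arrival order by `workers` workers."""
    free_at = [0] * workers  # time at which each worker becomes idle
    response_times = []
    for arrival, service in zip(arrivals_ms, service_ms):
        # The next request in line goes to the worker that frees up first.
        w = min(range(workers), key=lambda i: free_at[i])
        start = max(arrival, free_at[w])  # wait if that worker is still busy
        free_at[w] = start + service
        response_times.append(free_at[w] - arrival)
    return response_times

# Hypothetical workload: one request every 10 ms, each needing 5 ms of work,
# except the second request, which needs 100 ms.
arrivals = [0, 10, 20, 30, 40]
service = [5, 100, 5, 5, 5]

single = simulate_fifo(arrivals, service, workers=1)
double = simulate_fifo(arrivals, service, workers=2)
print(single)  # [5, 100, 95, 90, 85]
print(double)  # [5, 100, 5, 5, 5]
```

With one worker, the three requests behind the slow one each need only 5 ms of work but see response times of 85 to 95 ms, because they spend most of that time queued. With a second worker the slow request no longer blocks the others.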

ESI

System performance is the amount of useful work done by a system - measured by the production speed of products of a predefined quality.^[2]

Designing Data-Intensive Applications

In a batch processing system such as Hadoop, we usually care about throughput — the number of records we can process per second, or the total time it takes to run a job on a dataset of a certain size. In online systems, what’s usually more important is the service’s response time — that is, the time between a client sending a request and receiving a response.^[3]

Designing Data-Intensive Applications

For example, percentiles are often used in service level objectives (SLOs) and service level agreements (SLAs), contracts that define the expected performance and availability of a service. An SLA may state that the service is considered to be up if it has a median response time of less than 200 ms and a 99th percentile under 1 s (if the response time is longer, it might as well be down), and the service may be required to be up at least 99.9% of the time. These metrics set expectations for clients of the service and allow customers to demand a refund if the SLA is not met.
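The example SLA in this quote can be checked mechanically. The sketch below is an illustration only: the function names are hypothetical, the default thresholds are the ones from the quoted example (median under 200 ms, 99th percentile under 1 s, 99.9% availability), and the 30-day accounting period is an assumption. It uses `statistics.quantiles` from the Python standard library, whose default method interpolates between samples.

```python
import statistics

def meets_latency_target(response_times_ms, median_limit_ms=200, p99_limit_ms=1000):
    """True if the median and the 99th percentile are both under their limits."""
    median = statistics.median(response_times_ms)
    # quantiles(n=100) returns 99 cut points; index 98 is the 99th percentile.
    p99 = statistics.quantiles(response_times_ms, n=100)[98]
    return median < median_limit_ms and p99 < p99_limit_ms

def downtime_budget_minutes(availability, period_days=30):
    """Minutes of downtime allowed per period for a given availability target."""
    return (1 - availability) * period_days * 24 * 60

# Hypothetical measurement windows, in milliseconds.
healthy = [150] * 99 + [900]
degraded = [150] * 98 + [1500] * 2

print(meets_latency_target(healthy))   # True
print(meets_latency_target(degraded))  # False
print(round(downtime_budget_minutes(0.999), 1))  # 43.2
```

Under these assumptions, a 99.9% availability target leaves about 43 minutes per 30-day period in which the service may miss its latency target.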

Queueing delays often account for a large part of the response time at high percentiles. As a server can only process a small number of things in parallel (limited, for example, by its number of CPU cores), it only takes a small number of slow requests to hold up the processing of subsequent requests—an effect sometimes known as head-of-line blocking. Even if those subsequent requests are fast to process on the server, the client will see a slow overall response time due to the time waiting for the prior request to complete. Due to this effect, it is important to measure response times on the client side.^[4]

1. Martin Fowler, Patterns of Enterprise Application Architecture, Addison-Wesley, 2002, ISBN 978-0-321-12742-6.
2. https://esi.nl/research/program-lines/system-performance
3. Martin Kleppmann, Designing Data-Intensive Applications, n.d., p. 13.
4. Martin Kleppmann, Designing Data-Intensive Applications, n.d., p. 15.