
Performance

Performance is how quickly a system completes a given task at the required level of quality. Different kinds of systems call for different performance metrics: a batch processing system such as Hadoop is judged mainly by throughput, while an online system is judged mainly by response time.


Latency and response time are often used synonymously, but they are not the same:

  • Response time: the duration the client actually observes, including network transit, server processing, and time spent waiting in queues
  • Latency: the minimum time before any response can arrive, dominated by factors outside the application's control, such as network transit

For an application developer, latency is largely fixed by the hardware and the network and cannot be reduced, so the metric to focus on is response time.
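As a minimal sketch (not from the book), response time is simply the wall-clock duration the client observes between sending a request and receiving the full response; here a `time.sleep` stands in for a round trip to some hypothetical service:

```python
import time

def measure_response_time(send_request):
    """Client-side response time: wall clock between sending the request
    and receiving the full response (network + queueing + processing)."""
    start = time.perf_counter()
    send_request()
    return (time.perf_counter() - start) * 1000  # milliseconds

# sketch: a 50 ms sleep stands in for the round trip
rt = measure_response_time(lambda: time.sleep(0.05))
```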

- Designing Data-Intensive Applications

Performance requirements are written down in SLOs/SLAs. Percentiles, not averages, are the recommended way to express them: the median (p50), the 95th percentile (p95), the 99th percentile (p99), or even the 99.9th percentile (p999).
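To make this concrete (a sketch with made-up numbers, not from the book), nearest-rank percentiles can be computed by sorting the measured response times; note how the mean (184.4 ms here) sits far from what a typical user experiences (p50 = 42 ms), which is exactly why percentiles are preferred:

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that
    at least p% of all samples are less than or equal to it."""
    ranked = sorted(samples)
    k = max(0, math.ceil(p / 100 * len(ranked)) - 1)
    return ranked[k]

# hypothetical response times in milliseconds
times = [30, 32, 35, 40, 42, 45, 50, 120, 250, 1200]
print([percentile(times, p) for p in (50, 95, 99)])  # [42, 1200, 1200]
```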

Tail latency directly affects user experience. As a system's load approaches 100% utilization, queueing delay grows rapidly: a handful of slow requests is enough to hold up the processing of everything behind them, an effect known as head-of-line blocking.


ESI

System performance is the amount of useful work done by a system - measured by the production speed of products of a predefined quality.[2]

Designing Data-Intensive Applications

In a batch processing system such as Hadoop, we usually care about throughput — the number of records we can process per second, or the total time it takes to run a job on a dataset of a certain size. In online systems, what’s usually more important is the service’s response time — that is, the time between a client sending a request and receiving a response.[3]

Designing Data-Intensive Applications

For example, percentiles are often used in service level objectives (SLOs) and service level agreements (SLAs), contracts that define the expected performance and availability of a service. An SLA may state that the service is considered to be up if it has a median response time of less than 200 ms and a 99th percentile under 1 s (if the response time is longer, it might as well be down), and the service may be required to be up at least 99.9% of the time. These metrics set expectations for clients of the service and allow customers to demand a refund if the SLA is not met.

Queueing delays often account for a large part of the response time at high percentiles. As a server can only process a small number of things in parallel (limited, for example, by its number of CPU cores), it only takes a small number of slow requests to hold up the processing of subsequent requests—an effect sometimes known as head-of-line blocking. Even if those subsequent requests are fast to process on the server, the client will see a slow overall response time due to the time waiting for the prior request to complete. Due to this effect, it is important to measure response times on the client side.[4]
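Head-of-line blocking can be illustrated with a toy model (a sketch assuming a single strictly-FIFO worker, not code from the book): one slow 1000 ms request at the front of the queue inflates the client-observed response time of every fast 10 ms request behind it.

```python
def finish_times(arrivals, service_times):
    """Single FIFO worker: each request starts only after the
    previous one finishes; returns client-observed response times."""
    responses, clock = [], 0
    for arrive, service in zip(arrivals, service_times):
        start = max(clock, arrive)   # wait in queue if the worker is busy
        clock = start + service
        responses.append(clock - arrive)
    return responses

arrivals = [0, 1, 2, 3, 4, 5]                # ms at which each request arrives
services = [1000, 10, 10, 10, 10, 10]        # one slow request, five fast ones
print(finish_times(arrivals, services))      # [1000, 1009, 1018, 1027, 1036, 1045]
```

Even though five of the six requests need only 10 ms of server time, every one of them takes over a second from the client's point of view, which is why response times should be measured on the client side.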


  1. Martin Fowler: Patterns of Enterprise Application Architecture. Addison Wesley, 2002. ISBN: 978-0-321-12742-6 ↩︎

  2. https://esi.nl/research/program-lines/system-performance ↩︎

  3. Martin Kleppmann, Designing Data-Intensive Applications, n.d. p13 ↩︎

  4. Martin Kleppmann, Designing Data-Intensive Applications, n.d. p15 ↩︎