False Sharing

#CS/Computer #CS/CPU

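False sharing is a performance problem that arises in cache-coherent systems. When one participant accesses data that nobody else is modifying, but that data sits in the same cache block as a variable another participant is modifying, the coherence protocol forces the first participant to reload the entire cache block. It therefore pays the cache overhead of a true shared access even though no data is logically shared.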

Multiprocessor CPU caches

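False sharing is most common in multiprocessor CPU caches. Modern CPU caches are usually multi-level, and memory is cached in units called cache lines, whose size in bytes is a power of two (commonly 64).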
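
Because the line size is a power of two, the line that holds a given address can be found by masking off the low bits. The following is a minimal sketch of that arithmetic; `lineBase` and `sameLine` are hypothetical helper names, and the 64-byte line size is an assumption that should be checked on the target machine.

```go
package main

import "fmt"

// lineBase returns the address of the first byte of the cache line that
// contains addr. lineSize must be a power of two.
func lineBase(addr, lineSize uintptr) uintptr {
    return addr &^ (lineSize - 1)
}

// sameLine reports whether two addresses fall into the same cache line.
func sameLine(a, b, lineSize uintptr) bool {
    return lineBase(a, lineSize) == lineBase(b, lineSize)
}

func main() {
    const lineSize = 64 // assumed; query the real value on your machine

    fmt.Printf("%#x\n", lineBase(0x1004, lineSize))  // 0x1000
    fmt.Println(sameLine(0x1000, 0x103f, lineSize)) // true: first and last byte of one line
    fmt.Println(sameLine(0x103f, 0x1040, lineSize)) // false: adjacent bytes, different lines
}
```

Two variables can only suffer from false sharing when `sameLine` would report true for their addresses.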

Querying cache line information on Linux

~
❯ getconf -a | grep CACHE
LEVEL1_ICACHE_SIZE                 32768
LEVEL1_ICACHE_ASSOC
LEVEL1_ICACHE_LINESIZE             64
LEVEL1_DCACHE_SIZE                 49152
LEVEL1_DCACHE_ASSOC                12
LEVEL1_DCACHE_LINESIZE             64
LEVEL2_CACHE_SIZE                  1310720
LEVEL2_CACHE_ASSOC                 10
LEVEL2_CACHE_LINESIZE              64
LEVEL3_CACHE_SIZE                  26214400
LEVEL3_CACHE_ASSOC                 10
LEVEL3_CACHE_LINESIZE              64
LEVEL4_CACHE_SIZE
LEVEL4_CACHE_ASSOC
LEVEL4_CACHE_LINESIZE
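
If a program needs these values, the text output can be parsed directly. The sketch below pulls out the `*_LINESIZE` entries from output in the format shown above; `parseLineSizes` is a hypothetical helper, and the sample string is just a shortened copy of that output rather than a live call to `getconf`.

```go
package main

import (
    "bufio"
    "fmt"
    "strconv"
    "strings"
)

// parseLineSizes extracts the *_LINESIZE entries from `getconf -a` style
// output. Entries without a value (for example a cache level that does not
// exist) are skipped.
func parseLineSizes(out string) map[string]int {
    sizes := make(map[string]int)
    sc := bufio.NewScanner(strings.NewReader(out))
    for sc.Scan() {
        fields := strings.Fields(sc.Text())
        if len(fields) != 2 || !strings.HasSuffix(fields[0], "_LINESIZE") {
            continue
        }
        if n, err := strconv.Atoi(fields[1]); err == nil {
            sizes[fields[0]] = n
        }
    }
    return sizes
}

func main() {
    // Shortened sample in the same format as the getconf output.
    sample := "LEVEL1_DCACHE_SIZE 49152\n" +
        "LEVEL1_DCACHE_LINESIZE 64\n" +
        "LEVEL2_CACHE_LINESIZE 64\n" +
        "LEVEL4_CACHE_LINESIZE\n"

    for name, n := range parseLineSizes(sample) {
        fmt.Printf("%s = %d bytes\n", name, n)
    }
}
```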

Multiprocessor caches usually implement a cache coherence protocol, such as the MESI and MESIF protocols on x86.

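In an SMP architecture, each CPU has its own local cache, so the same cache line may be present in the local caches of several processors at once; the line is then in the Shared state. When processor 1 modifies a variable in that line, its local copy moves to the Modified state, and the copy in processor 2's cache must be put into the Invalid state before processor 2 accesses it again. When processor 2 next accesses the line, it finds it Invalid, takes a cache miss, and has to reload the line over the bus from memory (or, on many implementations, directly from the cache that holds the modified copy).^[1]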
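
The ping-pong described above can be made concrete with a toy model. This is a deliberately simplified sketch of MESI for a single cache line, not a description of any real processor: it ignores write-back traffic, the Forward state of MESIF, and timing, and the names `Line`, `Read`, `Write` and `simulate` are all hypothetical.

```go
package main

import "fmt"

// State is the MESI state of one cache line in one CPU's cache.
type State int

const (
    Invalid State = iota
    Shared
    Exclusive
    Modified
)

// Line tracks a single cache line across several per-CPU caches.
type Line struct {
    states []State // states[i]: state of the line in CPU i's cache
    misses []int   // misses[i]: how often CPU i had to (re)load the line
}

func NewLine(cpus int) *Line {
    return &Line{states: make([]State, cpus), misses: make([]int, cpus)}
}

// Read models CPU cpu loading any variable that lives in the line.
func (l *Line) Read(cpu int) {
    if l.states[cpu] != Invalid {
        return // cache hit
    }
    l.misses[cpu]++
    shared := false
    for i := range l.states {
        if i != cpu && l.states[i] != Invalid {
            l.states[i] = Shared // Modified/Exclusive holders downgrade
            shared = true
        }
    }
    if shared {
        l.states[cpu] = Shared
    } else {
        l.states[cpu] = Exclusive
    }
}

// Write models CPU cpu storing to any variable that lives in the line.
func (l *Line) Write(cpu int) {
    if l.states[cpu] == Invalid {
        l.misses[cpu]++ // the line must be fetched before it can be modified
    }
    for i := range l.states {
        if i != cpu {
            l.states[i] = Invalid // every other copy is invalidated
        }
    }
    l.states[cpu] = Modified
}

// simulate lets CPU 0 write variable a and CPU 1 read variable b, where a
// and b are different variables in the same line, and returns the misses.
func simulate(iterations int) (misses0, misses1 int) {
    l := NewLine(2)
    l.Read(0) // both CPUs first load the line
    l.Read(1)
    for i := 0; i < iterations; i++ {
        l.Write(0) // invalidates CPU 1's copy
        l.Read(1)  // CPU 1 misses although b never changed
    }
    return l.misses[0], l.misses[1]
}

func main() {
    m0, m1 := simulate(1000)
    fmt.Printf("CPU 0 misses: %d, CPU 1 misses: %d\n", m0, m1)
}
```

In this model CPU 1 takes one miss per iteration even though the variable it reads is never written, which is exactly the cost that false sharing imposes.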

Besides wasting system bandwidth, false sharing also causes memory stalls, which hurts performance.

Go benchmark

package test

import (
    "sync"
    "sync/atomic"
    "testing"
)

// Unpadded struct (a and b sit in the same cache line)
type Unpadded struct {
    a int32
    b int32
}

// Padded struct (a and b sit in different cache lines)
type Padded struct {
    a int32
    _ [60]byte // 60 bytes of padding (64 - 4 = 60), so b starts 64 bytes after a
    b int32
    _ [60]byte
}

// Unpadded struct (false sharing)
func BenchmarkUnpadded(b *testing.B) {
    var s Unpadded
    var wg sync.WaitGroup
    wg.Add(2)

    // Goroutine 1: only touches a
    go func() {
        defer wg.Done()
        for i := 0; i < b.N; i++ {
            atomic.AddInt32(&s.a, 1)
        }
    }()

    // Goroutine 2: only touches b
    go func() {
        defer wg.Done()
        for i := 0; i < b.N; i++ {
            atomic.AddInt32(&s.b, 1)
        }
    }()

    wg.Wait()
}

// Padded struct (no false sharing)
func BenchmarkPadded(b *testing.B) {
    var s Padded
    var wg sync.WaitGroup
    wg.Add(2)

    go func() {
        defer wg.Done()
        for i := 0; i < b.N; i++ {
            atomic.AddInt32(&s.a, 1)
        }
    }()

    go func() {
        defer wg.Done()
        for i := 0; i < b.N; i++ {
            atomic.AddInt32(&s.b, 1)
        }
    }()

    wg.Wait()
}

Results:

~/Temporary/test via 🐹 v1.24.2
❯ go test -bench .
goos: linux
goarch: amd64
pkg: aaa
cpu: 12th Gen Intel(R) Core(TM) i7-12700
BenchmarkUnpadded-20            63919624                19.53 ns/op
BenchmarkPadded-20              283119007                4.043 ns/op
PASS
ok      aaa     2.846s

The padded version is nearly 5 times faster (19.53 ns/op vs. 4.043 ns/op).
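
The field layout behind this difference can be checked with `unsafe.Offsetof`. The sketch below repeats the two struct definitions from the benchmark and assumes a 64-byte cache line; `offsetsUnpadded` and `offsetsPadded` are hypothetical helpers added only for illustration.

```go
package main

import (
    "fmt"
    "unsafe"
)

const cacheLine = 64 // assumed line size in bytes

type Unpadded struct {
    a int32
    b int32
}

type Padded struct {
    a int32
    _ [60]byte
    b int32
    _ [60]byte
}

// offsetsUnpadded returns the byte offsets of a and b inside Unpadded.
func offsetsUnpadded() (a, b uintptr) {
    var s Unpadded
    return unsafe.Offsetof(s.a), unsafe.Offsetof(s.b)
}

// offsetsPadded returns the byte offsets of a and b inside Padded.
func offsetsPadded() (a, b uintptr) {
    var s Padded
    return unsafe.Offsetof(s.a), unsafe.Offsetof(s.b)
}

func main() {
    ua, ub := offsetsUnpadded()
    pa, pb := offsetsPadded()

    fmt.Printf("Unpadded: a@%d b@%d size=%d\n", ua, ub, unsafe.Sizeof(Unpadded{}))
    fmt.Printf("Padded:   a@%d b@%d size=%d\n", pa, pb, unsafe.Sizeof(Padded{}))

    // Fields at least one full line apart can never share a cache line,
    // wherever the struct itself happens to be placed in memory.
    fmt.Println("Unpadded fields a full line apart:", ub-ua >= cacheLine)
    fmt.Println("Padded fields a full line apart:  ", pb-pa >= cacheLine)
}
```

In `Unpadded` the two fields are only 4 bytes apart, so they share a line unless the struct happens to straddle a line boundary. In `Padded` they are 64 bytes apart, which rules out sharing under the 64-byte assumption.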

What is false sharing?

In computer science, false sharing is a performance-degrading usage pattern that can arise in systems with distributed, coherent caches at the size of the smallest resource block managed by the caching mechanism. When a system participant attempts to periodically access data that is not being altered by another party, but that data shares a cache block with data that is being altered, the caching protocol may force the first participant to reload the whole cache block despite a lack of logical necessity. The caching system is unaware of activity within this block and forces the first participant to bear the caching system overhead required by true shared access of a resource.^[2]