e425f57d5b
Updating an atomic value invalidates the entire cache line to which it
belongs, which can make the next access to that cache line slower on other
CPU cores. This can lead to "destructive interference" or "false sharing",
where atomic operations on two or more unrelated values on the same cache
line interfere with each other in hardware, reducing overall performance.
Deal with this by aligning atomic primitives to the cache line width so
that two primitives are never placed on the same cache line.

This also has the benefit of causing *constructive* interference between
the atomic value and the data it protects. Since the user of the atomic
primitive likely wants to access the protected data after acquiring access,
having them both on the same cache line makes the subsequent access to the
data faster.

A common pattern for synchronization primitives is to put them inside an
Arc. However, if the primitive does not specify cache line alignment, both
the atomic reference count and the atomic state could end up on the same
cache line. In that case, changing the reference count of the primitive
would cause destructive interference with its operation. With the proper
alignment, the atomic state and the reference count end up on different
cache lines, so there is no interference between them.

Since we can't query the cache line width of the target machine at build
time, we pessimistically use an alignment of 128 bytes based on the
following observations:

* On x86, the cache line is usually 64 bytes. However, on Intel CPUs the
  spatial prefetcher "strives to complete every cache line fetched to the
  L2 cache with the pair line that completes it to a 128-byte aligned
  chunk" (section 2.3.5.4 of [1]). So to avoid destructive interference we
  need to align on every pair of cache lines.

* On ARM, both the Cortex-A15 (armv7 [2]) and the Cortex-A77 (aarch64 [3])
  have 64-byte data cache lines. However, Qualcomm Snapdragon CPUs can have
  128-byte data cache lines [4]. Since Chrome OS code compiled for armv7
  can still run on aarch64 CPUs with 128-byte cache lines, assume we need
  128-byte alignment there as well.

[1]: https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf
[2]: https://developer.arm.com/documentation/ddi0438/d/Level-2-Memory-System/About-the-L2-memory-system
[3]: https://developer.arm.com/documentation/101111/0101/functional-description/level-2-memory-system/about-the-l2-memory-system
[4]: https://www.7-cpu.com/cpu/Snapdragon.html

BUG=none
TEST=unit tests

Change-Id: Iaf6a29ad0d35411c70fd0e833cc6a49eda029bbc
Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/platform/crosvm/+/2804869
Reviewed-by: Daniel Verkamp <dverkamp@chromium.org>
Tested-by: kokoro <noreply+kokoro@google.com>
Commit-Queue: Chirantan Ekbote <chirantan@chromium.org>
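Since crosvm is written in Rust, the alignment trick described above can be
sketched as a `#[repr(align(128))]` wrapper type. This is a minimal,
hypothetical illustration under the assumptions stated in the comments, not
the exact type introduced by this change; the name `CacheLineAligned` and
the checks in `main` are invented for demonstration.

```rust
use std::mem::{align_of, size_of};
use std::sync::atomic::AtomicUsize;

// Hypothetical illustration only: pad and align the wrapped value to 128
// bytes so atomic state cannot share a cache line (or an Intel prefetch
// pair of cache lines) with unrelated values such as an Arc's reference
// counts.
#[repr(align(128))]
struct CacheLineAligned<T>(T);

fn main() {
    // repr(align(128)) raises both the alignment and the padded size of the
    // wrapper to 128 bytes, so two adjacent instances can never occupy the
    // same 128-byte chunk.
    assert_eq!(align_of::<CacheLineAligned<AtomicUsize>>(), 128);
    assert_eq!(size_of::<CacheLineAligned<AtomicUsize>>(), 128);
}
```

A primitive built on such a wrapper can also keep its atomic flag and the
start of the data it guards within the same 128-byte block, which is the
constructive-interference benefit the commit message describes.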