-
Notifications
You must be signed in to change notification settings - Fork 477
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CachePadded: use 128-byte alignment on x86-64 #331
Changes from 1 commit
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -9,14 +9,17 @@ use core::ops::{Deref, DerefMut}; | |
/// CPU cores. Use `CachePadded` to ensure updating one piece of data doesn't invalidate other | ||
/// cached data. | ||
/// | ||
/// Cache lines are assumed to be 64 bytes on all architectures. | ||
/// | ||
/// # Size and alignment | ||
/// | ||
/// The size of `CachePadded<T>` is the smallest multiple of 64 bytes large enough to accommodate | ||
/// Cache lines are assumed to be of N bytes, depending on the architecture: | ||
/// | ||
/// * On x86-64, N = 128. | ||
/// * On all others, N = 64. | ||
/// | ||
/// The size of `CachePadded<T>` is the smallest multiple of N bytes large enough to accommodate | ||
/// a value of type `T`. | ||
/// | ||
/// The alignment of `CachePadded<T>` is the maximum of 64 bytes and the alignment of `T`. | ||
/// The alignment of `CachePadded<T>` is the maximum of N bytes and the alignment of `T`. | ||
/// | ||
/// # Examples | ||
/// | ||
|
@@ -25,11 +28,11 @@ use core::ops::{Deref, DerefMut}; | |
/// ``` | ||
/// use crossbeam_utils::CachePadded; | ||
/// | ||
/// let array = [CachePadded::new(1i32), CachePadded::new(2i32)]; | ||
/// let addr1 = &*array[0] as *const i32 as usize; | ||
/// let addr2 = &*array[1] as *const i32 as usize; | ||
/// let array = [CachePadded::new(1i8), CachePadded::new(2i8)]; | ||
/// let addr1 = &*array[0] as *const i8 as usize; | ||
/// let addr2 = &*array[1] as *const i8 as usize; | ||
/// | ||
/// assert_eq!(addr2 - addr1, 64); | ||
/// assert!(addr2 - addr1 >= 64); | ||
/// assert_eq!(addr1 % 64, 0); | ||
/// assert_eq!(addr2 % 64, 0); | ||
/// ``` | ||
|
@@ -49,7 +52,14 @@ use core::ops::{Deref, DerefMut}; | |
/// } | ||
/// ``` | ||
#[derive(Clone, Copy, Default, Hash, PartialEq, Eq)] | ||
#[repr(align(64))] | ||
// Starting from Intel's Sandy Bridge, spatial prefetcher is now pulling pairs of 64-byte cache | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is this also the case for AMD? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Judging by a quick skim through AMD's optimization manual, I'd say no. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. sad, that there are only intel processors addressed here. From what I've found Rust has flags to indicate CPU features There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @Mart-Bogdan I would accept a PR to adjust the alignment of CachePadded based on the CPU when the user specifies -C target-cpu in rustflag. |
||
// lines at a time, so we have to align to 128 bytes rather than 64. | ||
// | ||
// Sources: | ||
// - https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf | ||
// - https://github.com/facebook/folly/blob/1b5288e6eea6df074758f877c849b6e73bbb9fbb/folly/lang/Align.h#L107 | ||
#[cfg_attr(target_arch = "x86_64", repr(align(128)))] | ||
#[cfg_attr(not(target_arch = "x86_64"), repr(align(64)))] | ||
pub struct CachePadded<T> { | ||
value: T, | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't it more the case that x64 CPUs fetch two cache lines at a time to align them to 128-bit chunks rather than that a single cache line is 128-bit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Maybe the wording is not great, but
N
is just a pessimistic guess of how long a cache line is. SoN
is at least the length of a cache line, but may be larger than that in order to accommodate architectures with longer cache lines or ones that have unusual prefetching strategies.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, perhaps this could be reworded slightly to avoid confusion later? In any case LGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure - just added an additional comment for clarification.