Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CachePadded: use 128-byte alignment on x86-64 #331

Merged
merged 2 commits into from Feb 20, 2019
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
28 changes: 19 additions & 9 deletions crossbeam-utils/src/cache_padded.rs
Original file line number Diff line number Diff line change
Expand Up @@ -9,14 +9,17 @@ use core::ops::{Deref, DerefMut};
/// CPU cores. Use `CachePadded` to ensure updating one piece of data doesn't invalidate other
/// cached data.
///
/// Cache lines are assumed to be 64 bytes on all architectures.
///
/// # Size and alignment
///
/// The size of `CachePadded<T>` is the smallest multiple of 64 bytes large enough to accommodate
/// Cache lines are assumed to be of N bytes, depending on the architecture:
///
/// * On x86-64, N = 128.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it more the case that x64 CPUs fetch two cache lines at a time to align them to 128-bit chunks rather than that a single cache line is 128-bit?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Maybe the wording is not great, but N is just a pessimistic guess of how long a cache line is. So N is at least the length of a cache line, but may be larger than that in order to accommodate architectures with longer cache lines or ones that have unusual prefetching strategies.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, perhaps this could be reworded slightly to avoid confusion later? In any case LGTM.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure - just added an additional comment for clarification.

/// * On all others, N = 64.
///
/// The size of `CachePadded<T>` is the smallest multiple of N bytes large enough to accommodate
/// a value of type `T`.
///
/// The alignment of `CachePadded<T>` is the maximum of 64 bytes and the alignment of `T`.
/// The alignment of `CachePadded<T>` is the maximum of N bytes and the alignment of `T`.
///
/// # Examples
///
Expand All @@ -25,11 +28,11 @@ use core::ops::{Deref, DerefMut};
/// ```
/// use crossbeam_utils::CachePadded;
///
/// let array = [CachePadded::new(1i32), CachePadded::new(2i32)];
/// let addr1 = &*array[0] as *const i32 as usize;
/// let addr2 = &*array[1] as *const i32 as usize;
/// let array = [CachePadded::new(1i8), CachePadded::new(2i8)];
/// let addr1 = &*array[0] as *const i8 as usize;
/// let addr2 = &*array[1] as *const i8 as usize;
///
/// assert_eq!(addr2 - addr1, 64);
/// assert!(addr2 - addr1 >= 64);
/// assert_eq!(addr1 % 64, 0);
/// assert_eq!(addr2 % 64, 0);
/// ```
Expand All @@ -49,7 +52,14 @@ use core::ops::{Deref, DerefMut};
/// }
/// ```
#[derive(Clone, Copy, Default, Hash, PartialEq, Eq)]
#[repr(align(64))]
// Starting from Intel's Sandy Bridge, spatial prefetcher is now pulling pairs of 64-byte cache
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this also the case for AMD?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Judging by a quick skim through AMD's optimization manual, I'd say no.
https://developer.amd.com/wordpress/media/2013/12/55723_SOG_Fam_17h_Processors_3.00.pdf

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sad, that there are only intel processors addressed here.

From what I've found Rust has flags to indicate CPU features
https://rust-lang.github.io/packed_simd/perf-guide/target-feature/rustflags.html

Copy link
Member

@taiki-e taiki-e Aug 28, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Mart-Bogdan I would accept a PR to adjust the alignment of CachePadded based on the CPU when the user specifies -C target-cpu in rustflag.

// lines at a time, so we have to align to 128 bytes rather than 64.
//
// Sources:
// - https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf
// - https://github.com/facebook/folly/blob/1b5288e6eea6df074758f877c849b6e73bbb9fbb/folly/lang/Align.h#L107
#[cfg_attr(target_arch = "x86_64", repr(align(128)))]
#[cfg_attr(not(target_arch = "x86_64"), repr(align(64)))]
pub struct CachePadded<T> {
value: T,
}
Expand Down