-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-185: Make padding and alignment for all buffers be 64 bytes #74
Conversation
emkornfield
commented
May 11, 2016
- some small cleanup/removal of unnecessary code. I think there is likely a good opportunity to factor this code better generally, but this seems to work for now.
constexpr int64_t multiple_bitmask = round_to - 1; | ||
int64_t remainder = num & multiple_bitmask; | ||
int rounded = num; | ||
if (remainder) { rounded += 64 - remainder; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should use round_to here. I'm also pretty sure there is something clever we could do to avoid the condition here, but at the moment I'm blanking on it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this do it?
(num + multiple_bitmask) & ~multiple_bitmask
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that looks right to me. although the performance gains are probably moot given the other condition for overflow.
hmm, tests passed locally, will need to take a closer look at what is going on. |
virtual ~Buffer(); | ||
|
||
// An offset into data that is owned by another buffer, but we want to be | ||
// able to retain a valid pointer to it even after other shared_ptr's to the | ||
// parent buffer have been destroyed | ||
// TODO(emkornfield) how will this play with 64 byte alignment/padding? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inevitably alignment and padding isn't always going to be a guarantee on in-memory data (of course when data is moved for IPC purposes, that will need to be guaranteed). I suppose then that buffers will need to be able to communicate their alignment/padding for algorithm selection (i.e. can we use the spiffy AVX512 function or not?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to see how use-cases play out. It seems given the current spec, most slicing operations in the general case will need memory allocation anyways. We could likely guarantee alignment/padding by providing a utility method that either allocates slices if it can keep the contract otherwise allocates new underlying data. For now I will put a warning here.
still need to address other comments, but pushed a commit that should allow C++ tests to pass, I still need to check if python tests are still failing. |
should be ready for review. Not done here is verification of alignment on RPC I will open up a jira to address this, if that is ok. |
// | ||
// This method makes no assertions about alignment or padding of the buffer but | ||
// in general we expected buffers to be aligned and padded to 64 bytes. In the future | ||
// we might add utility methods to help determine if a buffer satisfies this contract. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably what we can do is add a method to produce a buffer that is guaranteed to be aligned and padded (allocating as necessary). For example: if there is incoming data from another library to libarrow that is not aligned or padded, some algorithms may work without alignment or padding, while others (e.g. requiring SIMD) would require the buffer to be "fixed". This could get pretty hairy, though...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking about the case where an Arrow array is constructed from memory allocated elsewhere with zero copy
LGTM. thank you for the thorough efforts on this. +1 |
…e#74) * Temporarily matching what the dremio does for mod zero. * Used the latest Arrow APIs for allocating buffers.
…e#74) * Temporarily matching what the dremio does for mod zero. * Used the latest Arrow APIs for allocating buffers.
…e#74) * Temporarily matching what the dremio does for mod zero. * Used the latest Arrow APIs for allocating buffers.
…e#74) * Temporarily matching what the dremio does for mod zero. * Used the latest Arrow APIs for allocating buffers.
I added an option to make SSE strictly opt-in for now. As a side effect of this, parquet-cpp now builds and the test suite passes out of the box on 32-bit ARMv7 (I tried it on my RaspberryPi Model B 2). Author: Wes McKinney <wesm@apache.org> Closes apache#74 from wesm/PARQUET-488 and squashes the following commits: 61225e9 [Wes McKinney] Use -march=native 3833efd [Wes McKinney] Remove stale cmake comment 70fcf65 [Wes McKinney] Add cmake PARQUET_USE_SSE option 775c72d [Wes McKinney] Fix compilation on arm7/raspberrypi
…e#74) * Temporarily matching what the dremio does for mod zero. * Used the latest Arrow APIs for allocating buffers.
I added an option to make SSE strictly opt-in for now. As a side effect of this, parquet-cpp now builds and the test suite passes out of the box on 32-bit ARMv7 (I tried it on my RaspberryPi Model B 2). Author: Wes McKinney <wesm@apache.org> Closes apache#74 from wesm/PARQUET-488 and squashes the following commits: 61225e9 [Wes McKinney] Use -march=native 3833efd [Wes McKinney] Remove stale cmake comment 70fcf65 [Wes McKinney] Add cmake PARQUET_USE_SSE option 775c72d [Wes McKinney] Fix compilation on arm7/raspberrypi Change-Id: If8e4e7e1b7fc64df952cb8b82662bb017ca56f72
I added an option to make SSE strictly opt-in for now. As a side effect of this, parquet-cpp now builds and the test suite passes out of the box on 32-bit ARMv7 (I tried it on my RaspberryPi Model B 2). Author: Wes McKinney <wesm@apache.org> Closes apache#74 from wesm/PARQUET-488 and squashes the following commits: 61225e9 [Wes McKinney] Use -march=native 3833efd [Wes McKinney] Remove stale cmake comment 70fcf65 [Wes McKinney] Add cmake PARQUET_USE_SSE option 775c72d [Wes McKinney] Fix compilation on arm7/raspberrypi Change-Id: If8e4e7e1b7fc64df952cb8b82662bb017ca56f72
I added an option to make SSE strictly opt-in for now. As a side effect of this, parquet-cpp now builds and the test suite passes out of the box on 32-bit ARMv7 (I tried it on my RaspberryPi Model B 2). Author: Wes McKinney <wesm@apache.org> Closes apache#74 from wesm/PARQUET-488 and squashes the following commits: 61225e9 [Wes McKinney] Use -march=native 3833efd [Wes McKinney] Remove stale cmake comment 70fcf65 [Wes McKinney] Add cmake PARQUET_USE_SSE option 775c72d [Wes McKinney] Fix compilation on arm7/raspberrypi Change-Id: If8e4e7e1b7fc64df952cb8b82662bb017ca56f72
I added an option to make SSE strictly opt-in for now. As a side effect of this, parquet-cpp now builds and the test suite passes out of the box on 32-bit ARMv7 (I tried it on my RaspberryPi Model B 2). Author: Wes McKinney <wesm@apache.org> Closes apache#74 from wesm/PARQUET-488 and squashes the following commits: 61225e9 [Wes McKinney] Use -march=native 3833efd [Wes McKinney] Remove stale cmake comment 70fcf65 [Wes McKinney] Add cmake PARQUET_USE_SSE option 775c72d [Wes McKinney] Fix compilation on arm7/raspberrypi Change-Id: If8e4e7e1b7fc64df952cb8b82662bb017ca56f72
…e#74) * Temporarily matching what the dremio does for mod zero. * Used the latest Arrow APIs for allocating buffers.
…e#74) * Temporarily matching what the dremio does for mod zero. * Used the latest Arrow APIs for allocating buffers.
[Java] compression workaround