ARROW-185: Make padding and alignment for all buffers be 64 bytes #74

emkornfield · 2016-05-11T09:39:43Z

some small cleanup/removal of unnecessary code. I think there is likely a good opportunity to factor this code better generally, but this seems to work for now.

emkornfield · 2016-05-11T09:40:56Z

cpp/src/arrow/util/buffer.cc

+  constexpr int64_t multiple_bitmask = round_to - 1;
+  int64_t remainder = num & multiple_bitmask;
+  int rounded = num;
+  if (remainder) { rounded += 64 - remainder; }


This should use round_to here instead of the hard-coded 64. I'm also pretty sure there is something clever we could do to avoid the condition, but at the moment I'm blanking on it.

Does this do it?

(num + multiple_bitmask) & ~multiple_bitmask

That looks right to me, although the performance gain is probably moot given the other condition that checks for overflow.
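For reference, the branch-free rounding suggested above can be sketched as follows. This is a minimal illustration, not the code merged in this PR: the names `kRoundTo` and `RoundUpToMultipleOf64`, and the choice to return -1 on a negative input or on overflow, are assumptions made for the example.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper for illustration only; not the function from buffer.cc.
// kRoundTo must be a power of two for the bitmask trick to be valid.
constexpr int64_t kRoundTo = 64;

// Rounds num up to the next multiple of kRoundTo without branching on the
// remainder. Returns -1 (an assumed error convention) if num is negative or
// if adding the mask would overflow int64_t.
inline int64_t RoundUpToMultipleOf64(int64_t num) {
  constexpr int64_t kMask = kRoundTo - 1;
  if (num < 0 || num > std::numeric_limits<int64_t>::max() - kMask) {
    return -1;
  }
  // Adding the mask carries into the next multiple unless num is already a
  // multiple; clearing the low bits then lands exactly on that multiple.
  return (num + kMask) & ~kMask;
}
```

The overflow check is still a branch, which is why the gain over the `if (remainder)` version is likely small.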

emkornfield · 2016-05-11T16:24:58Z

Hmm, the tests passed locally. I will need to take a closer look at what is going on.

wesm · 2016-05-12T02:28:34Z

cpp/src/arrow/util/buffer.h

  virtual ~Buffer();

  // An offset into data that is owned by another buffer, but we want to be
  // able to retain a valid pointer to it even after other shared_ptr's to the
  // parent buffer have been destroyed
+  // TODO(emkornfield) how will this play with 64 byte alignment/padding?


Inevitably, alignment and padding aren't always going to be guaranteed for in-memory data (of course, when data is moved for IPC purposes, that will need to be guaranteed). I suppose then that buffers will need to be able to communicate their alignment/padding for algorithm selection (i.e. can we use the spiffy AVX512 function or not?)

I think we need to see how use-cases play out. Given the current spec, it seems that most slicing operations in the general case will need memory allocation anyway. We could likely guarantee alignment/padding by providing a utility method that either creates a slice, if the slice can keep the contract, or otherwise allocates new underlying data. For now I will put a warning here.
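One way a buffer could communicate its alignment/padding for algorithm selection is a pair of cheap predicates. The sketch below is only an assumption about what such a check might look like; `kAlignment`, `IsAligned`, `IsPadded`, and `CanUseWideSimd` are hypothetical names and not part of the Arrow API.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration; none of these names exist in Arrow.
constexpr int64_t kAlignment = 64;

// True if the address is a multiple of 64 bytes.
inline bool IsAligned(const uint8_t* data) {
  return (reinterpret_cast<uintptr_t>(data) & (kAlignment - 1)) == 0;
}

// True if the allocated capacity is a non-negative multiple of 64 bytes.
inline bool IsPadded(int64_t capacity) {
  return capacity >= 0 && (capacity % kAlignment) == 0;
}

// A caller could use this to choose between a SIMD kernel that assumes
// aligned, padded input and a scalar fallback.
inline bool CanUseWideSimd(const uint8_t* data, int64_t capacity) {
  return IsAligned(data) && IsPadded(capacity);
}
```

A slice that starts at an arbitrary offset into a parent buffer would typically fail `IsAligned`, which is the case the TODO above is concerned with.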

emkornfield · 2016-05-13T08:02:11Z

I still need to address the other comments, but I pushed a commit that should allow the C++ tests to pass. I still need to check whether the Python tests are still failing.

emkornfield · 2016-05-17T16:47:11Z

Should be ready for review. Verification of alignment on RPC is not done here; I will open a JIRA to address it, if that is OK.

wesm · 2016-05-17T23:39:59Z

cpp/src/arrow/util/buffer.h

+  //
+  // This method makes no assertions about alignment or padding of the buffer but
+  // in general we expected buffers to be aligned and padded to 64 bytes.  In the future
+  // we might add utility methods to help determine if a buffer satisfies this contract.


Probably what we can do is add a method to produce a buffer that is guaranteed to be aligned and padded (allocating as necessary). For example: if there is incoming data from another library to libarrow that is not aligned or padded, some algorithms may work without alignment or padding, while others (e.g. requiring SIMD) would require the buffer to be "fixed". This could get pretty hairy, though...

I'm thinking about the case where an Arrow array is constructed from memory allocated elsewhere with zero copy
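A rough sketch of such a "fix up" method is shown below, assuming a simple view type that optionally owns a copy. `BufferView` and `EnsureAlignedAndPadded` are hypothetical names invented for this example; the real Buffer classes and memory pool are not used, and the caller is trusted to report the capacity of foreign memory correctly.

```cpp
#include <cstdint>
#include <cstring>
#include <memory>
#include <utility>

// Hypothetical types and names for illustration; not Arrow's Buffer API.
struct BufferView {
  const uint8_t* data;               // start of the readable bytes
  int64_t size;                      // logical size in bytes
  int64_t capacity;                  // readable bytes starting at data
  std::unique_ptr<uint8_t[]> owned;  // non-null only when a copy was made
};

// Returns the input unchanged (zero copy) when it already satisfies the
// 64-byte alignment and padding contract; otherwise copies it into a new
// allocation that does, zeroing the padding bytes.
inline BufferView EnsureAlignedAndPadded(const uint8_t* data, int64_t size,
                                         int64_t capacity) {
  const bool aligned = reinterpret_cast<uintptr_t>(data) % 64 == 0;
  const bool padded = capacity >= size && capacity % 64 == 0;
  if (aligned && padded) {
    return BufferView{data, size, capacity, nullptr};
  }
  const int64_t padded_size = (size + 63) & ~int64_t{63};
  // Over-allocate by 63 bytes so an aligned start always exists inside.
  std::unique_ptr<uint8_t[]> storage(new uint8_t[padded_size + 63]);
  uint8_t* start = storage.get();
  start += (64 - reinterpret_cast<uintptr_t>(start) % 64) % 64;
  std::memcpy(start, data, static_cast<size_t>(size));
  std::memset(start + size, 0, static_cast<size_t>(padded_size - size));
  return BufferView{start, size, padded_size, std::move(storage)};
}
```

An algorithm that requires SIMD would call this once on incoming foreign memory, while algorithms that tolerate unaligned data could skip it and stay zero copy.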

wesm · 2016-05-17T23:42:03Z

LGTM. thank you for the thorough efforts on this. +1

ARROW-185: Make padding and alignment for all buffers be 64 bytes

6ff3048

emkornfield reviewed May 11, 2016

wesm reviewed May 12, 2016

add back in memsets because they make valgrind happy

05653cb

emkornfield changed the title ~~ARROW-185: Make padding and alignment for all buffers be 64 bytes~~ [WIP] ARROW-185: Make padding and alignment for all buffers be 64 bytes May 13, 2016

emkornfield added 5 commits May 16, 2016 08:56

replace cython string conversion with string builder

11b3fd7

cleanup

7543267

fix lint

c140e04

fix warning

1d006d8

fix cast style

e3cca14

emkornfield changed the title ~~[WIP] ARROW-185: Make padding and alignment for all buffers be 64 bytes~~ ARROW-185: Make padding and alignment for all buffers be 64 bytes May 17, 2016

wesm reviewed May 17, 2016

asfgit closed this in 9c59158 May 17, 2016

xuechendi pushed a commit to xuechendi/arrow that referenced this pull request Aug 4, 2020

Merge pull request apache#74 from rongma1997/native-compression

f8b201f

[Java] compression workaround

emkornfield deleted the emk_fix_allocations_PR branch February 26, 2021 05:14

zhouyuan added a commit to zhouyuan/arrow that referenced this pull request Jan 10, 2022

Fix wrong Tell() result from BufferedOutputStream in an edge case (ap…

21698d7

…ache#73) (apache#74) Co-authored-by: Hongze Zhang <hongze.zhang@intel.com>

paleolimbot mentioned this pull request Jan 28, 2023

[R] Crash on MacOS (x86) when running tests with homebrew apache-arrow also installed #33903

Closed
