This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

Extending optimized JIT helpers to Buffer.MemoryCopy (#9786)
vkvenkat authored and jkotas committed Mar 4, 2017
1 parent d22a221 commit c6372c5
Showing 1 changed file with 151 additions and 290 deletions.

1 comment on commit c6372c5

@glenn-slayden commented on c6372c5, Feb 11, 2018

Just curious why all these improvements to Buffer.Memmove (plus more recent work) aren't deployed in the IL cpblk instruction instead? It appears that this commit c6372c5 removes a comment to the effect that cpblk is where all this effort naturally belongs (that is, in case the proliferation of conditional #if code here wasn't already somewhat suggestive), and I agree.

I know everyone wants there to be one central nexus for obtaining an optimal memmove, with proper alignment, chunking, tuning, and everything else done right; shouldn't cpblk be that place?
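
For reference, the usual way to reach cpblk from C# is System.Runtime.CompilerServices.Unsafe.CopyBlock, whose IL body is essentially just the cpblk instruction, so whatever expansion the JIT gives cpblk is what this call receives. A minimal illustration (mine, not from the repository):

using System;
using System.Runtime.CompilerServices;

class CpblkDemo
{
    static unsafe void Main()
    {
        byte[] src = new byte[64];
        byte[] dst = new byte[64];
        new Random(1).NextBytes(src);

        fixed (byte* pSrc = src)
        fixed (byte* pDst = dst)
        {
            // Unsafe.CopyBlock(destination, source, byteCount) compiles down
            // to the IL cpblk instruction.
            Unsafe.CopyBlock(pDst, pSrc, (uint)src.Length);
        }

        Console.WriteLine(dst[63] == src[63]);  // True
    }
}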

I also note that this commit removes all address alignment behavior. Now, while it's true that the alignment code in ea9bee5 was a bit meek in considering only the destination address (at the possible expense of source alignment):

if (((int)dest & 3) != 0)
{
    if (((int)dest & 1) != 0)
    {
        *(dest + i) = *(src + i);
        i += 1;
        if (((int)dest & 2) != 0)
            goto IntAligned;
    }
    *(short*)(dest + i) = *(short*)(src + i);
    i += 2;
}

IntAligned:

#if BIT64
// On 64-bit IntPtr.Size == 8, so we want to advance to the next 8-aligned address. If
// (int)dest % 8 is 0, 5, 6, or 7, we will already have advanced by 0, 3, 2, or 1
// bytes to the next aligned address (respectively), so do nothing. On the other hand,
// if it is 1, 2, 3, or 4 we will want to copy-and-advance another 4 bytes until
// we're aligned.
// The thing 1, 2, 3, and 4 have in common that the others don't is that if you
// subtract one from them, their 3rd lsb will not be set. Hence, the below check.
if ((((int)dest - 1) & 4) == 0)
{
    *(int*)(dest + i) = *(int*)(src + i);
    i += 4;
}
#endif // BIT64

...at least this was something, and a reasonably motivated approach, since misaligned-store penalties are typically higher than misaligned-read penalties.
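
As a quick illustration of the arithmetic in that removed comment (the check below is mine, not from the repository): enumerating dest % 8 confirms that ((d - 1) & 4) == 0 is true exactly for the residues 1, 2, 3, and 4, i.e. the cases that still need one more 4-byte copy after the 1- and 2-byte stages:

using System;

class AlignmentPredicateCheck
{
    static void Main()
    {
        for (int d = 0; d < 8; d++)
        {
            int advanced = (4 - (d & 3)) & 3;         // bytes copied by the 1-/2-byte stages
            bool predicate = ((d - 1) & 4) == 0;      // the check from the removed code
            bool needsInt = (d + advanced) % 8 == 4;  // actually needs another 4-byte copy
            Console.WriteLine($"dest%8={d}: predicate={predicate}, expected={needsInt}");
        }
    }
}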

But in this commit there is no alignment handling at all. Did the experiments in #9786 empirically determine that physically aligning these operations is always a waste of time? For severe misalignment between source and target, it would be possible to write a split-access loop, albeit a complex one, that marshals quad-aligned reads to quad-aligned writes (a sketch follows below). Was this type of thing evaluated before the decision was taken to abandon all efforts at alignment?
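
To make that idea concrete, here is a rough sketch of such a split-access inner loop. It is mine, not from any version of buffer.cs: the names are invented, and it assumes little-endian byte order, a length that is a whole number of 8-byte quads, a source that is genuinely misaligned, and that the full aligned quad at each end of the source range may be read (a real implementation would special-case the edges). The demo uses NativeMemory.AlignedAlloc, which requires .NET 6 or later.

using System;
using System.Runtime.InteropServices;

static unsafe class SplitAccess
{
    // With dest 8-aligned and src misaligned by m (1..7), read only at
    // 8-aligned source addresses and stitch two adjacent aligned reads
    // into each aligned write.
    static void Copy(byte* dest, byte* src, int quads)
    {
        int m = (int)((ulong)src & 7);           // source misalignment, assumed 1..7
        ulong* s = (ulong*)(src - m);            // aligned base at or below src
        ulong* d = (ulong*)dest;

        int lo = m * 8;                          // bits taken from the lower quad
        int hi = 64 - lo;                        // bits taken from the upper quad

        ulong prev = s[0];
        for (int i = 0; i < quads; i++)
        {
            ulong next = s[i + 1];
            d[i] = (prev >> lo) | (next << hi);  // every read and write is aligned
            prev = next;
        }
    }

    static void Main()
    {
        const int quads = 8;
        byte* srcBlock = (byte*)NativeMemory.AlignedAlloc(80, 8);
        byte* dest = (byte*)NativeMemory.AlignedAlloc(quads * 8, 8);
        for (int i = 0; i < 80; i++) srcBlock[i] = (byte)i;

        Copy(dest, srcBlock + 3, quads);         // source misaligned by exactly 3

        Console.WriteLine(dest[0] == 3 && dest[63] == 66);  // True

        NativeMemory.AlignedFree(srcBlock);
        NativeMemory.AlignedFree(dest);
    }
}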

Finally, note that by insisting on alignment to the destination address exclusively, the alignment code (removed in this commit, see above) may have been ruining a naturally ideal source alignment, with the overall result of making things worse. This could have unfairly penalized test results for the existing code, when it would have been a trivial matter to apply that adjustment only when src/dest are jointly aligned to some appropriate level, e.g., by wrapping the above code with:

if ((((int)src ^ (int)dest) & 0x7) == 0)        // (or some other alignment value)
{
    // ...
}

As far as I can tell, there has never been code in buffer.cs that considers the source and destination physical alignment jointly, in order to then proceed sensibly on the basis of 3 broadly defined cases:

  • Input src/dst physical addresses already jointly aligned (or len less than the max alignment size): proceed as-is;
  • Joint alignment can be improved (to some degree) prior to starting the gang operation (the code removed in c6372c5 would belong here);
  • Antagonistic src/dst alignment detected: either proceed as-is or, if worthwhile, use a split-access loop to fully remediate (a dispatch along these lines is sketched below).
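
Purely to illustrate that taxonomy (the classification predicate and threshold below are mine, and the branch bodies are placeholders, not real implementations), the dispatch might be shaped as follows:

using System;

static unsafe class JointAlignmentDispatch
{
    const uint MaxAlign = 8;

    // Hypothetical three-way dispatch, not from buffer.cs: classify a copy
    // by the joint alignment of src and dest.
    public static void Copy(byte* dest, byte* src, long len)
    {
        uint delta = (uint)(((ulong)src ^ (ulong)dest) & (MaxAlign - 1));

        if (delta == 0 || (ulong)len < MaxAlign)
        {
            // Case 1: already jointly aligned (or too short to matter); copy as-is.
            Buffer.MemoryCopy(src, dest, len, len);
        }
        else if ((delta & 1) == 0)
        {
            // Case 2: src and dest can be brought to a common 2- or 4-byte
            // alignment first; the adjustment removed in c6372c5 belongs here.
            Buffer.MemoryCopy(src, dest, len, len);
        }
        else
        {
            // Case 3: antagonistic alignment; a split-access loop like the one
            // sketched earlier could be used here, if it pays for itself.
            Buffer.MemoryCopy(src, dest, len, len);
        }
    }
}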
