Performance concerns with ReinterpretArray #25014

timholy · 2017-12-10T17:21:41Z

Ref https://discourse.julialang.org/t/big-overhead-with-the-new-lazy-reshape-reinterpret/7635

ExpandingMan · 2018-02-24T14:16:47Z

Still seeing this problem.

I’m getting a median time of 2.6 ns for accessing an array created with unsafe_wrap, and a median time of 10.0 ns for a reinterpreted array.
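A comparison along these lines can be sketched as follows. This is not the benchmark used for the numbers above; the helper names `sum_elements` and `compare_views`, the buffer size, and the use of `@elapsed` (rather than a benchmarking package that reports medians) are all assumptions made for illustration.

```julia
# Sum every element through ordinary indexing, so the loop body is
# dominated by the cost of a[i] on whichever array type is passed in.
function sum_elements(a)
    s = zero(eltype(a))
    @inbounds for i in eachindex(a)
        s += a[i]
    end
    return s
end

# Hypothetical helper: build both views over the same bytes and time them.
# nbytes is assumed to be a multiple of 8 so it divides evenly into Int64s.
function compare_views(nbytes::Integer = 8 * 1024)
    bytes = rand(UInt8, nbytes)
    lazy = reinterpret(Int64, bytes)          # lazy ReinterpretArray wrapper
    result = GC.@preserve bytes begin
        # Plain Array aliasing the same memory; only valid while `bytes` is alive.
        wrapped = unsafe_wrap(Array, convert(Ptr{Int64}, pointer(bytes)), nbytes ÷ 8)
        sum_elements(lazy); sum_elements(wrapped)   # warm up (compile) both methods
        t_lazy = @elapsed sum_elements(lazy)
        t_wrapped = @elapsed sum_elements(wrapped)
        same = sum_elements(lazy) == sum_elements(wrapped)
        (t_lazy = t_lazy, t_wrapped = t_wrapped, same = same)
    end
    return result
end

res = compare_views()
println("reinterpret: ", res.t_lazy, " s, unsafe_wrap: ", res.t_wrapped, " s")
```

A single `@elapsed` call is noisy, so the printed times are only a rough indication; the ratio between the two is the quantity of interest.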

From what I gather on Discourse, it seems that very few people have the expertise to tackle this.

When I originally wrote the new ReinterpretArray code, I made sure that LLVM was able to optimize reinterpret(::Array) back to a single memory access with appropriate TBAA (type-based alias analysis) and alignment info. Somewhere along the line, LLVM lost that ability. While we should try to recover that capability in LLVM, the regression shows that this is a relatively brittle optimization for a very simple operation. So this patch takes a different approach: we add two new intrinsics, `tbaa_pointerref` and `tbaa_pointerset`, that behave like their non-TBAA variants but additionally take a type to use as the TBAA tag. This allows us to write a special case for `reinterpret(T, ::Array)` that directly emits the correct pointer access. It's also a model for what a post-1.0 pure-Julia implementation of `Array` (e.g. on top of a buffer type) may look like. Fixes #25014
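To make "directly emits the correct pointer access" concrete, here is a minimal sketch of reading element `i` of an array as a different bits type with a single load. It uses the existing `unsafe_load` rather than the intrinsics named in the commit message, so it carries no TBAA tag; the helper name `load_as` is hypothetical and not part of Base.

```julia
# Hypothetical helper: read the i-th T-sized element from the memory of `a`
# with one pointer load, instead of going through a ReinterpretArray wrapper.
function load_as(::Type{T}, a::Array{S}, i::Integer) where {T,S}
    isbitstype(T) && isbitstype(S) ||
        throw(ArgumentError("both element types must be bits types"))
    # The T-sized slot i must lie entirely inside the array's bytes.
    1 <= i && i * sizeof(T) <= sizeof(a) || throw(BoundsError(a, i))
    GC.@preserve a begin
        p = convert(Ptr{T}, pointer(a))   # same address, viewed as Ptr{T}
        unsafe_load(p, i)                 # loads from p + (i - 1) * sizeof(T)
    end
end

a = Float64[1.0, 2.0, 3.0]
println(load_as(UInt64, a, 2) == reinterpret(UInt64, a)[2])
```

The sketch assumes the array's base pointer is suitably aligned for `T`, which holds for the small bits types used here.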

This fixes #25014 by making it more obvious to LLVM what's going on. Instead of a memcpy loop, we use a new intrinsic that puts an actual llvm.memcpy into the IR, which is enough for LLVM to fold everything away. In the benchmark from #25014, we still see some regressions from 0.6, but that is because the benchmarked code needs to dereference through the pointers in the reinterpret and reshape wrappers. In any real code, that dereferencing is loop-invariant and should be hoisted out of the inner loop.

This fixes #25014 by making it more obvious to LLVM what's going on. Instead of a memcpy loop, we use a ccall to :memcpy and turn this into llvm.memcpy at the IR level, which is enough for LLVM to fold everything away. In the benchmark from #25014, we still see some regressions from 0.6, but that is because the benchmarked code needs to dereference through the pointers in the reinterpret and reshape wrappers. In any real code, that dereferencing is loop-invariant and should be hoisted out of the inner loop.

This fixes #25014 by making it more obvious to LLVM what's going on. Instead of a memcpy loop, we use a ccall to :memcpy and turn this into llvm.memcpy at the IR level, which is enough for LLVM to fold everything away. In the benchmark from #25014, we still see some regressions from 0.6, but that is because the benchmarked code needs to dereference through the pointers in the reinterpret and reshape wrappers. In any real code, that dereferencing is loop-invariant and should be hoisted out of the inner loop. (cherry picked from commit 777810b)
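The memcpy-based element read that these commit messages describe can be sketched roughly as below. This is an illustration of the idea, not the code from the commits: the helper name `memcpy_load` is hypothetical, and whether the call is lowered to `llvm.memcpy` and folded into a single load depends on the Julia and LLVM versions in use.

```julia
# Hypothetical helper: read the i-th T-sized element from the bytes of `a`
# by copying sizeof(T) bytes into a Ref{T} with a single memcpy call.
function memcpy_load(::Type{T}, a::Vector{S}, i::Integer) where {T,S}
    isbitstype(T) && isbitstype(S) ||
        throw(ArgumentError("both element types must be bits types"))
    nbytes = sizeof(T)
    1 <= i && i * nbytes <= sizeof(a) || throw(BoundsError(a, i))
    out = Ref{T}()                       # uninitialized destination slot
    GC.@preserve a out begin
        # Pointer arithmetic on Ptr is in bytes.
        src = convert(Ptr{UInt8}, pointer(a)) + (i - 1) * nbytes
        dst = convert(Ptr{UInt8}, Base.unsafe_convert(Ptr{T}, out))
        ccall(:memcpy, Ptr{Cvoid}, (Ptr{Cvoid}, Ptr{Cvoid}, Csize_t),
              dst, src, nbytes)
    end
    return out[]
end

a = Float64[1.5, -2.0, 3.25]
println(memcpy_load(UInt64, a, 2) == reinterpret(UInt64, a)[2])
```

Because the copy size is a compile-time constant for a given `T`, the optimizer has a good chance of replacing the call with a plain load, which is the folding the commit messages rely on.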

kmsquire assigned Keno Dec 10, 2017

timholy mentioned this issue Dec 27, 2017

Rewrite for julia 0.7 JuliaImages/ImageCore.jl#52

Merged

ExpandingMan mentioned this issue Feb 24, 2018

Overhauled to Arrow Back-End and Better Memory Safety JuliaData/Feather.jl#78

Merged

andyferris mentioned this issue May 22, 2018

Reinterpret an Array of Float64 as an Array of SVector{Float64} ? JuliaArrays/StaticArrays.jl#410

Closed

Keno mentioned this issue May 23, 2018

Improve reinterpret #27213

Closed

Keno mentioned this issue Aug 16, 2018

Fix reinterpret performance #28707

Merged

Keno closed this as completed in #28707 Aug 17, 2018

RalphAS mentioned this issue Aug 31, 2018

Performance of ReinterpretArray, continued #28980

Closed

Moelf mentioned this issue Sep 13, 2021

ntoh / bswap are 10x slower when operating in-place on reinterpret array #42227

Closed
