In #43852 we noticed that the compiler is getting good enough to
completely DCE a number of our benchmarks. We need to add some sort
of mechanism to prevent the compiler from doing so. This adds just
such an intrinsic. The intrinsic itself doesn't do anything, but
it is considered effectful by our optimizer, preventing it from
being DCE'd. At the LLVM level, it turns into a volatile store to
an alloca (or an llvm.sideeffect if the values passed to the
`dcebarrier` do not have any actual LLVM-level representation).
The docs for the new intrinsic are as follows:
```
dcebarrier(args...)
This function prevents dead-code elimination (DCE) of itself and any arguments
passed to it, but is otherwise the lightest barrier possible. In particular,
it is not a GC safepoint, does not model an observable heap effect, does not expand
to any code itself and may be re-ordered with respect to other side effects
(though the total number of executions may not change).
A useful model for this function is that it hashes all memory `reachable` from
args and escapes this information through some observable side-channel that does
not otherwise impact program behavior. Of course that's just a model. The
function does nothing and returns `nothing`.
This is intended for use in benchmarks that want to guarantee that `args` are
actually computed. (Otherwise DCE may see that the result of the benchmark is
unused and delete the entire benchmark code).
**Note**: `dcebarrier` does not affect constant folding. For example, in
`dcebarrier(1+1)`, no add instruction needs to be executed at runtime and
the code is semantically equivalent to `dcebarrier(2)`.
# Examples
function loop()
    for i = 1:1000
        # The compiler must guarantee that there are 1000 program points (in the correct
        # order) at which the value of `i` is in a register, but otherwise has
        # total control over the program.
        dcebarrier(i)
    end
end
```
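To illustrate the benchmark use case described in the docs, here is a minimal sketch. The names `barrier`, `sum_of_squares`, `bench_discarded`, and `bench_with_barrier` are hypothetical, and the lookup assumes the intrinsic is reachable as `Base.dcebarrier`, which this description does not state. On a build without it, the sketch falls back to a no-op that does not block DCE, so it still runs but no longer demonstrates the barrier.

```julia
# Assumption: the intrinsic is exposed as `Base.dcebarrier`. If it is not,
# fall back to a plain no-op closure, which provides NO protection against DCE.
const barrier = isdefined(Base, :dcebarrier) ?
    getfield(Base, :dcebarrier) :
    ((args...) -> nothing)

# Hypothetical workload: a pure computation with no side effects.
function sum_of_squares(n)
    acc = 0
    for i in 1:n
        acc += i * i
    end
    return acc
end

# The result is never used, so the optimizer is free to delete the call.
function bench_discarded(n)
    sum_of_squares(n)
    return nothing
end

# Passing the result to the barrier asks the compiler to keep it computed.
function bench_with_barrier(n)
    result = sum_of_squares(n)
    barrier(result)
    return nothing
end
```

Comparing the output of `@code_llvm bench_discarded(1000)` and `@code_llvm bench_with_barrier(1000)` (the macro comes from the `InteractiveUtils` standard library) is one way to check whether the workload survives optimization. As the note on constant folding implies, the barrier does not stop the compiler from simplifying how `result` is computed, only from dropping it.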
I believe the volatile store at the LLVM level is actually somewhat
stronger than what we want here. Ideally the `dcebarrier` would not
end up generating any machine code at all and would also be compatible
with optimizations like SROA and vectorization. However, I think this
is fine for now.