v0.3.4
AccurateArithmetic v0.3.4
This release introduces a @fusible counterpart to SIMDops.@explicit, which makes it possible to use SIMD instructions from SIMDPirates.jl in all cases, whether exact instructions are required or fused operations are allowed. Error-free transformations (EFTs) now use this when possible. The API of @explicit has changed: it now affects the expressions that follow it, rather than the whole scope in which it is placed.
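To illustrate why the distinction between exact and fused operations matters for EFTs, here is a minimal sketch written with Base Julia only. It does not use @fusible, @explicit, or SIMDPirates.jl, whose exact syntax is not described in these notes; the function names `two_sum` and `two_prod` are chosen for illustration and are not necessarily the names used inside the package.

```julia
# TwoSum (Knuth): returns (s, e) with s = fl(a + b) and s + e == a + b exactly.
# Every operation must be executed exactly as written: reassociating or
# fusing them would destroy the error term.
function two_sum(a::T, b::T) where {T<:AbstractFloat}
    s  = a + b
    bp = s - a                      # the part of b that made it into s
    e  = (a - (s - bp)) + (b - bp)  # rounding error of a + b
    return s, e
end

# TwoProd: returns (p, e) with p = fl(a * b) and p + e == a * b exactly
# (barring underflow/overflow). Here a fused multiply-add is required:
# Base.fma computes a * b - p with a single rounding.
function two_prod(a::T, b::T) where {T<:AbstractFloat}
    p = a * b
    e = fma(a, b, -p)
    return p, e
end

s, es = two_sum(1.0, 1e-17)   # 1e-17 is lost in s but recovered in es
x = 1.0 + 2.0^-30
p, ep = two_prod(x, x)        # x^2 = 1 + 2^-29 + 2^-60; the last term lands in ep
```

In this sketch, `two_sum` is the kind of code where each instruction must stay exact, whereas `two_prod` relies on a fused operation on purpose; a macro pair along the lines of @explicit and @fusible lets both styles coexist with SIMD instructions.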
This release also adds support for cache prefetching, which significantly improves performance for large vectors. The lack of prefetching turns out to explain the performance discrepancies observed for large vectors and reported in #7. The default values for the cache prefetching mechanism are probably not optimal for all architectures; users interested in the last 10% of performance outside the cache are invited to customize this parameter.
Closed issues:
- Naive dot product performance (#7)
Merged pull requests: