Draft Release Notes for Halide 17 Release
Andrew Adams edited this page Dec 14, 2023 · 2 revisions
(This is a temporary wiki page for sharing / commenting / editing of release notes)
- `ParamMap` has been removed entirely from the public API. All users of `ParamMap` should migrate to `Callable` instead.
- `Halide::Parameter` has been moved to the public Halide API (it was formerly "internal" and not intended for public use).
- New scheduling primitives:
  - `Func::partition()` and friends: set the loop partition policy, which controls how/whether a loop is split into three loops (prologue/steady-state/epilogue). Loop partitioning can be useful to optimize boundary conditions (e.g. clamp_edge).
  - `Func::hoist_storage()` and friends: allow a function's storage to be moved to a given loop level. Unlike `Func::store_at()`, no optimizations are triggered (e.g. sliding window).
- New `TailStrategy` options for existing scheduling directives:
  - `ShiftInwardsAndBlend`: Equivalent to ShiftInwards, but protects values that would be re-evaluated by loading the memory location that would be stored to, modifying only the elements not contained within the overlap, and then storing the blended result. Unlike ShiftInwards, this is valid to use in update definitions.
  - `RoundUpAndBlend`: Equivalent to RoundUp, but protects values that would be written beyond the end by loading the memory location that would be stored to, modifying only the elements within the region being computed, and then storing the blended result. Unlike RoundUp, this is valid to use on non-outermost splits in update definitions.
- Substantially improved performance and display in the VizIR output.
- Profiler improvements:
  - Substantially nicer text output
  - Injects timing into calls for `copy_to_host` and `copy_to_device` so you can measure host<->device copy overhead
  - Allows optional sorting via the `HL_PROFILER_SORT` env var
- Substantially faster codegen for several GPU backends.
- Experimental serialization/deserialization feature allows for saving of Halide IR code.
- Various bug fixes and improvements in the `Anderson2021` autoscheduler.
- Improved ARM codegen, including: better patterns for sdot/udot; improved shift/mul codegen.
- Support for Zen4 architecture in the x86 backend.
- Updates to the ONNX app.
- Various fixes and improvements to sliding-window and storage-folding.
- Improvements to slow gather operations for some x86 variants.
- Improvements to correctness for the `.async()` scheduling directive.
- Improved codegen for float16 conversion, especially on x86.
- Several compile-time warnings of dubious usefulness disabled.
- WebAssembly codegen now defaults to assuming that saturating-float-to-int and sign-extension instruction sets are always available.
- `Target` now does some reality-checking that it doesn't contain obviously nonsensical `Feature` combinations.
- Misc changes and fixes to RISCV codegen
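
The difference between `ShiftInwards` and the two new blending tail strategies is easiest to see on an update that is not idempotent. The following is a minimal sketch in plain C++, not Halide code: it models a vectorized in-place update `buf[i] += 1` over the first `n` elements, `w` lanes at a time. The function names are hypothetical, and the model assumes `n >= w` and (for the round-up case) a buffer allocated out to the next multiple of `w`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Model of a ShiftInwards-style tail (hypothetical helper, not a Halide API).
// The last vector is shifted back so that it ends exactly at n. Lanes that
// overlap the previous vector are recomputed, so they get updated twice.
std::vector<int> shift_inwards(std::vector<int> buf, std::size_t n, std::size_t w) {
    assert(n >= w && buf.size() >= n);
    for (std::size_t start = 0; start < n; start += w) {
        std::size_t base = std::min(start, n - w);  // shift the tail inwards
        for (std::size_t lane = 0; lane < w; ++lane) {
            buf[base + lane] += 1;
        }
    }
    return buf;
}

// Model of a ShiftInwardsAndBlend-style tail: load the vector that would be
// stored to, modify only the lanes outside the overlap, store the blend.
std::vector<int> shift_inwards_and_blend(std::vector<int> buf, std::size_t n, std::size_t w) {
    assert(n >= w && buf.size() >= n);
    std::vector<int> lanes(w);
    for (std::size_t start = 0; start < n; start += w) {
        std::size_t base = std::min(start, n - w);
        for (std::size_t lane = 0; lane < w; ++lane) {
            lanes[lane] = buf[base + lane];               // load old values
            if (base + lane >= start) lanes[lane] += 1;   // skip overlapping lanes
        }
        for (std::size_t lane = 0; lane < w; ++lane) {
            buf[base + lane] = lanes[lane];               // store blended result
        }
    }
    return buf;
}

// Model of a RoundUpAndBlend-style tail: the last vector runs past n, but
// lanes at or beyond n are written back with the values that were loaded.
std::vector<int> round_up_and_blend(std::vector<int> buf, std::size_t n, std::size_t w) {
    assert(buf.size() >= ((n + w - 1) / w) * w);  // assumption: padded buffer
    std::vector<int> lanes(w);
    for (std::size_t base = 0; base < n; base += w) {
        for (std::size_t lane = 0; lane < w; ++lane) {
            lanes[lane] = buf[base + lane];               // load old values
            if (base + lane < n) lanes[lane] += 1;        // only inside the region
        }
        for (std::size_t lane = 0; lane < w; ++lane) {
            buf[base + lane] = lanes[lane];               // store blended result
        }
    }
    return buf;
}
```

With `n = 6` and `w = 4`, the plain shifted tail applies the update twice to elements 2 and 3, while both blending variants apply it exactly once to elements 0 through 5 and leave everything past the end untouched. This is why, per the notes above, the blending strategies are usable in update definitions where the non-blending ones are not.
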
- Revise LLVM fix to work when no V8 or WABT available by @steven-johnson in https://github.com/halide/Halide/pull/7635
- Be more careful about overflow in trim_bounds_using_alignment by @abadams in https://github.com/halide/Halide/pull/7645
- Add a compositing example app by @abadams in https://github.com/halide/Halide/pull/7646
- Get the ASAN toolchain working again by @steven-johnson in https://github.com/halide/Halide/pull/7604
- Upgrade clang-format and clang-tidy to use v16 by @steven-johnson in https://github.com/halide/Halide/pull/7660
- Enable the misc-use-anonymous-namespace clang-tidy check by @steven-johnson in https://github.com/halide/Halide/pull/7661
- Enable clang-tidy's modernize-use-default-member-init check by @steven-johnson in https://github.com/halide/Halide/pull/7662
- Update onnx app to Adams2019 autoscheduler and new autoscheduler API by @abadams in https://github.com/halide/Halide/pull/7673
- Remove ParamMap by @steven-johnson in https://github.com/halide/Halide/pull/7675
- Fix correctness_float16_t for ASAN builds by @steven-johnson in https://github.com/halide/Halide/pull/7687
- Add a select overload for tuples by @abadams in https://github.com/halide/Halide/pull/7672
- Add Sanitizer details to README_cmake.md by @steven-johnson in https://github.com/halide/Halide/pull/7688
- Fix quadratic algorithm in simplify_correlated_differences by @abadams in https://github.com/halide/Halide/pull/7686
- Fix float16 under asan, attempt #2 by @steven-johnson in https://github.com/halide/Halide/pull/7691
- Add a warning if a Generator declares any Outputs before the final Input (Fixes #7669) by @steven-johnson in https://github.com/halide/Halide/pull/7697
- Fixed the regularization for BGU. by @mcourteaux in https://github.com/halide/Halide/pull/7684
- Fix clang and llvm versions in scripts by @TH3CHARLie in https://github.com/halide/Halide/pull/7702
- Fix leaks caused by self-referential parameter constraints by @abadams in https://github.com/halide/Halide/pull/7700
- Fix float16 warning for older clangs by @abadams in https://github.com/halide/Halide/pull/7701
- Upgrade Halide main branch for LLVM18 by @steven-johnson in https://github.com/halide/Halide/pull/7710
- Improved profiler result printing. by @mcourteaux in https://github.com/halide/Halide/pull/7709
- Default WITH_TEST_FUZZ to OFF by @steven-johnson in https://github.com/halide/Halide/pull/7695
- Throw an error if split is called with the same older and inner var name by @TH3CHARLie in https://github.com/halide/Halide/pull/7715
- Making HLSL code-gen a couple orders of magnitude faster... by @slomp in https://github.com/halide/Halide/pull/7719
- Making Metal code-gen a bit faster by @slomp in https://github.com/halide/Halide/pull/7720
- Fix handling of thread features for scalars in Anderson2021 by @aekul in https://github.com/halide/Halide/pull/7726
- Change default generator timeout to infinite by @abadams in https://github.com/halide/Halide/pull/7718
- Remove unused using decl by @abadams in https://github.com/halide/Halide/pull/7730
- [Hexagon] - Fix problems in sim_host.cpp by @pranavb-ca in https://github.com/halide/Halide/pull/7725
- Fix RDom usage in anderson2021_test_apps_autoscheduler (Fixes #7729) by @steven-johnson in https://github.com/halide/Halide/pull/7734
- Fix leak on cloning functions with update defs by @abadams in https://github.com/halide/Halide/pull/7735
- Ignore code in src/runtime/hexagon_remote/bin/src for clang-format by @steven-johnson in https://github.com/halide/Halide/pull/7736
- Clean up really long line lengths in Anderson2021 by @steven-johnson in https://github.com/halide/Halide/pull/7728
- Revise labels on autoscheduler tests by @steven-johnson in https://github.com/halide/Halide/pull/7732
- Speedup the VizIR HTML. by @mcourteaux in https://github.com/halide/Halide/pull/7713
- Run clang-tidy on macOS runners instead of Linux by @steven-johnson in https://github.com/halide/Halide/pull/7746
- Fix infinite recursion in loop partitioning by @abadams in https://github.com/halide/Halide/pull/7743
- Fix leaks in test/correctness/memoize.cpp by @abadams in https://github.com/halide/Halide/pull/7705
- Allow optional sorting of profiler output via HL_PROFILER_SORT env var (Fixes #7638) by @steven-johnson in https://github.com/halide/Halide/pull/7639
- Permit llvm 15 on windows by @abadams in https://github.com/halide/Halide/pull/7744
- Revert accidental typo change in #7746 by @steven-johnson in https://github.com/halide/Halide/pull/7747
- [vulkan] Fix heap buffer overflow in Vulkan extension handling discovered by ASAN by @derek-gerstmann in https://github.com/halide/Halide/pull/7740
- [vulkan] Fix SPIR-V IR references causing leaks by @derek-gerstmann in https://github.com/halide/Halide/pull/7739
- Improve error-handling in Anderson2021, and ensure build deps are cor… by @steven-johnson in https://github.com/halide/Halide/pull/7748
- StmtViz: Search for tooltip only in the child node by @antonysigma in https://github.com/halide/Halide/pull/7754
- Experimental serializer by @TH3CHARLie in https://github.com/halide/Halide/pull/7594
- Define `cast<i32>(u32)` overflow behavior by @rootjalex in https://github.com/halide/Halide/pull/7769
- Fix vector reduce HTML by @mcourteaux in https://github.com/halide/Halide/pull/7773
- Remove fragile simd_op_check test for mlal/mlsl on ARM by @rootjalex in https://github.com/halide/Halide/pull/7775
- Speedup page loading of VizStmt. by @mcourteaux in https://github.com/halide/Halide/pull/7755
- Try to fix remaining ASAN-reported leaks by @steven-johnson in https://github.com/halide/Halide/pull/7767
- Fix out of bounds access in anderson2021_test_apps_autoscheduler by @aekul in https://github.com/halide/Halide/pull/7771
- Don't introduce reinterprets in find/lower intrinsics by @rootjalex in https://github.com/halide/Halide/pull/7776
- [Hexagon] -Build Hexagon runtime components using the Hexagon SDK (Clone of #7671) by @pranavb-ca in https://github.com/halide/Halide/pull/7741
- slice IRMatcher should only match on slices by @abadams in https://github.com/halide/Halide/pull/7772
- Don't inject undef() in the simplifier by @abadams in https://github.com/halide/Halide/pull/7791
- Fix for top-of-tree LLVM by @steven-johnson in https://github.com/halide/Halide/pull/7798
- [ARM] Distribute shifts as muls by @rootjalex in https://github.com/halide/Halide/pull/7790
- [ARM] support new udot/sdot patterns by @rootjalex in https://github.com/halide/Halide/pull/7800
- Remove some unused includes by @abadams in https://github.com/halide/Halide/pull/7799
- Add support to the makefile for serialization by @abadams in https://github.com/halide/Halide/pull/7762
- [wasm] Enable PIC for WebAssembly on LLVM v18.x by @derek-gerstmann in https://github.com/halide/Halide/pull/7803
- Update WebGPU to latest Emscripten/Dawn API by @steven-johnson in https://github.com/halide/Halide/pull/7804
- Add jump-buttons to get from Stmt directly to Assembly by @mcourteaux in https://github.com/halide/Halide/pull/7793
- Update clang-tidy action to stop breaking by @steven-johnson in https://github.com/halide/Halide/pull/7808
- [serialization] Add serialization support to generator interface by @derek-gerstmann in https://github.com/halide/Halide/pull/7792
- Ensure that multitarget AOT builds have consistent random sequence by @steven-johnson in https://github.com/halide/Halide/pull/7717
- Move clang-tidy checks back to Linux by @steven-johnson in https://github.com/halide/Halide/pull/7817
- Update 'Check CMake file lists' action by @steven-johnson in https://github.com/halide/Halide/pull/7809
- Remove dead `auto-schedule` label in CMake by @steven-johnson in https://github.com/halide/Halide/pull/7818
- Don't return an undefined Stmt() from IfThenElse visitor by @abadams in https://github.com/halide/Halide/pull/7816
- Avoid generating name collisions in CSE by @abadams in https://github.com/halide/Halide/pull/7821
- Add a check that PredicateLoads must be used in the outermost split of a dimension by @TH3CHARLie in https://github.com/halide/Halide/pull/7788
- Enable emission of float16/32 casts on x86 by @abadams in https://github.com/halide/Halide/pull/7837
- Iterate over lets in the correct order in VectorizeLoops by @vksnk in https://github.com/halide/Halide/pull/7830
- Zen4 support by @abadams in https://github.com/halide/Halide/pull/7840
- Update arguments in driver.cpp to match what correctness/simd_op_check has by @vksnk in https://github.com/halide/Halide/pull/7842
- [tutorials] Add tutorial on JIT compile/execute performance by @derek-gerstmann in https://github.com/halide/Halide/pull/7838
- [api] Promote Internal::Parameter to Halide::Parameter by @derek-gerstmann in https://github.com/halide/Halide/pull/7829
- [Hexagon] - Fix 8-bit unsigned saturating downcasts for HVX (Fixes #7806) by @pranavb-ca in https://github.com/halide/Halide/pull/7825
- Handle nested vectorization in store predicates by @abadams in https://github.com/halide/Halide/pull/7864
- Respect input buffer constraints in root-level bounds inference exprs by @abadams in https://github.com/halide/Halide/pull/7865
- Prevent use of uninitialized scalar Parameters in JIT code (#7847, partial) by @steven-johnson in https://github.com/halide/Halide/pull/7853
- Handle unreachable code in bounds inference by @abadams in https://github.com/halide/Halide/pull/7866
- [serialization] Add support to serialize to memory, and a basic serialization tutorial by @derek-gerstmann in https://github.com/halide/Halide/pull/7760
- Don't deduce unreachability from predicated out of bounds stores by @abadams in https://github.com/halide/Halide/pull/7874
- Validate for types when fusing Vars with RVars by @abadams in https://github.com/halide/Halide/pull/7877
- Consider all dimensions before deciding to slide over a new dimension by @abadams in https://github.com/halide/Halide/pull/7875
- Update onnx app to work with newer versions of protobuf by @abadams in https://github.com/halide/Halide/pull/7879
- HTML Stmt IR with conceptual code and device code. by @mcourteaux in https://github.com/halide/Halide/pull/7843
- Update README.md to include RISCV in llvm build instructions by @abadams in https://github.com/halide/Halide/pull/7878
- Implement elementwise complex value division by @antonysigma in https://github.com/halide/Halide/pull/7848
- Explicitly name the allocgroups on GPU schedules "allocgroup__..." by @mcourteaux in https://github.com/halide/Halide/pull/7883
- Generate simpler LLVM IR for shuffles that recursively become broadcasts by @abadams in https://github.com/halide/Halide/pull/7902
- Check for overflow in Type constructor by @abadams in https://github.com/halide/Halide/pull/7889
- Mutating if branches in isolation can break reachability analysis by @abadams in https://github.com/halide/Halide/pull/7895
- Disable warning for mismatched new/delete by @abadams in https://github.com/halide/Halide/pull/7897
- Assignment is not associative by @abadams in https://github.com/halide/Halide/pull/7894
- Don't lift loop vars outside of their loops in sliding window by @abadams in https://github.com/halide/Halide/pull/7896
- Stop interleaver from expanding the scope of letstmts by @abadams in https://github.com/halide/Halide/pull/7908
- Highlight groups for the HTML Stmt file and tooltips to reveal types. by @mcourteaux in https://github.com/halide/Halide/pull/7887
- Static analysis (MSVC) fixes for device_buffer_utils.h by @slomp in https://github.com/halide/Halide/pull/7904
- Check returned result in the test by @vksnk in https://github.com/halide/Halide/pull/7911
- Fix read-after-write hazard analysis in storage folding by @abadams in https://github.com/halide/Halide/pull/7910
- Turn off SLP vectorization for avx512 only by @abadams in https://github.com/halide/Halide/pull/7918
- Scheduling directive to hoist the storage of the function by @vksnk in https://github.com/halide/Halide/pull/7915
- Improve the error message if you store_at without a compute_at by @vksnk in https://github.com/halide/Halide/pull/7923
- Loop Partitioning Policy through Stage::partition(VarOrRVar, LoopPartitionPolicy) by @mcourteaux in https://github.com/halide/Halide/pull/7914
- Remove use of dynamic_cast. by @zvookin in https://github.com/halide/Halide/pull/7931
- Add special build for testing serialization via a serialization roundtrip in JIT compilation and fix serialization leaks by @TH3CHARLie in https://github.com/halide/Halide/pull/7763
- Add missing serialization of Dim::partition_policy by @TH3CHARLie in https://github.com/halide/Halide/pull/7935
- Make sure all Halide arithmetic scalar types can be named from the Generator interface. by @zvookin in https://github.com/halide/Halide/pull/7934
- Remove the deprecated API `llvm::Type::getInt8PtrTy` usage. by @hokein in https://github.com/halide/Halide/pull/7937
- More targeted fix for gather instructions being slow on intel processors by @abadams in https://github.com/halide/Halide/pull/7945
- Track likely values through lets in loop partitioning by @abadams in https://github.com/halide/Halide/pull/7930
- Add missing condition to if renesting rule by @abadams in https://github.com/halide/Halide/pull/7952
- Always call lower_round_to_nearest_ties_to_even on arm32 by @vksnk in https://github.com/halide/Halide/pull/7957
- Improve code size and compile time for local laplacian app by @abadams in https://github.com/halide/Halide/pull/7927
- [serialization] Serialize stub definitions of external parameters. by @derek-gerstmann in https://github.com/halide/Halide/pull/7926
- [WebGPU] Update to latest native headers by @jrprice in https://github.com/halide/Halide/pull/7932
- Return values from stub functions in Deserialization by @steven-johnson in https://github.com/halide/Halide/pull/7963
- Make the fast inverse test throughput-limited rather than latency-limited by @abadams in https://github.com/halide/Halide/pull/7958
- Attempt to fix nested vectorization gemm performance on new build bot by @abadams in https://github.com/halide/Halide/pull/7959
- Update instructions to include generated schedules by @antonysigma in https://github.com/halide/Halide/pull/7928
- [serialization] Add Halide version and serialization version in serialization format by @TH3CHARLie in https://github.com/halide/Halide/pull/7905
- Handle many more intrinsics in Bounds.cpp by @steven-johnson in https://github.com/halide/Halide/pull/7823
- Disallow async nestings that violate read after write dependencies by @abadams in https://github.com/halide/Halide/pull/7868
- complete_x86_target() should enable F16C and FMA when AVX2 is present by @steven-johnson in https://github.com/halide/Halide/pull/7971
- Add two new tail strategies for update definitions by @abadams in https://github.com/halide/Halide/pull/7949
- Add appropriate mattrs for arm-32 extensions by @abadams in https://github.com/halide/Halide/pull/7978
- Move canonical version numbers into source, not build system (#7980) by @steven-johnson in https://github.com/halide/Halide/pull/7981
- Silence useless "Insufficient parallelism" autoscheduler warning by @steven-johnson in https://github.com/halide/Halide/pull/7990
- Add a notebook with a visualization of the approx_* functions and their errors by @vksnk in https://github.com/halide/Halide/pull/7974
- Make narrowing float->int casts on wasm go via wider ints by @abadams in https://github.com/halide/Halide/pull/7973
- Fix handling of assert statements whose conditions get vectorized by @abadams in https://github.com/halide/Halide/pull/7989
- Fix all "unscheduled update()" warnings in our code by @steven-johnson in https://github.com/halide/Halide/pull/7991
- Silence useless 'Outer dim vectorization of var' warning in Mullapudi… by @steven-johnson in https://github.com/halide/Halide/pull/7992
- Make wasm +sign-ext and +nontrapping-fptoint the default by @steven-johnson in https://github.com/halide/Halide/pull/7995
- Teach unrolling to exploit conditions in enclosing ifs by @abadams in https://github.com/halide/Halide/pull/7969
- Do some basic validation of Target Features (#7986) by @steven-johnson in https://github.com/halide/Halide/pull/7987
- Inject profiling for function calls to 'halide_copy_to_host' and 'halide_copy_to_device'. by @mcourteaux in https://github.com/halide/Halide/pull/7913
- @antonysigma made their first contribution in https://github.com/halide/Halide/pull/7754
- @hokein made their first contribution in https://github.com/halide/Halide/pull/7937
Full Changelog: https://github.com/halide/Halide/compare/v16.0.0...v17.0.0