v0.14.0
Summary
This release marks the debut of our CubeCL integration, which brings cross-platform GPU programming capabilities directly to Rust.
With CubeCL now supporting both CUDA and WebGPU, Burn benefits from a new CUDA backend that can be enabled using the cuda-jit feature.
Please note that this backend is still considered experimental, and some operations, particularly those related to vision, may experience issues.
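For anyone who wants to experiment with it, here is a minimal sketch of what enabling the backend might look like. The `cuda-jit` feature name comes from this release; the `CudaJit` backend and `CudaDevice` type paths are assumptions, so verify the exact names against the `burn::backend` re-exports (or the text classification example referenced below).

```rust
// Cargo.toml (assumed): burn = { version = "0.14.0", features = ["cuda-jit"] }
use burn::tensor::Tensor;

// Assumed re-exports gated behind the `cuda-jit` feature; check the
// `burn::backend` module of this release for the exact names.
use burn::backend::{CudaJit, CudaDevice};

fn main() {
    let device = CudaDevice::default();
    // Run a small computation on the experimental CUDA backend.
    let x = Tensor::<CudaJit, 2>::from_floats([[1.0, 2.0], [3.0, 4.0]], &device);
    let y = x.clone().matmul(x.transpose());
    println!("{y}");
}
```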
Additionally, this release features significant enhancements to ONNX support, including bug fixes, new operators, and improvements in code generation.
As always, it also includes numerous bug fixes, performance enhancements, new tensor operations, and improved documentation.
Burn 0.14.0 also introduces a new tensor data format that significantly improves serialization and deserialization speeds, along with Quantization, a new beta feature included in this release. The new format is not compatible with previous versions of Burn, but you can migrate your previously saved records by following this guide.
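As a rough illustration of what that migration involves, the sketch below loads a record saved with an older version and re-saves it in the new format. It assumes a module type `M` of your own, the `NamedMpkFileRecorder`, and the `record-backward-compat` feature (mentioned in the bug fixes below) enabled while loading; the guide remains the authoritative reference.

```rust
use burn::module::Module;
use burn::record::{FullPrecisionSettings, NamedMpkFileRecorder};
use burn::tensor::backend::Backend;

/// Hedged sketch: load a record produced by an earlier Burn version (assumes
/// the `record-backward-compat` feature is enabled while loading) and re-save
/// it in the new tensor data format. `model` is an instance of your own module.
fn migrate_record<B: Backend, M: Module<B>>(model: M, device: &B::Device) {
    let recorder = NamedMpkFileRecorder::<FullPrecisionSettings>::new();
    let model = model
        .load_file("old_checkpoint", &recorder, device)
        .expect("legacy record should load with backward compatibility enabled");
    model
        .save_file("migrated_checkpoint", &recorder)
        .expect("record should save in the new format");
}
```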
Module & Tensor
- (@laggui) Add RoPE init_with_frequency_scaling (#2194)
- (@laggui) Add 0-dim tensor checks for creation ops and validate TensorData shape w/ num values (#2137)
- (@wingertge): Add Hard sigmoid activation function (#2112)
- (@antimora): Add is_nan and contains_nan tensor ops (#2088)
- (@laggui) Convert compatible prelu weights to rank 1 (#2054)
- (@laggui) Refactor tensor quantization for q_* ops (#2025)
- (@RuelYasa): Adding burn::nn::Sigmoid (#2031)
- (@laggui) Module weight quantization (#2000)
- (@louisfd): Cube: Matmul tiling (#1994)
- (@antimora): Enhance slice operation to support more range variation (#1989)
- (@laggui) Add static tensor quantization (#1963)
- (@johnhuichen): Enable negative starts and ends for slice op (#1981)
- (@booti386): Implement 3D and transposed 3D convolutions. (#1945)
- (@antimora): Print module - implement module display for remaining modules (part2) (#1933)
- (@antimora): Print model structure like with PyTorch - Part 1 (#1912)
- (@DieracDelta): Tanh nn wrapper (#1903)
- (@laggui) Implement `Element` for `bool` (#1878)
- (@LilDojd) Feat: Add `movedim` tensor operator (#1876)
- (@ArthurBrussee): Make autodiff compile on wasm (#1889)
- (@ArthurBrussee): Make Param.id public (#1859)
- (@kantic) Remainder operator (#1726)
- (@McArthur-Alford) Indices Operator (#1735)
- (@laggui) Add seq start position when applying RoPE encoding (#1796)
- (@JachymPutta): Adding max import (#1769)
- (@agelas): Feat/squeeze dims (#1779)
- (@wcshds) Implement bidirectional LSTM (#1035)
- (@agelas): Feat/remainder (#1597)
Bug Fixes
- (@laggui) Fix root-mean-square precision issue (#2193)
- (@laggui) Fix indices dim check in gather_update_outputs (#2149)
- (@antimora): Fix #2091 bug (in-place after expand) (#2114)
- (@laggui) Fix aggregation results slice (#2110)
- (@nathanielsimard): Fix: fusion auto bound checks (#2087)
- (@laggui) Extend [min, max] range to ensure zero-point (#2055)
- (@agelas): Bug/Remove Squeeze Panic for Multiple Dimensions (#2035)
- (@nathanielsimard): Fix wgsl remainder definition (#1979)
- (@laggui) Fix output tensor dtype (#1938)
- (@femshima): feat: Make RetroForward public (#1905)
- (@laggui) Fix conv2d_weight_grad_groups (#1891)
- (@nathanielsimard): Fix select assign backward (#1739)
- (@louisfd): Fix repeat for dims > 1 (#1713)
- (@nathanielsimard): Fix lstm batch size bug (#1695)
- (@antimora): Reshape bug fix (#1684)
- (@antimora) Fix bug: Filling tensor containing f32::NEG_INFINITY will result in NaN for burn-ndarray (#2095)
ONNX Support
- (@hexd0t): Allow ONNX scalar greater/less with scalar (#2146)
- (@hexd0t): Implement ONNX Gather for scalar indices (#2141)
- (@mepatrick73): feat: adding shape support for gather ONNX operation (#2128)
- (@mepatrick73): ONNX Tile operation (#2092)
- (@cBournhonesque): Add onnx mean (#2119)
- (@mepatrick73): Repeat operation (#2090)
- (@antimora): Add 1d and 2d modules for interpolate with scaling (also fix ONNX Resize op) (#2081)
- (@johnhuichen): Implement ONNX Pad Operator (#2007)
- (@hexd0t, @antimora): Implement ONNX ConstantOfShape (#1815)
- (@johnhuichen): Add subtract tensor from scalar for ONNX sub op (#1964)
- (@Dirleye): Add ReduceProd ONNX Import (#1955)
- (@JachymPutta) feat: added reduce min onnx import (#1894)
- (@mosure): feat: resize onnx import (#1863)
- (@JachymPutta) feat: added slice onnx import (#1856)
- (@skewballfox): Optimize argument handling and improve ONNX graph building (#1857)
- (@JachymPutta) feat: add sum onnx import (#1846)
- (@agelas): Feat/gather import (#1843)
- (@JachymPutta): feat: expand onnx import (#1813)
- (@JachymPutta): feat: added range onnx import (#1834)
- (@will-maclean): Feature/onnx argmax (#1814)
- (@hexd0t): Feat: Implement ONNX RandomUniform + RandomNormal in burn-import (#1806)
- (@JachymPutta): feat: Greater + GreaterOrEqual onnx import (#1801)
- (@JachymPutta): feat: Less + LessOrEqual onnx import (#1800)
- (@JachymPutta): feat: added min onnx import (#1778)
- (@agelas): Squeeze Onnx Import (#1753)
- (@Arjun31415): Added ONNX AvgPool1d (#1744)
- (@Arjun31415): Add MaxPool1d ONNX Op (#1725)
- (@AntBlo) Add reduce sum onnx ops to burn imports (#1723)
- (@Arjun31415): PReLu ONNX import (#1721)
- (@antimora): Update SUPPORTED-ONNX-OPS.md (#1717)
- (@antimora): ONNX debug improvements (#1712)
- (@antimora): Skip updating shape for linear if not present (#1700)
- (@laggui) Remove leaky relu ONNX file (#1697)
- (@antimora): ONNX support for scalar unsqueeze (#1690)
- (@laggui) Add layer norm onnx op support (#1680)
- (@antimora): Fix reshape bug (support for opset version 1) (#1667)
- (@wufniks) Add sign ONNX op import support (#1663)
- (@laggui) Add where onnx op support (#1653)
- (@laggui) Add matmul ONNX op support (#1638)
- (@laggui) Add reduce max ONNX op support (#1636)
- (@laggui) Add shape ONNX op support (#1639)
- (@laggui) [ONNX] Add not op and extend cast support to tensors (#1634)
- (@laggui) Add reduce mean ONNX op support (#1637)
- (@antimora): Update SUPPORTED-ONNX-OPS.md (#1641)
- (@laggui) Add sin onnx op support (#1633)
Bug Fixes
- (@mepatrick73) Tensor type indent fix (#2196)
- (@mepatrick73) pad-input-fix: adding support for pads as attributes (#2195)
- (@hexd0t) Fix ONNX Gather codegen for Shape input (#2148)
- (@mepatrick73): bug fix: adding bounds checking to pad ONNX inputs (#2120)
- (@laggui) Fix checks_channels_div_groups condition and ONNX conv import with groups (#2051)
- (@nathanielsimard): Support linear 1d (#1682)
- (@laggui) Fix ONNX and PyTorch import section links in burn book (#1681)
- (@antimora): Fix bug 1645 (Unsqueeze OpSet 11) (#1661)
- (@laggui) Fix transpose onnx op (permute) (#1657)
Enhancements
- (@laggui) Add scientific notation formatting for small metric values (#2136)
- (@ArthurBrussee): Always derive Cube features from adapter (#1958)
- (@mepatrick73, @nathanielsimard): Dynamic memory management preset + updated wgpu buffer memory management (#1962)
- (@mepatrick73): Feat/fixed chunk alloc by class (#1960)
- (@ArthurBrussee): Consistent sync/async handling, allow more functions to be async for wasm. (#1936)
- (@varonroy): Replaced `str` with `Path` (#1919)
- (@louisfd, @nathanielsimard): New autodiff graph memory management strategy (#1698)
- (@syl20bnr): Move HandleContainer and Tensor Ops descriptions from burn-fusion to burn-tensor (#1654)
- (@NicoZweifel) WindowDataset/windows function (#1553)
- (@antimora): Improve pickle (CandleTensor) conversions to NestedValue (#1944)
Refactoring
- (@mepatrick73) Scatter kernel from cpa to cubecl (#2169)
- (@nathanielsimard): Refactor binary op (#2085)
- (@omahs): Fix typos (#2098)
- (@nathanielsimard): Refactor/jit/unary (#1965)
- (@skewballfox): Separating ONNX parsing from burn-import (#1921)
- (@laggui) Refactor tensor data (#1916)
- (@ArthurBrussee): Remove GraphicsAPI generic for WgpuRuntime (#1888)
- (@skewballfox): add dependency management for python (#1887)
- (@louisfd): refactor reduce into separate traits (#1798)
- (@nathanielsimard): Refactor/jit fusion (#1750)
- (@nathanielsimard): Refactor/burn compute (#1580)
Documentation & Examples
- (@nathanielsimard) Enable cuda-jit in burn-core + in text classification example (#2160)
- (@cBournhonesque): Add comments for matmul kernel (#2138)
- (@laggui) Fix inner backend typo in book guide (#2135)
- (@antimora): Improve ONNX import book section (#2059)
- (@antimora): Update slice documentation (#2024)
- (@syl20bnr): Remove mention of example in backend section of the book (#2014)
- (@laggui) Fix image-classification-web + autotune flag usage (#2011)
- (@nathanielsimard): Cube/doc/readme (#1904)
- (@laggui, @syl20bnr) Add models and examples reference (#1966)
- (@antimora): Print module part3 - Update book (#1940)
- (@towerpark): Book: Fix the link to burn-train in "Learner" page (#1920)
- (@nathanielsimard): Doc: Improve module to_device/fork docs (#1901)
- (@jwric, @ThierryCantin-Demers, @mepatrick73): Add documentation to burn core nn (#1746)
- (@towerpark): Book: Fix typos in the name of MessagePack format (#1868)
- (@Zirconium409122, @kantic): Remainder operator doc (#1836)
- (@nathanielsimard): Fix wasm examples (#1824)
- (@eltociear) docs: update README.md (#1810)
- (@agelas): Contributor Book: Onnx to Burn Conversion (#1771)
- (@benbaarber): update ARCHITECTURE.md links to project architecture section in contributor book (#1759)
- (@jwric): Add hidden code snippets to guide example in Burn book [redo] (#1742)
- (@mepatrick73): Fixing various syntax errors in the Burn book (#1740)
- (@ThierryCantin-Demers) Add indentation to project architecture in contributing book (#1738)
- (@AntBlo) Add info about enabling debugging for new contributors (#1719)
- (@syl20bnr): [guide] Remove ambiguity lib vs. executable (#1649)
- (@wangxiaochuTHU): Update README.md (#1696)
- (@syl20bnr): [burn-book] Fix broken URL to SUPPORTED-ONNX-OPS.md (#1651)
- (@syl20bnr): [burn-book] Fix typos in getting started (#1650)
- (@louisfd): Many superficial fixes to the contributor book (#1644)
- (@laggui) Fix guide project name in the book (#1631)
- (@Gadersd): Improve grammar (#1619)
- (@agelas): Docs/update contributor book (#1622)
CubeCL
- (@laggui) Remove CubeCL GELU kernel example reference (moved to CubeCL repo) (#2150)
- (@cBournhonesque) Convert `reduce_dim_naive` kernel to use the `#[cube]` derive macro (#2117)
- (@syl20bnr): Rename revision key to rev for cubecl dependencies in Cargo.toml (#2086)
- (@syl20bnr): Fix cubecl version in Cargo.toml to correctly fetch the version tag
- (@louisfd): Refactor/jit cube/mask (#2075)
- (@nathanielsimard): Chore/update/cubecl (#2067)
- (@ArthurBrussee): Feat: Dynamic cube count dispatch (#1975)
- (@nathanielsimard): Refactor cube launch + support inplace operation (#1961)
- (@nathanielsimard): Feat/cube/cooperative matrix-multiply and accumulate. (#1943)
- (@nathanielsimard): Refactor/cube/mutability (#1934)
- (@nathanielsimard): Handle visibility in cube (#1929)
- (@nathanielsimard): Feat/cube/array assign ops (#1914)
- (@nathanielsimard): Feat/comptime expr (#1910)
- (@nathanielsimard): Feat/cube/compile error (#1909)
- (@nathanielsimard): feat cube support Array (#1907)
- (@louisfd): Cube: variable reusability + refactor in cube macros (#1885)
- (@nathanielsimard): Refactor the tuner to be used standalone (#1884)
- (@ArthurBrussee): Add option to flush queue instead of waiting for completion. (#1864)
- (@louisfd): Cube: Vectorization + simple matmul implementation (#1866)
- (@ArthurBrussee): Get resources from server (#1861)
- (@ArthurBrussee): Speedup client.create for small allocations. (#1858)
- (@ArthurBrussee): Add a feature to initialize from an existing wgpu adapter/device/queue (#1788)
- (@laggui) Fix cmma test (#1957)
- (@nathanielsimard): Perf/dynamic mm (#1906)
- (@mepatrick73): Feat/dynamic small pool (#1931)
- (@mepatrick73): Perf/dynamic mm slice addressing (#1917)
- (@mepatrick73): Feat/dynamic mm basic implementation + small refactor (#1844)
- (@louisfd): Cube: CubeType (no launch) and Comptime::map (#1853)
- (@louisfd, @nathanielsimard): Feat/cube/struct support (#1842)
- (@nathanielsimard): [Refactor - Breaking] Refactor cube operations with better names & Support subgroup operations (#1839)
- (@louisfd, @nathanielsimard): Cube: Topology constants (#1838)
- (@louisfd): Cube: cleaner use of topology values (#1835)
- (@louisfd): Cube: support for shared memory (#1831)
- (@louisfd): Cube: support method call + prettier tensor metadata (#1829)
- (@nathanielsimard): Add vectorization support into cube (#1830)
- (@louisfd): Cube: support for return + conv2d early return (#1828)
- (@nathanielsimard): Feat/cube/launch (#1827)
- (@nathanielsimard): Update cuda-jit (#1799)
- (@louisfd): Feat/cube/remaining ops (#1807)
- (@louisfd): Cube: first ported kernel + comptime support + variable reuse + cleanup (#1797)
- (@louisfd): Refactor/cube/vectorization (#1781)
- (@louisfd, @nathanielsimard): Feat/enable cube cl (#1777)
- (@nathanielsimard, @louisfd): Feat/cubecl ir (#1776)
- (@louisfd): CubeCL first iteration (#1756)
- (@nathanielsimard): First draft CUDA runtime (#1685)
- (@nathanielsimard): Upgrade wgpu (#1692)
Miscellaneous
- (@BjornTheProgrammer) Make compatible with thumbv6m-none-eabi + add raspberry pi pico example (#2096)
- (@antimora): Precision option for tensor display (#2139)
- (@tiruka): remove lto linker option to make build successful (#2123)
- (@cBournhonesque): Add top-k accuracy (#2097)
- (@tiruka): Modify contributing md scripts to solve conflicts between doc and scripts (#2107)
- (@ragyabraham, @antimora): Add polars DataFrame support for Dataset (#2029)
- (@tiruka): modify broken link src of ide image (#2079)
- (@syl20bnr): Bump rust minimal version to 1.79
- (@Haislich): Added parameter trust_remote_code to hf dataset call. (#2013)
- (@laggui) Enable optimized handling of bytes (#2003)
- (@nathanielsimard): Feat: Support trait with CubeCL (#1980)
- (@syl20bnr): Set DEFAULT_MAX_TASKS to 1 when running tests
- (@loganbnielsen) remove manual option matching (#1948)
- (@jwhogg): Remove closed 'future improvements' (#1935)
- (@nathanielsimard): Fix: launch without generics (#1932)
- (@antimora): Update candle-core to a released version (#1913)
- (@ArthurBrussee): Do not use default burn-compute features unless enabled. (#1908)
- (@louisfd): clippy on rust update (#1886)
- (@Icekey): LearnerBuilder "with_checkpointing_strategy" should use builder pattern (#1841)
- (@nathanielsimard): Fix bench load record benchmarks (#1826)
- (@jwric): Add configurable application logger to learner builder (#1774)
- (@getumen) Add Clone trait to the `OptimizerAdaptor` and Clone implementations to the optimizers (#1770)
- (@benbaarber): Replace opaque return types in optim (#1767)
- (@ahmedyarub, @syl20bnr) #1747 Upgrade Rust dependencies (#1748)
- (@sebhtml): Refactor: replace trait TemplateKernel by existing trait JitKernel (#1737)
- (@louisfd): Autodiff Memory Management: BFS (#1710)
- (@nathanielsimard): [Fusion] Support multi-precision fusion (#1718)
- (@laggui) Refactor element type to be decoupled from runtime (#1693)
- (@AlexErrant) `Arc<EventStoreClient>` to `Rc<EventStoreClient>` (#1668)
- (@louisfd): remove JIT subsequent RNG tests (#1652)
- (@antimora): Enable native sign operation for Candle backend (#1647)
Bug Fixes
- (@laggui) Fix module derive with generics (#2127)
- (@tiruka): modified MNIST image link on Hugging Face (#2134)
- (@NoahSchiro) Fix broken links in contributor book (#2061)
- (@syl20bnr): Bump gix-tempfile to fix security audit on gix-fs (#2022)
- (@laggui) Fix warnings when using `record-backward-compat` (#1977)
- (@nathanielsimard): Fix: constant record loading (#1902)
- (@laggui) Fix `DataSerialize` conversion for elements of the same type (#1832)
- (@DieracDelta): Fix burn-jit compile error (#1803)
- (@laggui) Fix record nested value de/serialization (#1751)
- (@louisfd): fix prng bug during autotune (#1791)
- (@ThierryCantin-Demers, @jwric) Fix Cargo.toml repository links (#1749)
- (@AntBlo) Fix unstable tests when run concurrently (#1724)
- (@lancelet) Handle ndarray matmul broadcasting (#1679)
- (@laggui) Fix inverted epoch - iteration counts in valid progress (#1699)
- (@NicoZweifel) fix: `window` -> `pub window` in `dataset/mod.rs` (#1658)