Releases: ROCm/MISA
MISA v2022.03.02
This release is mainly for navi21 NCHWc kernels.
- support navi21 NCHWc kernel
- support fp16x8, fp16x4, int8x16, int8x8, int8x4, int4x32, int4x16, int4x8
- filter layout support both KCYXc and CYXKc
- support tile based conv in NCHWc kernel
- change name from iGEMMgen to MISA
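As a rough illustration of the NCHWc layout, the sketch below maps a logical (n, c, h, w) index to a flat element offset. It assumes NCHWc means the usual vectorized-channel layout [N, C/c, H, W, c], where the trailing c is the vector length (for example 8, which is presumably what fp16x8 indicates). The function name and the vec parameter are hypothetical and are not taken from the MISA sources.

```python
def nchwc_offset(n, c, h, w, C, H, W, vec):
    """Flat element offset of logical index (n, c, h, w) in an assumed
    [N, C/vec, H, W, vec] layout. Hypothetical helper for illustration."""
    assert C % vec == 0, "channel count must be a multiple of the vector length"
    # Split the logical channel into a channel block and a lane inside the vector.
    c_blk, c_lane = divmod(c, vec)
    return (((n * (C // vec) + c_blk) * H + h) * W + w) * vec + c_lane


# Neighbouring channels inside one vector are adjacent in memory,
# while the next pixel along W is one full vector away.
print(nchwc_offset(0, 1, 0, 0, C=16, H=2, W=2, vec=8))  # 1
print(nchwc_offset(0, 0, 0, 1, C=16, H=2, W=2, vec=8))  # 8
```

Under the same assumption, the filter layouts KCYXc and CYXKc would split the input channel dimension in the same way, differing only in where the K dimension sits.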
generator v2021.10.17
- support gfx90a NHWC fp16 alt implementation
- support gfx90a NHWC bf16
generator v2021.07.22
- support NHWC fp32/fp16 fwd/bwd/wrw xdlops
- support gfx90a NHWC fp32/fp16 fwd/bwd/wrw
- reorganize python code structure
generator v2021.05.25
This generator is mainly for a feature named "global memory access pattern", or gmap for short.
As the name suggests, gmap dumps the memory access pattern of the input/weight/output tensors, organized per individual block and per individual read/write request.
This feature is controlled by the environment variable IGEMM_DUMP_GMAP. Example usage:
python3 igemm_codegen.py config/igemm_bwd_gtc_gfx908_nhwc_fp16.config
cd out/
IGEMM_DUMP_GMAP=1 ./conv_driver.exe convfp16 -n 2 -c 1024 -H 40 -W 52 -k 512 -y 1 -x 1 -p 0 -q 0 -u 2 -v 2 -l 1 -j 1 -g 1 -t 1 -F 2 --in_layout NHWC --fil_layout NHWC --out_layout NHWC
Currently only NHWC fwd/bwd with fp32/fp16 is supported. Support for more layouts and precisions is to be added.
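When reading a dumped access pattern, it helps to know the tensor shapes involved. The sketch below computes the output spatial size for the problem in the example command. It assumes the driver flags follow the common MIOpen-driver convention (-H/-W input size, -y/-x filter size, -p/-q padding, -u/-v stride, -l/-j dilation), which the release notes do not state explicitly. The helper name is hypothetical.

```python
def conv_out_size(in_size, pad, dilation, filt, stride):
    """Standard convolution output size along one spatial dimension."""
    return (in_size + 2 * pad - dilation * (filt - 1) - 1) // stride + 1


# Values copied from the conv_driver.exe command line above
# (the flag meanings are an assumption, see the note before this block).
H, W = 40, 52   # -H, -W
y, x = 1, 1     # -y, -x
p, q = 0, 0     # -p, -q
u, v = 2, 2     # -u, -v
l, j = 1, 1     # -l, -j

Ho = conv_out_size(H, p, l, y, u)
Wo = conv_out_size(W, q, j, x, v)
print(Ho, Wo)  # 20 26
```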
generator v2021.05.18
New Features
- support NCHW fp32 fwd/bwd/wrw xdlops
- support NHWC fp32 fwd/bwd xdlops
- support NHWC fp16 fwd/bwd xdlops
generator v0.5.0
New Features
- support fp32 NCHW on gfx908, fwd/bwd/wrw directions.
- support group conv in all three directions.
- support generating a single kernel based on the input config.
- support auto gen based on sequence mode in config file.
- support gpu reference kernel as verification backend, which is much faster than cpu verification.
- fwd/bwd use magic number for integer division
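The magic-number idea replaces a division by a divisor that stays fixed for the whole kernel with a multiply, an add and a shift. The host precomputes a (magic, shift) pair once and the kernel reuses it for every index computation. The following is a minimal sketch of one common unsigned 32-bit scheme. It is not necessarily the exact variant used by the generated kernels, and the function names are hypothetical.

```python
def magic_div_gen(d):
    """Precompute (magic, shift) for an unsigned divisor 1 <= d < 2**31."""
    assert 1 <= d < 2**31
    shift = (d - 1).bit_length()  # smallest shift with 2**shift >= d
    magic = ((1 << 32) * ((1 << shift) - d)) // d + 1
    assert magic < 2**32          # fits in a 32-bit register
    return magic, shift


def magic_div(n, magic, shift):
    """Compute n // d for 0 <= n < 2**32 without a division."""
    tmp = (n * magic) >> 32       # high 32 bits of the 64-bit product
    # Python integers never overflow; a 32-bit implementation has to
    # account for the carry of this addition or restrict the range of n.
    return (tmp + n) >> shift


magic, shift = magic_div_gen(3)
print(magic_div(7, magic, shift))  # 2
```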