[SME] Add scalable fp16->fp32 dense schedule #16981

lhutton1 · 2024-05-08T19:43:16Z

This commit extends the functionality of the SME dense and matmul schedules to support operations with fp16 inputs and an fp32 output, where `transpose_a=False` and `transpose_b=True`.
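As a rough illustration of the semantics described above (not the schedule itself), the following pure-Python sketch computes a dense operation whose inputs are rounded to fp16 and whose result is accumulated in fp32, with the second operand stored as N x K to mirror `transpose_b=True`. The function names `dense_fp16_to_fp32` and `_round` are hypothetical and do not come from TVM; the accumulation order is an assumption and may differ from what the SME intrinsics do.

```python
import struct


def _round(fmt, x):
    """Round a Python float to IEEE half ("e") or single ("f") precision."""
    return struct.unpack(fmt, struct.pack(fmt, x))[0]


def dense_fp16_to_fp32(a, b):
    """Reference dense: a is M x K, b is N x K (the transpose_b=True layout).

    Inputs are rounded to fp16; every partial sum is rounded to fp32.
    Hypothetical helper for illustration only, not TVM's implementation.
    """
    out = []
    for row in a:
        out_row = []
        for col in b:
            acc = 0.0
            for x, y in zip(row, col):
                prod = _round("e", x) * _round("e", y)
                acc = _round("f", acc + prod)  # keep the accumulator in fp32
            out_row.append(acc)
        out.append(out_row)
    return out


a = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]]   # M=2, K=3
b = [[1.0, 1.0, 1.0]] * 4                # N=4, K=3
result = dense_fp16_to_fp32(a, b)

# 300 * 300 = 90000 exceeds the largest finite fp16 value (65504)
# but fits comfortably in an fp32 accumulator.
big = dense_fp16_to_fp32([[300.0]], [[300.0]])
```

The second call hints at why a wider output type is useful: the product of two representable fp16 values can fall outside the fp16 range.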

For convenience, it also adds a utility called `get_vscale_factor`, which creates the correct multiplier for `vscale` given a data type, reflecting ideas from an early design of the [SVE](apache/tvm-rfcs#104) RFC.
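To make the idea of a per-dtype `vscale` multiplier concrete, here is a minimal sketch. It assumes the usual AArch64 convention that a scalable vector is `vscale` times 128 bits wide, so the multiplier is 128 divided by the element width. The names `vscale_multiplier` and `lanes` are hypothetical and the code is not the `get_vscale_factor` implementation from this PR, which operates on TVM expressions rather than plain integers.

```python
import re

# Assumption: the minimum scalable vector width is 128 bits (vscale == 1).
MIN_VECTOR_BITS = 128


def vscale_multiplier(dtype: str) -> int:
    """Return how many elements of `dtype` fit in one 128-bit granule.

    Hypothetical helper: parses the trailing bit width from names such as
    "float16", "float32" or "int8".
    """
    match = re.search(r"(\d+)$", dtype)
    if match is None:
        raise ValueError(f"cannot determine bit width of {dtype!r}")
    bits = int(match.group(1))
    if bits == 0 or MIN_VECTOR_BITS % bits != 0:
        raise ValueError(f"{bits}-bit elements do not evenly fill 128 bits")
    return MIN_VECTOR_BITS // bits


def lanes(dtype: str, vscale: int) -> int:
    """Number of elements in a full scalable vector for a given vscale."""
    return vscale_multiplier(dtype) * vscale


fp32_factor = vscale_multiplier("float32")
fp16_factor = vscale_multiplier("float16")
fp16_lanes_at_vscale_2 = lanes("float16", 2)
```

Under these assumptions an fp16 vector holds twice as many elements as an fp32 vector for the same `vscale`, which is why the multiplier depends on the data type.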

~~Note: this commit depends on #16921 so also contains the contents of #16921.~~

Change-Id: I8c00bc6baf2df6015fa41200a238781126c73589

lhutton1 · 2024-05-16T08:49:18Z

cc @ekalda @Anndrey24 @leandron

ekalda

Thanks @lhutton1, really cool stuff! I only have nits.

tests/python/relay/strategy/arm_cpu/test_dense.py

python/tvm/tir/op.py

python/tvm/topi/arm_cpu/dense_alter_op.py

python/tvm/tir/tensor_intrin/arm_cpu.py

lhutton1 · 2024-05-27T15:02:48Z

@tvm-bot rerun

ekalda

Thanks @lhutton1, LGTM!

ekalda · 2024-05-28T14:55:13Z

Thanks @lhutton1 this is merged now!

This commit extends the SME conv2d NHWC schedule to support convolutions with float16 inputs (data and kernel) and a float32 output using the tensor intrinsics added in #16981.

lhutton1 mentioned this pull request May 8, 2024

[Tracking Issue] Scalable Matrix Extension (SME) upstreaming #16734

Open

11 tasks

lhutton1 added 2 commits May 15, 2024 10:32

Fix failing asserts

1fe9bac

Change-Id: Ie7fb7a0a76119aa5c82e03ea0b2cc10de9f15f5e

lhutton1 force-pushed the sme-fp16-fp32-dense-schedule branch from d2a164c to 1fe9bac Compare May 15, 2024 12:11

lhutton1 added 2 commits May 15, 2024 12:36

Change ptrue predicate to use boolean values

7d10268

Change-Id: I0e9e45b285082b42676e53e74158e11d7e08608b

Fix topi_matmul test and avoid scalable expression warnings

7363127

Change-Id: I32273241ae7569b65e082759e4f2ca4355ac6933

lhutton1 marked this pull request as ready for review May 16, 2024 07:58

ekalda reviewed May 21, 2024

Address comments

bc02e47

Change-Id: I237b4c5cb5ca22e33529d98cbd75177b94904857

lhutton1 force-pushed the sme-fp16-fp32-dense-schedule branch from 0d2be71 to bc02e47 Compare May 22, 2024 16:37

ekalda approved these changes May 28, 2024

ekalda merged commit 430e02f into apache:main May 28, 2024
19 checks passed

lhutton1 added a commit to lhutton1/tvm that referenced this pull request May 29, 2024

[TOPI] Fix SME conv2d schedule import and intrin argument

b5860de

Fixes a merge conflict between apache#16981 and apache#17003. Change-Id: Ifcc983ef0b8c00250568a048fd682933adfdcde4

lhutton1 mentioned this pull request May 29, 2024

[TOPI] Fix SME conv2d schedule import and intrin argument #17040

Merged

tqchen pushed a commit that referenced this pull request May 29, 2024

[TOPI] Fix SME conv2d schedule import and intrin argument (#17040)

8bdd54b

Fixes a merge conflict between #16981 and #17003. Change-Id: Ifcc983ef0b8c00250568a048fd682933adfdcde4

Anndrey24 mentioned this pull request May 30, 2024

[SME][TOPI] Add conv2d NHWC SME fp16->fp32 schedule #17048

Merged

lhutton1 deleted the sme-fp16-fp32-dense-schedule branch June 6, 2024 10:41

ysh329 mentioned this pull request Jul 20, 2024

[Release] v0.17.0 Release Candidate Notes #17178

Closed
