SwiGLU further optimization in MLP bw #502

Merged on Nov 10, 2022 (9 commits)

danthe3rd (Contributor) commented Nov 2, 2022

Stack from ghstack (oldest at bottom):

PERFORMANCE A100

operandfused_all <- THIS PR
SwiGLUPackedFusedOp <- previous pr
[--------------------------------------- swiglu_bw ---------------------------------------]
                                     |  operandfused_all  |  eager   |  SwiGLUPackedFusedOp
1 threads: --------------------------------------------------------------------------------
      b16    B=9456, I=1536, H=4096  |       2227.6       |  2708.3  |         2341.6      
      f16    B=9456, I=1536, H=4096  |       2337.5       |  2705.8  |         2339.1      
      f16.ac B=9456, I=1536, H=4096  |       2630.5       |  2998.5  |         2806.6      
      b16    B=4440, I=1536, H=4096  |       1177.9       |  1424.5  |         1246.4      
      f16    B=4440, I=1536, H=4096  |       1205.1       |  1418.8  |         1240.6      
      f16.ac B=4440, I=1536, H=4096  |       1409.0       |  1637.4  |         1541.7      
      b16    B=4728, I=1536, H=4096  |       1238.6       |  1493.5  |         1397.5      
      f16    B=4728, I=1536, H=4096  |       1274.8       |  1488.2  |         1392.7      
      f16.ac B=4728, I=1536, H=4096  |       1478.2       |  1710.3  |         1512.9      
      b16    B=4728, I=1536, H=1024  |        461.0       |   518.7  |          487.7      
      f16    B=4728, I=1536, H=1024  |        438.2       |   498.3  |          479.8      
      f16.ac B=4728, I=1536, H=1024  |        560.9       |   623.2  |          601.4      

Times are in microseconds (us).
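To make the table easier to read, the relative gains can be worked out from the timings. The following is a minimal sketch, not part of the PR: the numbers are copied from the table above, while the helper names `speedups` and `geomean` are hypothetical and exist only for this illustration. A ratio above 1.0 means `operandfused_all` is faster than the baseline it is compared against.

```python
import math

# Timings copied from the swiglu_bw table above, in microseconds.
# Each row: (dtype label, B, I, H, operandfused_all, eager, SwiGLUPackedFusedOp)
ROWS = [
    ("b16",    9456, 1536, 4096, 2227.6, 2708.3, 2341.6),
    ("f16",    9456, 1536, 4096, 2337.5, 2705.8, 2339.1),
    ("f16.ac", 9456, 1536, 4096, 2630.5, 2998.5, 2806.6),
    ("b16",    4440, 1536, 4096, 1177.9, 1424.5, 1246.4),
    ("f16",    4440, 1536, 4096, 1205.1, 1418.8, 1240.6),
    ("f16.ac", 4440, 1536, 4096, 1409.0, 1637.4, 1541.7),
    ("b16",    4728, 1536, 4096, 1238.6, 1493.5, 1397.5),
    ("f16",    4728, 1536, 4096, 1274.8, 1488.2, 1392.7),
    ("f16.ac", 4728, 1536, 4096, 1478.2, 1710.3, 1512.9),
    ("b16",    4728, 1536, 1024,  461.0,  518.7,  487.7),
    ("f16",    4728, 1536, 1024,  438.2,  498.3,  479.8),
    ("f16.ac", 4728, 1536, 1024,  560.9,  623.2,  601.4),
]


def speedups(rows):
    """Return per-row speedups of operandfused_all over both baselines."""
    result = []
    for dtype, b, i, h, fused_all, eager, packed in rows:
        result.append({
            "config": f"{dtype} B={b} I={i} H={h}",
            "vs_eager": eager / fused_all,    # baseline time / this PR's time
            "vs_packed": packed / fused_all,  # previous PR's time / this PR's time
        })
    return result


def geomean(values):
    """Geometric mean, the usual way to average ratios."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


if __name__ == "__main__":
    rows = speedups(ROWS)
    for r in rows:
        print(f"{r['config']:<28} {r['vs_eager']:.3f}x vs eager, "
              f"{r['vs_packed']:.3f}x vs SwiGLUPackedFusedOp")
    print(f"geomean vs eager: {geomean(r['vs_eager'] for r in rows):.3f}x")
    print(f"geomean vs SwiGLUPackedFusedOp: "
          f"{geomean(r['vs_packed'] for r in rows):.3f}x")
```

In every row both ratios are above 1.0, although the gain over `SwiGLUPackedFusedOp` is close to zero for `f16` at B=9456 (2337.5 us vs 2339.1 us).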

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Nov 2, 2022
@danthe3rd danthe3rd mentioned this pull request Nov 2, 2022
danthe3rd pushed a commit that referenced this pull request Nov 2, 2022
ghstack-source-id: 12b56c35da4e6fb208bdc51ee146601dc5c35517
Pull Request resolved: #502
@danthe3rd danthe3rd requested a review from fmassa November 2, 2022 15:20
danthe3rd added 3 commits November 3, 2022 08:21
@danthe3rd danthe3rd mentioned this pull request Nov 4, 2022
danthe3rd added 2 commits November 4, 2022 10:04
codecov-commenter commented Nov 4, 2022

Codecov Report

Base: 88.37% // Head: 88.37% // No change to project coverage 👍

Coverage data is based on head (4a444b0) compared to base (4a444b0).
Patch has no changes to coverable lines.

❗ Current head 4a444b0 differs from pull request most recent head 1227f05. Consider uploading reports for the commit 1227f05 to get more accurate results

Additional details and impacted files
@@                  Coverage Diff                  @@
##           gh/danthe3rd/57/base     #502   +/-   ##
=====================================================
  Coverage                 88.37%   88.37%           
=====================================================
  Files                        80       80           
  Lines                      4798     4798           
=====================================================
  Hits                       4240     4240           
  Misses                      558      558           
Flag Coverage Δ
Python 88.37% <0.00%> (ø)


fmassa (Contributor) left a comment

Thanks!

Review thread on .gitmodules (outdated, resolved)
@danthe3rd danthe3rd merged commit 1227f05 into gh/danthe3rd/57/base Nov 10, 2022
danthe3rd pushed a commit that referenced this pull request Nov 10, 2022
ghstack-source-id: b1bead3ade9c469dc859d3576b8834488845e57c
Pull Request resolved: #502
@danthe3rd danthe3rd deleted the gh/danthe3rd/57/head branch November 10, 2022 18:11
bertmaher pushed a commit to bertmaher/xformers that referenced this pull request Dec 20, 2024