Feat/layernorm #36
Conversation
… fix/refactoring_benchmarks
# Conflicts:
#	README.md
#	implementations/activation_func.py
#	implementations/attention_masked_original.py
#	implementations/layer_norm.py
#	implementations/linear_layer.py
#	optimizer/dynamo_backend.py
#	optimizer/layer_norm.py
#	optimizer/linear.py
#	test/models/bert.py
#	test/test_attention.py
#	test/test_batched_matmul.py
#	test/test_layer_norm.py
#	test/test_linear_layer.py
#	test/test_torchdynamo_bert.py
# Conflicts:
#	implementations/layer_norm.py
#	test/test_layer_norm.py
#	test/test_linear_layer.py
#	test/test_torchdynamo_bert.py
# CREDITS: Initially inspired by the Triton tutorial

@triton.jit
-def _layer_norm_fwd_fused(
+def _layer_norm_fwd_fused_single_pass(
I think it's missing documentation and naming: what is A? What size? Stride of what, and of which dimension?
added
implementations/layer_norm.py
Outdated
https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
"""
# position of elements processed by this program
row = tl.program_id(0)
we have a naming convention for this (_id or _idx)
from the original implementation
changed
implementations/layer_norm.py
Outdated
""" | ||
# position of elements processed by this program | ||
row = tl.program_id(0) | ||
Out += row * stride |
this is a very bad practice IMO
from original implementation
updated
implementations/layer_norm.py
Outdated
# compute mean
mean = 0.0
var = 0.0
for start in range(0, N, BLOCK_SIZE):
IMO start and end should have more explicit names (like in the attention kernel)
changed
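For reference on the blocked accumulation discussed in this thread: each BLOCK_SIZE chunk contributes partial statistics that get folded into the running mean/variance. A plain-Python sketch of that merge step (Chan et al.'s parallel-variance formula from the linked Wikipedia page; the function name and signature are mine, not from the diff):

```python
def combine_stats(n_a, mean_a, m2_a, n_b, mean_b, m2_b):
    """Merge two partial (count, mean, M2) summaries into one,
    where M2 is the sum of squared deviations from that part's mean.
    This is the algebra a blocked single-pass kernel uses to fold
    each block's statistics into the running ones."""
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta * delta * n_a * n_b / n
    return n, mean, m2
```

Merging the stats of [1, 2] (count 2, mean 1.5, M2 0.5) and [3, 4, 5] (count 3, mean 4.0, M2 2.0) reproduces the stats of [1, 2, 3, 4, 5] exactly, which is what makes the blocked loop equivalent to one pass over the whole row.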
implementations/layer_norm.py
Outdated
for start in range(0, N, BLOCK_SIZE):
    end = min((start + BLOCK_SIZE), N)
    nb_block_col = end - start
    cols = start + tl.arange(0, BLOCK_SIZE)
we call these offsets in other kernels
changed
implementations/layer_norm.py
Outdated
    nb_block_col = end - start
    cols = start + tl.arange(0, BLOCK_SIZE)
    mask = cols < N
    a = tl.load(A + cols, mask=mask, other=0., eviction_policy="evict_last").to(tl.float32)
could you document why eviction_policy="evict_last"?
added
implementations/layer_norm.py
Outdated
rstd = 1 / tl.sqrt(var + eps)

# write-back mean/rstd
tl.store(Mean + row, mean)
could you add why we do this (future backward pass)?
added
var += block_delta + delta_mean_sqr * (start * nb_block_col) / end

var = var / N
rstd = 1 / tl.sqrt(var + eps)
what does rstd mean? root std?
changed
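For future readers of this thread: rstd is conventionally the *reciprocal* of the standard deviation, 1 / sqrt(var + eps); storing it lets the normalization be a multiply rather than a divide. A minimal NumPy sketch of a layer-norm forward pass under that reading (this is my illustration of the convention, not the kernel itself):

```python
import numpy as np

def layer_norm_fwd(a, weight, bias, eps=1e-5):
    # Row-wise mean and population variance (divide by N, as the kernel's var / N).
    mean = a.mean(axis=-1, keepdims=True)
    var = a.var(axis=-1, keepdims=True)
    rstd = 1.0 / np.sqrt(var + eps)      # reciprocal standard deviation
    out = (a - mean) * rstd * weight + bias  # multiply by rstd instead of dividing by std
    return out, mean, rstd               # mean/rstd are kept for the backward pass
```

With weight = 1 and bias = 0 each output row has (approximately) zero mean and unit variance; eps keeps the result finite when a row is constant.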
A single-pass layernorm implementation based on Welford's formula.
Fixes #40
https://jonisalonen.com/2013/deriving-welfords-method-for-computing-variance/
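The linked post derives Welford's single-pass update; a direct transcription in plain Python (a scalar sketch of the idea, not the vectorized Triton kernel):

```python
def welford(xs):
    """Single-pass mean and population variance via Welford's update."""
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for n, x in enumerate(xs, start=1):
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)  # note: uses the already-updated mean
    return mean, m2 / len(xs)
```

Unlike the naive E[x²] − E[x]² formulation, this never subtracts two large nearly-equal quantities, which is why it stays numerically stable in a single pass.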