
fix: deprecate flexible mlp heads #160

Merged
merged 31 commits into main from deprecate-flexible-mlp-heads on Nov 6, 2024

Conversation

rheasukthanker (Collaborator)

Reference Issues/PRs

Resolves #146

What does this implement/fix? Explain your changes.

Minimal Example / How should this PR be tested?

Any other comments?


By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the
terms of your choice.

@rheasukthanker changed the title from "Deprecate flexible mlp heads" to "fix: deprecate flexible mlp heads" on Nov 4, 2024
@rheasukthanker marked this pull request as draft on November 4, 2024 at 19:41
@rheasukthanker marked this pull request as ready for review on November 4, 2024 at 20:29

@aaronkl (Collaborator) left a comment:

Looks good, just some nitpicks

@@ -19,6 +19,7 @@ dependencies = [
"litgpt[all]==0.5.0",
"syne-tune[moo]>=0.13",
"torchvision>=0.18",
"tokenizers==0.20.0",

Collaborator:

Is this relevant for this PR? If not I would drop it to keep the changelogs clean

Collaborator (author):

Yes, an unrelated test case fails if this is not specified, hence I would keep it

Collaborator:

ok makes sense

print(
    f"Mini model {compute_flops(model=model, metric='macs')} macs"
)
print(f"Mini model {compute_flops(model=model, metric='flops')} flops")

Collaborator:

Also here if this is not relevant for this PR, I would drop it

Collaborator (author):

Ruff complains about the way this file is formatted; I am not sure how the checks missed this.

sub_network_intermediate_size: list,
sub_network_num_heads: list,
sub_network_intermediate_size: int,
sub_network_num_heads: int,
sub_network_n_layers: int,
sub_network_query_groups=None,

Collaborator:

Can we use type hints here?
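
For illustration, a minimal sketch of what a fully type-hinted signature could look like (the parameter names are taken from the diff above; the exact parameter set, defaults, and enclosing class are assumptions, not the merged code):

    # Sketch only -- hypothetical stand-in for the module being edited, not the actual whittle API.
    class GPT:
        def set_sub_network(
            self,
            sub_network_n_embd: int,
            sub_network_intermediate_size: int,
            sub_network_num_heads: int,
            sub_network_n_layers: int,
            sub_network_query_groups: int | None = None,
            sub_network_head_size: int | None = None,
        ) -> None:
            # Body omitted in this sketch; the real method configures the sub-network.
            ...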

@gabikadlecova (Collaborator) left a comment:

Overall, looks good to me.

I would suggest adding support for getting subnet head_size and n_query_groups from the supernet fields - this makes extraction very easy to do.

@@ -184,12 +184,29 @@ def set_sub_network(
        self.sub_network_n_layers = sub_network_n_layers
        self.transformer.wte.set_sub_network(self.sub_network_n_embd)
        self.transformer.ln_f.set_sub_network(self.sub_network_n_embd)
        if sub_network_query_groups is None:
            if self.config.n_query_groups == 1:
                sub_network_query_groups = 1

Collaborator:

Can we also set this as a field?
self.sub_network_query_groups = sub_network_query_groups

Then, we can easily get the value for the extract function without duplicating the query_group and head_size computation:

supernet = GPT(config)
supernet.set_sub_network(**subnet_args)  # sub_network_query_groups == None
# ... sub_network_query_groups gets computed here

subnet_config = ... # same as subnet_args
subnet_config.n_query_groups = supernet.sub_network_query_groups
subnet_config.head_size = supernet.sub_network_head_size
subnet_correct_sizes = extract_sub_network(supernet, subnet_config)

@rheasukthanker (Collaborator, author) commented on Nov 6, 2024:

@gabikadlecova I don't clearly understand the changes needed to the extract function and the changes needed in the tests. Could you perhaps create a separate PR for that after this PR is merged, or push to this branch directly?

Collaborator (author):

Both sub_network_query_groups and sub_network_head_size are supernet fields now

Collaborator:

@rheasukthanker I can push to this branch, it's a small change

Comment on lines +53 to +54
self.sub_network_query_groups = sub_network_query_groups
self.sub_network_head_size = sub_network_head_size

Collaborator:

Since we set it like this without checking for None, these should become positional args.

Collaborator (author):

The way this is initialized currently, they can never be None.

Collaborator:

Sorry, I meant to tag lines 48-49

      sub_network_query_groups=None,
      sub_network_head_size=None,

Since now they cannot be None, we might want to change it to

      sub_network_query_groups: int,
      sub_network_head_size: int,

Collaborator:

I think they should be either None or int no? Since they are None by default?

sub_network_query_groups: int | None,
sub_network_head_size: int | None,

Collaborator:

@aaronkl yes, but I think they should not be allowed to be None anymore - the computation is done in model.py and hence None is not a valid value here anymore

Collaborator:

We may want to add a test case where the default head_size and/or n_query_groups of the subnet become different, and we need to compute it / copy it from supernet.sub_network_head_size (see my comment on whittle/models/gpt/model.py).
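
For illustration, a rough sketch of what such a test case could look like, reusing the extraction pattern from the snippet earlier in this thread (the Config construction, subnet_args, and import locations are placeholders/assumptions, not the test that was actually added):

    # Rough sketch only -- assumed names and arguments, shown to illustrate the idea.
    from copy import deepcopy

    def test_extract_with_computed_head_size_and_query_groups():
        config = Config(...)  # hypothetical supernet config with n_query_groups > 1
        supernet = GPT(config)

        # Set a sub-network without passing head_size / query groups explicitly,
        # so set_sub_network has to compute them internally.
        supernet.set_sub_network(**subnet_args)  # subnet_args: placeholder dict

        # Copy the computed values from the supernet fields instead of recomputing them.
        subnet_config = deepcopy(config)
        subnet_config.n_query_groups = supernet.sub_network_query_groups
        subnet_config.head_size = supernet.sub_network_head_size

        subnet = extract_sub_network(supernet, subnet_config)
        assert subnet.config.n_query_groups == supernet.sub_network_query_groups
        assert subnet.config.head_size == supernet.sub_network_head_size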

Collaborator:

@rheasukthanker since you set it as supernet fields already, I only changed the test case to use it

@gabikadlecova (Collaborator) left a comment:

The subnet head size and query groups are fields now - the PR is ready to be merged.

I'd still change the subnet head_size/query groups to positional arguments and remove None from type hints (they cannot be None now and should not be None). But it's minor

Comment on lines +51 to +52
self.sub_network_head_size: int | None = self.config.head_size
self.sub_network_query_groups: int | None = self.config.n_query_groups

Collaborator:

Should it really be None now? It has to be present in the config and when setting the subnetwork, it will never be None

Collaborator:

The problem is that our linter complains since it does some automated type checking later. Maybe there is a more elegant solution?

Comment on lines +234 to +235
self.sub_network_head_size: int | None = self.config.head_size
self.sub_network_query_groups: int | None = self.config.n_query_groups
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, can it not be None?

@gabikadlecova self-requested a review on November 6, 2024 at 08:56

@gabikadlecova (Collaborator) left a comment:

Subnetwork head size and n_query_groups are fields now

@rheasukthanker merged commit a07ee5a into main on Nov 6, 2024
7 checks passed
@rheasukthanker deleted the deprecate-flexible-mlp-heads branch on November 6, 2024 at 12:17
Successfully merging this pull request may close these issues.

use fixed number of heads / intermediate size per layer