Replace add_missing_layers with add_missing_container_layers #169

Merged
merged 6 commits into from Sep 26, 2022
Conversation

mert-kurttutan (Contributor) commented Sep 24, 2022

Hi,

In this PR, I tried to get rid of add_missing_layer and make the summary primarily based on the forward call.

There was a problem, however, with simply removing add_missing_layer. It turns out that if only the forward call is used (i.e. no add_missing_layer), container modules are ignored and not included in the summary list, which produces a wrong parameter count. For instance, see the result below (obtained without add_missing_layer):

===============================================================================================
Layer (type:depth-idx)                        Output Shape              Param #
===============================================================================================
GenotypeNetwork                               --                        3,200
├─Sequential: 1-1                             [2, 48, 32, 32]           --
│    └─Conv2d: 2-1                            [2, 48, 32, 32]           1,296
│    └─BatchNorm2d: 2-2                       [2, 48, 32, 32]           96
├─ModuleList: 1                               --                        --
│    └─Cell: 2-3                              [2, 32, 32, 32]           --
│    │    └─ReLUConvBN: 3-1                   [2, 32, 32, 32]           1,600
│    │    └─ReLUConvBN: 3-2                   [2, 32, 32, 32]           1,600
===============================================================================================
Total params: 4,592
Trainable params: 4,592
Non-trainable params: 0
Total mult-adds (M): 8.95
===============================================================================================
Input size (MB): 0.02
Forward/backward pass size (MB): 3.67
Params size (MB): 0.02
Estimated Total Size (MB): 3.71
===============================================================================================

In terms of the layer hierarchy there is no problem, since that is handled by the layers_to_str function in formatting.py. But the parameter count is wrong: the leftover parameter count attributed to GenotypeNetwork itself is 3,200, whereas it should be -- (zero). This happens because ModuleList: 1 is not contained in summary_list, so Cell: 2-3 is treated as a child of Sequential: 1-1. To resolve this, I added the function add_missing_container_layers, which adds the container modules (e.g. ModuleDict or ModuleList) used in the main module (a rough sketch of the idea follows the corrected output below). Once this function is used, the result for the same case is correct:


===============================================================================================
Layer (type:depth-idx)                        Output Shape              Param #
===============================================================================================
GenotypeNetwork                               --                        --
├─Sequential: 1-1                             [2, 48, 32, 32]           --
│    └─Conv2d: 2-1                            [2, 48, 32, 32]           1,296
│    └─BatchNorm2d: 2-2                       [2, 48, 32, 32]           96
├─ModuleList: 1-2                             --                        --
│    └─Cell: 2-3                              [2, 32, 32, 32]           --
│    │    └─ReLUConvBN: 3-1                   [2, 32, 32, 32]           1,600
│    │    └─ReLUConvBN: 3-2                   [2, 32, 32, 32]           1,600
│    │    └─ModuleList: 3-3                   --                        --
===============================================================================================
Total params: 4,592
Trainable params: 4,592
Non-trainable params: 0
Total mult-adds (M): 8.95
===============================================================================================
Input size (MB): 0.02
Forward/backward pass size (MB): 3.67
Params size (MB): 0.02
Estimated Total Size (MB): 3.71
===============================================================================================
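
For illustration, here is a rough sketch of the idea behind add_missing_container_layers. This is a simplified, hypothetical version written against plain nn.Module, not the exact code in the PR; LayerInfo construction and depth bookkeeping are elided:

import torch.nn as nn

CONTAINERS = (nn.ModuleList, nn.ModuleDict)

def add_missing_container_layers(root, traced):
    """Return `traced` (a list of (qualified_name, module) pairs recorded by
    the forward hooks) with any container ancestors spliced in before their
    children, so leftover parameters get attributed to the right parent."""
    name_to_module = dict(root.named_modules())
    seen = {name for name, _ in traced}
    result = []
    for name, module in traced:
        # Walk the qualified name from the top and insert any container
        # ancestor (e.g. "block0" for "block0.in_lin1") the hooks skipped.
        parts = name.split(".")
        for depth in range(1, len(parts)):
            ancestor = ".".join(parts[:depth])
            if ancestor not in seen and isinstance(
                name_to_module.get(ancestor), CONTAINERS
            ):
                result.append((ancestor, name_to_module[ancestor]))
                seen.add(ancestor)
        result.append((name, module))
    return result

With the GenotypeNetwork example above, the ModuleList that owns Cell: 2-3 is never hooked during forward; a pass like this would splice it back into the summary list in front of its children, which is what fixes the leftover-parameter attribution.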


In addition to basing the summary on the forward pass, which as far as I can tell is beneficial, this resolves discrepancies between the order in which modules are defined in __init__ and the order in which they are called in forward. For instance, after this commit, the following test case gives the intended result; it did not before, as shown below. Note that the order in which the modules of self.block0 are defined is given by range_1, which is reversed relative to the order they are used in forward.


import torch
import torch.nn as nn
import numpy as np

from torchinfo import summary



class RecursiveTest(nn.Module):
    def __init__(self):
        super().__init__()
        self.out_lin0 = nn.Linear(128, 16)

        self.block0 = nn.ModuleDict()
        # range_1 = range(1, 4)
        range_1 = reversed(range(1, 4))
        for i in range_1:
            self.block0.add_module(f"in_lin{i}", nn.Linear(16, 16))

        self.block1 = nn.ModuleDict()
        for i in range(4, 7):
            self.block1.add_module(f"in_lin{i}", nn.Linear(16, 16))

        self.out_lin7 = nn.Linear(16, 4)

    def forward(self, x):
        x = torch.relu(self.out_lin0(x))

        for i in range(1, 4):
            x = torch.relu(self.block0[f"in_lin{i}"](x))

        # x = self.block1[f"in_lin{6}"](x)
        # x = self.block0[f"in_lin{2}"](x)

        for i in range(4, 7):
            x = torch.relu(self.block1[f"in_lin{i}"](x))

        x = torch.relu(self.out_lin7(x))

        return x


batch_size = 2
data_shape = (128,)
random_data = torch.rand((batch_size, *data_shape))
my_nn = RecursiveTest()
recursive_summary = summary(
    my_nn, 
    input_data=[random_data], 
    row_settings=('depth', 'var_names'),
    device='cpu',
)

The result before this commit (with add_missing_layer included) is shown below; the parameter count is wrong, as RecursiveTest should not be left with 816 parameters:


==========================================================================================
Layer (type (var_name):depth-idx)        Output Shape              Param #
==========================================================================================
RecursiveTest (RecursiveTest)            [2, 4]                    816
├─Linear (out_lin0): 1-1                 [2, 16]                   2,064
├─ModuleDict (block0): 1-2               --                        --
│    └─Linear (in_lin1): 2-1             [2, 16]                   272
│    └─Linear (in_lin2): 2-2             [2, 16]                   272
│    └─Linear (in_lin3): 2-3             [2, 16]                   272
├─ModuleDict (block1): 1                 --                        --
│    └─Linear (in_lin4): 2-4             [2, 16]                   272
│    └─Linear (in_lin5): 2-5             [2, 16]                   272
│    └─Linear (in_lin6): 2-6             [2, 16]                   272
├─Linear (out_lin7): 1-3                 [2, 4]                    68
==========================================================================================
Total params: 3,764
Trainable params: 3,764
Non-trainable params: 0
Total mult-adds (M): 0.01
==========================================================================================
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.02
Estimated Total Size (MB): 0.02
==========================================================================================

The resulting summary after this commit is:

==========================================================================================
Layer (type (var_name):depth-idx)        Output Shape              Param #
==========================================================================================
RecursiveTest (RecursiveTest)            [2, 4]                    --
├─Linear (out_lin0): 1-1                 [2, 16]                   2,064
├─ModuleDict (block0): 1-2               --                        --
│    └─Linear (in_lin1): 2-1             [2, 16]                   272
│    └─Linear (in_lin2): 2-2             [2, 16]                   272
│    └─Linear (in_lin3): 2-3             [2, 16]                   272
├─ModuleDict (block1): 1-3               --                        --
│    └─Linear (in_lin4): 2-4             [2, 16]                   272
│    └─Linear (in_lin5): 2-5             [2, 16]                   272
│    └─Linear (in_lin6): 2-6             [2, 16]                   272
├─Linear (out_lin7): 1-4                 [2, 4]                    68
==========================================================================================
Total params: 3,764
Trainable params: 3,764
Non-trainable params: 0
Total mult-adds (M): 0.01
==========================================================================================
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.02
Estimated Total Size (MB): 0.02
==========================================================================================

Note: as you may have noticed, I reused the tracing algorithm from the layers_to_str function in formatting.py to obtain the container modules in add_missing_container_layers.

Looking forward to your feedback.

mert-kurttutan and others added 2 commits September 25, 2022 00:27
Instead, use add_missing_container_layers
codecov bot commented Sep 24, 2022

Codecov Report

Merging #169 (abd9735) into main (70f3ad1) will decrease coverage by 0.30%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main     #169      +/-   ##
==========================================
- Coverage   97.39%   97.08%   -0.31%     
==========================================
  Files           6        6              
  Lines         575      584       +9     
==========================================
+ Hits          560      567       +7     
- Misses         15       17       +2     
Impacted Files Coverage Δ
torchinfo/torchinfo.py 97.35% <100.00%> (+0.10%) ⬆️
torchinfo/formatting.py 97.56% <0.00%> (-2.44%) ⬇️


TylerYep (Owner)

Thanks for the PR! I'll need some time to look at the code more closely but the general direction is correct - a more specific add_missing_container_layers function helps readability and future debugging a lot.

I left a comment on some of the output changes. Most of them are better but one seems worse (missing MaxPool / PReLU layers, just need some clarification there - are they not used in forward at all? Only used in train mode?)

One pre-commit hook is failing, feel free to use # pylint: disable=unused-variable on that line to ignore it for now.

TylerYep linked an issue Sep 25, 2022 that may be closed by this pull request
TylerYep changed the title from "Get rid of add_missing_layer" to "Replace add_missing_layers with add_missing_container_layers" on Sep 25, 2022
TylerYep (Owner)

Looks correct to me, thanks for the contribution! I also added some additional tests to ensure it solves the problems it sets out to achieve.

TylerYep merged commit c3188cd into TylerYep:main Sep 26, 2022
mert-kurttutan (Contributor, Author) commented Sep 26, 2022

I just realized another potential improvement: since we are now adding container modules to summary_list, the problem that layers_to_str in formatting.py works around seems to be solved. The relevant piece of code in formatting.py:

    def layers_to_str(self, summary_list: list[LayerInfo]) -> str:
        """
        Print each layer of the model using a fancy branching diagram.
        This is necessary to handle Container modules that don't have explicit parents.
        """
        new_str = ""

So, I think it is no longer necessary to keep an updated hierarchy of parent modules, since all parents (including container modules) are already in the summary list.
What do you think?
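
To make the suggestion concrete, here is a rough, standalone sketch of what the simplified rendering could look like. This is hypothetical: the attribute names class_name and depth are assumptions about LayerInfo, and the real layers_to_str formats far more columns than this:

from collections import namedtuple

# Stand-in for torchinfo's LayerInfo; attribute names are assumptions.
FakeLayer = namedtuple("FakeLayer", ["class_name", "depth"])

def layers_to_str(summary_list):
    """Render the tree from each layer's own depth; since container modules
    are now in summary_list, no missing-parent reconstruction is needed."""
    lines = []
    for layer in summary_list:
        # Indentation is derived purely from depth (the root model is depth 0).
        indent = "│    " * max(layer.depth - 1, 0)
        prefix = "├─" if layer.depth >= 1 else ""
        lines.append(f"{indent}{prefix}{layer.class_name}")
    return "\n".join(lines)

print(layers_to_str([FakeLayer("RecursiveTest", 0), FakeLayer("ModuleDict", 1), FakeLayer("Linear", 2)]))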

TylerYep (Owner)

Yep, simplifying that code and making layers_to_str extra simple sounds like a win to me!

Successfully merging this pull request may close these issues:

Output incorrect when using nn.ModuleList