Computing flops #35

Open
rebeen opened this issue Apr 16, 2020 · 8 comments
Labels
question Further information is requested

Comments

@rebeen

rebeen commented Apr 16, 2020

Please could you answer my questions:
Q1: Can we compute flops for a model without training it? Is there any relation between flops and training? Can training affect flops? When can flops be computed?

I am asking because I defined a model, then computed the flops, and here are the results:
Computational complexity: 0.03 GMac
Number of parameters: 2.24 M

Q2: If we want flops in millions, should we multiply 0.03 by 1000? If yes, then in this case the computational complexity is 30.0 million.

Q3: From what I understand of your code, MAC is flops, am I right?

Thank you

@sovrasov
Owner

A1: Flops can be estimated as soon as the architecture of your model has been defined. For classical training schemes (e.g. training ResNet-50 on ImageNet via SGD) the architecture is fixed in advance, so flops do not change during training.

A2: Giga means 10^9 and mega means 10^6, so in your example 0.03 GMac = 30 MMac; you're right.

A3: See #16

@rebeen
Author

rebeen commented Apr 16, 2020

Thank you very much. Regarding the second answer A2: since giga means 10^9, when we convert to million MACs we should compute GMac * 1000 = million MAC.

Regarding A3, I have seen those issues, but it is actually still not clear to me.

@sovrasov
Owner

Yes, giga is 10^9.
If you mean the code: the variables whose names mention flops are actually used to compute MACs.
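
For reference, a minimal sketch of the unit conversion above; the factor of two FLOPs per MAC is a common convention (one multiply plus one add) and an assumption here, not something ptflops reports:

gmacs = 0.03             # value reported by ptflops, in GMac
mmacs = gmacs * 1000     # giga = 10^9, mega = 10^6, so GMac * 1000 = MMac
flops = gmacs * 1e9 * 2  # assumption: 1 MAC counted as 2 FLOPs (multiply + add)
print(f'{mmacs:.1f} MMac, ~{flops / 1e6:.1f} MFLOPs')  # 30.0 MMac, ~60.0 MFLOPs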

@rebeen
Author

rebeen commented Apr 18, 2020

This is the computational cost of MobileNetV2, which I think is not correct. What do you think?

import torch
import torch.nn as nn
import torch.nn.functional as F


class Block(nn.Module):
    '''expand + depthwise + pointwise'''
    def __init__(self, in_planes, out_planes, expansion, stride):
        super(Block, self).__init__()
        self.stride = stride

        planes = expansion * in_planes
        self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=1, stride=1, padding=0, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride, padding=1, groups=planes, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv3 = nn.Conv2d(planes, out_planes, kernel_size=1, stride=1, padding=0, bias=False)
        self.bn3 = nn.BatchNorm2d(out_planes)

        self.shortcut = nn.Sequential()
        if stride == 1 and in_planes != out_planes:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_planes, out_planes, kernel_size=1, stride=1, padding=0, bias=False),
                nn.BatchNorm2d(out_planes),
            )

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = F.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))
        out = out + self.shortcut(x) if self.stride==1 else out
        return out


class MobileNetV2(nn.Module):
    # (expansion, out_planes, num_blocks, stride)
    cfg = [(1,  16, 1, 1),
           (6,  24, 2, 1),  # NOTE: change stride 2 -> 1 for CIFAR10
           (6,  32, 3, 2),
           (6,  64, 4, 2),
           (6,  96, 3, 1),
           (6, 160, 3, 2),
           (6, 320, 1, 1)]

    def __init__(self, num_classes=10):
        super(MobileNetV2, self).__init__()
        # NOTE: change conv1 stride 2 -> 1 for CIFAR10
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(32)
        self.layers = self._make_layers(in_planes=32)
        self.conv2 = nn.Conv2d(320, 1280, kernel_size=1, stride=1, padding=0, bias=False)
        self.bn2 = nn.BatchNorm2d(1280)
        self.linear = nn.Linear(1280, num_classes)

    def _make_layers(self, in_planes):
        layers = []
        for expansion, out_planes, num_blocks, stride in self.cfg:
            strides = [stride] + [1]*(num_blocks-1)
            for stride in strides:
                layers.append(Block(in_planes, out_planes, expansion, stride))
                in_planes = out_planes
        return nn.Sequential(*layers)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.layers(out)
        out = F.relu(self.bn2(self.conv2(out)))
        # NOTE: change pooling kernel_size 7 -> 4 for CIFAR10
        out = F.avg_pool2d(out, 4)
        out = out.view(out.size(0), -1)
        out = self.linear(out)
        return out

net = MobileNetV2()

# def test():
#     print(net)
#     x = torch.randn(2, 3, 32, 32)
#     y = net(x)
#     print(y.size())
# test()

from ptflops import get_model_complexity_info

with torch.cuda.device(0):
    macs, params = get_model_complexity_info(net, (3, 32, 32), as_strings=True,
                                             print_per_layer_stat=False, verbose=True)
    print('{:<30}  {:<8}'.format('Computational complexity: ', macs))
    print('{:<30}  {:<8}'.format('Number of parameters: ', params))


Warning: module Block is treated as a zero-op.
Warning: module MobileNetV2 is treated as a zero-op.
Computational complexity:       0.09 GMac
Number of parameters:           2.3 M   
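
As a quick cross-check of the parameter count (a sketch, assuming the same net object as above), ptflops' number can be compared against a direct count:

n_params = sum(p.numel() for p in net.parameters())
print(f'{n_params / 1e6:.2f} M parameters')  # should land close to the 2.3 M reported above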

@sovrasov
Owner

It seems OK to me. MobileNetV2 for ImageNet has a different number of params/MACs than MobileNetV2 for CIFAR10. Why do you think this result is incorrect?
To verify the result of ptflops, you can switch the input resolution to 224 and the number of classes to 1000 in your code, and then compare the results against the original MobileNetV2 paper.
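
One way to run that check, as a sketch: take the stock ImageNet model from torchvision (an assumption on my part, any reference implementation would do) and measure it at 224x224 with 1000 classes:

from torchvision.models import mobilenet_v2
from ptflops import get_model_complexity_info

# Stock ImageNet MobileNetV2 at the resolution used in the paper.
imagenet_net = mobilenet_v2(num_classes=1000)
macs, params = get_model_complexity_info(imagenet_net, (3, 224, 224), as_strings=True,
                                         print_per_layer_stat=False)
print('Computational complexity:', macs)  # roughly 0.3 GMac, in line with the ~300M multiply-adds in the paper
print('Number of parameters:', params)    # roughly 3.5 M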

@rebeen
Author

rebeen commented Apr 18, 2020

I did compare, and that is why I am confused: if we look at page 5 of that paper, the highest reported flops are 42.0 million for MobileNetV2, while the result above is 90.0 million flops. Also, your code does not count Block, even though there are some convolution layers inside it: "Warning: module Block is treated as a zero-op."
Thank you for your reply.

@sovrasov
Owner

The numbers in this paper are quite weird: what are the differences between MobileNetV2 for CIFAR10 and MobileNetV2 for SVHN? Both datasets have 32x32 images and 10 classes, so from the architecture perspective the MobileNets in tables 5 and 6 should be identical and have the same amount of flops, yet the paper reports fewer flops for SVHN. Maybe you'd better ask the authors of the paper whether there is a difference between the stock MobileNetV2 and their versions.

Regarding the warnings: you should treat them carefully. The Block module is custom and ptflops doesn't have a rule for it, but at the same time it is just a container for other modules that can be parsed correctly. Unfortunately I couldn't figure out a criterion to distinguish such containers from modules that really need a custom rule to count flops correctly, so ptflops just prints a warning for every unknown module.
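
To see that the convolutions inside Block are still counted despite the warning, the measurement from the earlier comment can be re-run with the per-layer report enabled (a sketch reusing the same net object):

from ptflops import get_model_complexity_info

# The per-layer report lists the Conv2d/BatchNorm2d children of each Block with their own
# MAC counts; Block itself only acts as a container and contributes no extra operations.
macs, params = get_model_complexity_info(net, (3, 32, 32), as_strings=True,
                                         print_per_layer_stat=True, verbose=True)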

@rebeen
Author

rebeen commented Apr 19, 2020

Thank you. Yes, that is really a problem; I am confused about why these two flops numbers are different.

sovrasov added the question label on May 19, 2020