Cannot reproduce results in the paper #14

Open
maestrojeong opened this issue Aug 12, 2021 · 5 comments

Comments

@maestrojeong

Please refer to the CIFAR-10, ResNet-50 architecture in Table 2 of the paper.

When the volume budget is 12.5%, the number of parameters is 2.8% and the FLOPs are 5.1% in Table 2.

However, in my reproduced result after the pruning step, the number of parameters is 12.9% and the FLOPs are 16.9%,
while the volume budget is 12.5%.

I followed the hyperparameter settings provided in the appendix.

Would you share the train/validation split information and the pretrained networks needed to reproduce the results in the paper?

@rishabh-16
Collaborator

rishabh-16 commented Aug 12, 2021

Hi, I believe you are using the visualize_model_architecture function in the utils file to get the parameter/FLOPs ratio. It only shows an approximation of the params/FLOPs; we calculated the exact params and FLOPs by hand for the numbers reported in the paper. However, I have updated the code, and you can rerun it to get accurate numbers.

The train-val split is already given in the data_splits directory.

Our trained model weights for r50 cifar10 volume_ratio 12.5%: https://drive.google.com/file/d/1F4TtIaT0qT76Uz94a-GU0HUrlpZVR7lp/view?usp=sharing.

Please let us know if you have any other queries.

@maestrojeong
Author

I re-evaluated my model and found that the number of parameters and the number of FLOPs are comparable with the values in the paper.
However, I have some other queries about the implementation.

In the code calc_flops(),

ans+=current_max*a[current_loc]*9*size**2 + a[current_loc]*size**2
ans+=a[current_loc]*a[current_loc+1]*9*size**2 + a[current_loc+1]*size**2

It seems you add the FLOPs of the convolution operation and the FLOPs of the batchnorm operation, while in get_flops() you consider only the FLOPs of the convolution operation.
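To spell out my reading of those two lines, here is a minimal sketch (my own code, not the repository's) of the per-layer count they seem to implement, assuming 3x3 kernels (the factor of 9) and one operation per output element for the batchnorm:

```python
# Minimal sketch (not the repository's code) of the per-layer count above:
# 3x3 convolution FLOPs plus a batchnorm term, for a square feature map.
def layer_flops(active_in, active_out, size, kernel=3):
    conv_flops = active_in * active_out * kernel ** 2 * size ** 2
    bn_flops = active_out * size ** 2  # one op per output element for batchnorm
    return conv_flops + bn_flops
```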

The result (activated_FLOPs/total_FLOPs) of calc_flops() is about (100M/2.4G). On the other hand, the result of get_flops() is about (50M/300M).
If we ignore the FLOPs of the batchnorm operation, the value from calc_flops() should be double the value from get_flops() because of the
2*ans. However, the ratio of total FLOPs between the two functions is about 8x. Where does this big difference come from?

Also, I have a question about this code:

current_max = max(downsample_n, a[current_loc+1])

It seems that you consider the number of activated input channels due to the skip addition.
However, I believe the max operation is not valid.
Suppose the activated channels from the convolution operation are [1,0,1,0] and the activated channels from the skip addition are [1,1,0,0].
In that case, the activated input channels should be [1,1,1,0], which is three. On the other hand, the max operation yields 2 (=max(2,2)).
Am I misunderstanding something in this code?
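To make the example concrete, here is a toy snippet (my own code, not from the repository) comparing the union of the two channel masks with the max of their counts:

```python
import torch

# Toy masks from the example above (not taken from the repository).
conv_mask = torch.tensor([1, 0, 1, 0])  # channels kept by the convolution
skip_mask = torch.tensor([1, 1, 0, 0])  # channels kept through the skip branch

# Union of the masks: a channel is active if either branch keeps it.
union_count = (conv_mask | skip_mask).sum().item()                # -> 3
# Max of the per-branch counts, as current_max appears to do.
max_count = max(conv_mask.sum().item(), skip_mask.sum().item())   # -> 2

print(union_count, max_count)  # 3 2
```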

Thank you!

@ubamba98
Collaborator

Hi, the get_flops function is just an approximation that we used at pruning time to estimate the FLOPs. During the fine-tuning phase, calc_flops is used to get the final thresholds for zeta. Also, get_flops uses the soft values of zeta, hence you might be getting this huge difference between get_flops and calc_flops.
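As a toy illustration of how a soft count and a hard thresholded count can diverge (hypothetical zeta values, not from the repository):

```python
import torch

# Hypothetical channel gates (zeta); not values from the repository.
zeta = torch.tensor([0.9, 0.6, 0.55, 0.1])
threshold = 0.5

# Soft count: each channel contributes its continuous gate value.
soft_active = zeta.sum().item()                # ~2.15 "effective" channels
# Hard count: every channel above the threshold counts as fully active.
hard_active = (zeta > threshold).sum().item()  # 3 channels

print(soft_active, hard_active)
```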

Please refer to the branch improved_flops; we are currently working on this issue there to include the correct FLOP calculation due to skip connections. A more elegant approach can be found on that branch, which you may use. Please do let me know if you have any other queries.

Thanks.

@maestrojeong
Author

maestrojeong commented Aug 20, 2021

I found some critical errors in the code.

`size = self.insize*2` should be `size = self.insize`:

https://github.com/transmuteAI/ChipNet/blob/master/models/resnet.py#L378

This makes the total FLOPs 20 times larger than the total FLOPs from get_flops().
You need to revise the total FLOPs reported in the paper.

Also, `self.prev_module` is not correctly considered in the ResNet.

@ubamba98
Collaborator

Thanks for pointing this out. For ResNet-50, our approximation of the FLOPs might be too crude. We will look into it for further clarification and update the scores in the repo if anything changes.
