
Reproducibility Problem(ResNet-50 on ImageNet) #9

Open
ildoonet opened this issue Dec 26, 2019 · 20 comments

ildoonet commented Dec 26, 2019

Below is the top1 errors with N and M grid search, epoch=180.

| M\N | 1 | 2 | 3 |
|-----|--------|--------|--------|
| 5 | 0.2318 | 0.2303 | 0.2327 |
| 7 | 0.2281 | 0.2294 | 0.2323 |
| 9 | 0.2292 | 0.2289 | 0.2301 |
| 11 | 0.2264 | 0.2284 | 0.2320 |
| 13 | 0.2282 | 0.2294 | 0.2294 |
| 15 | 0.2258 | 0.2265 | 0.2297 |

However, when I changed the number of epochs from 180 (the paper's setting) to 270 (AutoAugment's setting), the result with N=2, M=9 was similar to the reported value (top-1 error = 22.4).
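For reference, N and M in the grids here are RandAugment's two hyperparameters: N ops are sampled per image, each applied at the shared global magnitude M. A minimal sketch of that sampling loop, following the paper's description (the `augment_list` structure and names are illustrative, not this repo's exact API):

```python
import random

def randaugment(img, augment_list, n, m):
    """Apply RandAugment: N randomly chosen ops at global magnitude M.

    `augment_list` holds (op, min_val, max_val) triples; M on a 0-30
    scale is mapped linearly into each op's own value range.
    """
    for op, min_val, max_val in random.choices(augment_list, k=n):
        val = min_val + (max_val - min_val) * (m / 30.0)
        img = op(img, val)
    return img
```

The grid search above is then just training one model per (N, M) pair and comparing top-1 errors.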


ildoonet commented Dec 30, 2019

According to the author's email reply, I changed some parameters.

With epoch 180 (paper(RandAugment)'s epoch)

| M\N | 1 | 2 | 3 |
|-----|--------|--------|--------|
| 5 | 0.2284 | 0.2298 | 0.2321 |
| 7 | 0.2288 | 0.2286 | 0.2298 |
| 9 | 0.2286 | 0.2287 | 0.2308 |
| 11 | 0.2262 | 0.2265 | 0.2316 |
| 13 | 0.2283 | 0.2264 | 0.2299 |
| 15 | 0.2286 | 0.2264 | 0.2304 |

The performance with N=2, M=9 (the reported optimal values) did not match the reported performance (top-1 error = 22.4). There is also no model that performs as well as the paper claims, even though we evaluated every model in the same search space.

With epoch 270 (AutoAugment's epoch)

| M\N | 1 | 2 | 3 |
|-----|--------|--------|--------|
| 5 | 0.2271 | 0.2284 | 0.2312 |
| 7 | 0.2271 | 0.2287 | 0.2263 |
| 9 | 0.2262 | 0.2287 | 0.2286 |
| 11 | 0.2255 | 0.2253 | 0.2276 |
| 13 | 0.2241 | 0.2250 | 0.2275 |
| 15 | 0.2224 | 0.2246 | 0.2271 |

If we increase the number of training epochs to 270, some of the results outperform both AutoAugment and RandAugment.

@BarretZoph

Have you tried using the randaugment code in https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet? This was the code used for training the ResNet model. Also, we are almost done open-sourcing the ResNet model.

@ildoonet (Owner, Author)

https://github.com/tensorflow/tpu/blob/8462d083dd89489a79e3200bcc8d4063bf362186/models/official/efficientnet/autoaugment.py#L663

@BarretZoph Is this the code you mentioned? Some parts of my implementation don't match it, so I will change them and run experiments soon.

@ildoonet (Owner, Author)

@BarretZoph Is this correct?

https://github.com/tensorflow/tpu/blob/8462d083dd89489a79e3200bcc8d4063bf362186/models/official/efficientnet/autoaugment.py#L181

Since SolarizeAdd has an addition parameter of 0 (zero), it doesn't affect the image.

@BarretZoph

Yes, that is the code I mentioned. Also, the code for ResNet-50 will be open-sourced soon!

No, I believe the addition amount is what the magnitude hyperparameter controls. It is the threshold value that does not change.
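To make the SolarizeAdd semantics concrete: the magnitude drives the `addition` amount, while the threshold stays fixed (128 in the TF code). A rough NumPy/PIL sketch of the op, not the exact TF implementation:

```python
import numpy as np
from PIL import Image

def solarize_add(img, addition=0, threshold=128):
    # Pixels below the (fixed) threshold get `addition` added, clipped
    # to [0, 255]; pixels at or above the threshold are left unchanged.
    arr = np.asarray(img).astype(np.int64)
    added = np.clip(arr + addition, 0, 255)
    out = np.where(arr < threshold, added, arr).astype(np.uint8)
    return Image.fromarray(out)
```

With `addition=0` the op is a no-op, which is what made it look broken; the magnitude mapping is supposed to supply a nonzero `addition`.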

@ildoonet (Owner, Author)

@BarretZoph Thanks for open-sourcing. I really look forward to it!

Also, sorry for my misunderstanding about SolarizeAdd. I will update this repo and start new experiments. Thanks.

@BarretZoph

Great! Let me know how everything goes. The open-sourcing will be done in a week or two!

ildoonet added a commit that referenced this issue Jan 13, 2020
@ildoonet (Owner, Author)

I need to examine this further, since the performance still doesn't match after changing the augmentation search space to match RandAugment's code. The best top-1 error is 22.65 when training for 180 epochs.

cc @BarretZoph

| M\N | 1 | 2 | 3 |
|-----|--------|--------|--------|
| 5 | 0.2318 | 0.2334 | 0.2371 |
| 7 | 0.2272 | 0.2330 | 0.2365 |
| 9 | 0.2287 | 0.2295 | 0.2338 |
| 11 | 0.2279 | 0.2287 | 0.2352 |
| 13 | 0.2285 | 0.2265 | 0.2337 |
| 15 | 0.2262 | 0.2280 | 0.2315 |

@BarretZoph

Hmm, well the code in https://github.com/tensorflow/tpu/tree/master/models/official/resnet will be open-sourced shortly. Hopefully that will resolve all of your issues.


ildoonet commented Jan 16, 2020

@BarretZoph Thanks, I will look into your open-sourced code.

From checking the code you provided, I'm not sure what is different. Below are a few things that might hurt the performance:

  • In your paper, "The image size was 224 by 244" seems to be a typo. The image size should be 224 by 224, right?
  • Is this the base configuration you used? https://github.com/tensorflow/tpu/blob/master/models/official/resnet/configs/resnet_config.py
    • Some parameters (learning rate, epochs, ...) are different from the paper. (I guess you modified this before you trained the model.)
    • DropBlock is used, which is not in this repo.
    • The precision is fp16, whereas mine uses fp32.
    • Do you use the LARS optimizer when you increase the batch size (e.g. to 4k)?

I guess if you open-source your code as well as the configuration, it would be very helpful. Many thanks!

@BarretZoph

Thanks, I just fixed the image size in the paper! Yes, it should be 224x224.

Yes, that is the base config, but a few things were changed (180 epochs and a batch size of 4096 with 32 replicas).

DropBlock is actually not used here. Since 'dropblock_groups': '' specifies no groups, it will not be applied. https://github.com/tensorflow/tpu/blob/master/models/official/resnet/resnet_main.py#L88

I believe that changing the precision will not make much of a difference.

No, LARS is not used with the 4K batch size.
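Collecting those deltas, the effective configuration relative to the base config would look roughly like this. The key names are my guesses in the style of resnet_config.py, not verbatim from the TF repo:

```python
# Rough summary of the overrides described above. Key names
# (including 'enable_lars') are assumptions, not the TF repo's exact keys.
overrides = {
    'train_epochs': 180,        # paper's schedule, not the base value
    'train_batch_size': 4096,   # run on 32 replicas
    'dropblock_groups': '',     # empty string => DropBlock disabled
    'enable_lars': False,       # LARS not used even at the 4K batch size
}
```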


HobbitLong commented Jan 21, 2020

Hi @ildoonet,

Thanks for contributing this nice code! I wonder if the performance gap is related to two possible misalignments:
(a) It seems you apply RandAugment before RandomResizedCrop. If I understand correctly, the original TensorFlow repo applies RandAugment after RandomResizedCrop.
(b) It seems the black pixels, which are originally outside of the image but end up inside the crop due to transformations such as ShearX, should be filled with some value like 128 or the pixel mean via the fillcolor parameter.

These two gaps are my impression from reading both repos, but I am not sure whether they are real or, if they are, how much they would affect performance.
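Point (b) can be checked with PIL directly: `Image.transform` accepts a `fillcolor` argument, so pixels exposed by a shear can be filled with gray instead of black. A sketch (the gray value 128 follows the TF code's convention; the function name is illustrative):

```python
from PIL import Image

def shear_x(img, magnitude, fillcolor=(128, 128, 128)):
    # Affine shear along x. Pixels pulled in from outside the image
    # are filled with `fillcolor` (gray) instead of defaulting to black.
    return img.transform(img.size, Image.AFFINE,
                         (1, magnitude, 0, 0, 1, 0),
                         fillcolor=fillcolor)
```

Without `fillcolor`, the same transform leaves black borders, which changes the pixel statistics the model sees during training.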

@ildoonet (Owner, Author)

@HobbitLong Thanks! Those are certainly things that could cause differences. As @BarretZoph mentioned, TensorFlow's RandAugment will be open-sourced this week or next; I will examine the code and share what the problem was.

@BarretZoph

The code is now updated with randaugment and autoaugment: https://github.com/tensorflow/tpu/tree/master/models/official/resnet

@ildoonet (Owner, Author)

@BarretZoph Thanks for the update. It will help a lot.

I found that the preprocessing for ResNet/EfficientNet implemented by Google is a bit different from the one most people use. It regularizes harder by using a smaller cropping region during training, and it uses a center-cropped image that keeps the same aspect ratio at test time.

I believe this discrepancy causes some of the degradation. I will try it with this preprocessing.
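For reference, the eval-time scheme described here (resize the shorter side while keeping the aspect ratio, then take a center crop) can be sketched in PIL as below; the 256/224 sizes are the conventional ImageNet values and an assumption on my part, not necessarily Google's exact numbers:

```python
from PIL import Image

def eval_preprocess(img, resize=256, crop=224):
    # Resize the shorter side to `resize`, preserving aspect ratio,
    # then take a centered `crop` x `crop` patch for evaluation.
    w, h = img.size
    scale = resize / min(w, h)
    img = img.resize((round(w * scale), round(h * scale)), Image.BILINEAR)
    w, h = img.size
    left, top = (w - crop) // 2, (h - crop) // 2
    return img.crop((left, top, left + crop, top + crop))
```

The training-side difference is analogous: sampling crops from a smaller area range acts as stronger regularization than the defaults most PyTorch pipelines use.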


ildoonet commented Jan 31, 2020

I fixed the above problem and got a 23.22 top-1 error with ResNet-50 and N=2, M=9.

I guess there are more things to match, but due to my current work, I will have to look into this after a few weeks.

@iamhankai

Any news?


PistonY commented Nov 24, 2020

Any news?


A-Telfer commented Feb 8, 2021

Perhaps related to the issue I just posted on lower magnitudes leading to greater distortions? #24

@shuguang99

Has the code been updated for ImageNet reproduction?
