[Reproduce] Cannot reproduce the results with base MAE #26

GioFic95 · 2023-02-02T17:05:30Z

Hi @Verg-Avesta, I tried to reproduce your pre-training + fine-tuning process, but my results still differ when I use the base MAE model mae_vit_base_patch16, even with the pretrained weights mentioned in issue #6 and after the fixes suggested in issue #23: I get MAE 13.95 and RMSE 90.25.
On the other hand, with the large MAE model mae_vit_large_patch16 I obtain MAE 12.58 and RMSE 87.25, which is closer to the results discussed in the aforementioned issue (MAE: 12.44, RMSE: 89.86), but the use of the large model isn't mentioned anywhere, as far as I know.

What makes me think this may be the reason for the difference, besides the fact that the other parameters seem to match those given in the paper or in the README/issues, is that the fine-tuned weights you uploaded to Drive (FSC147.pth) are 1.2GB, while my fine-tuned model is ~500MB. This was already noticed in issue #7, as far as I can understand using Google Translate.
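One way to see where the size gap comes from is to break a checkpoint down by top-level entry. The sketch below assumes the checkpoint is a dict with entries such as `model` and `optimizer` (as in the MAE reference code, which may or may not match this repository); the helper names are hypothetical.

```python
from collections.abc import Mapping


def tensor_bytes(obj):
    """Recursively sum the bytes held by tensor-like leaves of a nested structure."""
    # Duck-typed: anything exposing numel() and element_size() counts as a tensor.
    if hasattr(obj, "numel") and hasattr(obj, "element_size"):
        return obj.numel() * obj.element_size()
    if isinstance(obj, Mapping):
        return sum(tensor_bytes(v) for v in obj.values())
    if isinstance(obj, (list, tuple)):
        return sum(tensor_bytes(v) for v in obj)
    return 0  # ints, strings, None, argparse namespaces, ...


def size_breakdown(checkpoint):
    """Map each top-level checkpoint entry to the megabytes of tensor data under it."""
    return {key: tensor_bytes(value) / 1e6 for key, value in checkpoint.items()}


def load_checkpoint(path):
    """Load a checkpoint onto the CPU (PyTorch is imported lazily)."""
    import torch

    # Recent PyTorch releases may need weights_only=False for checkpoints that
    # pickle non-tensor objects; only do that for files you trust.
    return torch.load(path, map_location="cpu")


# Usage (paths are placeholders):
#   print(size_breakdown(load_checkpoint("FSC147.pth")))
#   print(size_breakdown(load_checkpoint("my_finetuned_checkpoint.pth")))
```

If the `model` entries come out at a similar size and only one file carries a large `optimizer` entry, the gap would be due to saved optimizer state rather than to a different backbone.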

Other combinations may work as well, e.g. base MAE for pre-training and large MAE for fine-tuning, but I haven't tried them yet.

Here are the parameters I used, in case I missed something,

for pre-training:

{
    "lr": {
        "desc": null,
        "value": 0.000005
    },
    "blr": {
        "desc": null,
        "value": 0.001
    },
    "seed": {
        "desc": null,
        "value": 0
    },
    "team": {
        "desc": null,
        "value": "wsense"
    },
    "model": {
        "desc": null,
        "value": "mae_vit_base_patch16"
    },
    "title": {
        "desc": null,
        "value": "CounTR_pretraining_paper"
    },
    "wandb": {
        "desc": null,
        "value": "counting"
    },
    "_wandb": {
        "desc": null,
        "value": {
            "t": {
                "1": [
                    1,
                    41,
                    55,
                    63
                ],
                "2": [
                    1,
                    41,
                    55,
                    63
                ],
                "3": [
                    2,
                    13,
                    15,
                    16,
                    23
                ],
                "4": "3.9.15",
                "5": "0.13.9",
                "8": [
                    5
                ]
            },
            "framework": "torch",
            "start_time": 1674927257.097797,
            "cli_version": "0.13.9",
            "is_jupyter_run": false,
            "python_version": "3.9.15",
            "is_kaggle_kernel": false
        }
    },
    "device": {
        "desc": null,
        "value": "cuda"
    },
    "epochs": {
        "desc": null,
        "value": 300
    },
    "gt_dir": {
        "desc": null,
        "value": "gt_density_map_adaptive_384_VarV2"
    },
    "im_dir": {
        "desc": null,
        "value": "images_384_VarV2"
    },
    "min_lr": {
        "desc": null,
        "value": 0
    },
    "resume": {
        "desc": null,
        "value": "./weights/mae_pretrain_vit_base_full.pth"
    },
    "log_dir": {
        "desc": null,
        "value": "None"
    },
    "pin_mem": {
        "desc": null,
        "value": true
    },
    "dist_url": {
        "desc": null,
        "value": "env://"
    },
    "wandb_id": {
        "desc": null,
        "value": null
    },
    "anno_file": {
        "desc": null,
        "value": "annotation_FSC147_384.json"
    },
    "data_path": {
        "desc": null,
        "value": "./data/FSC147/"
    },
    "accum_iter": {
        "desc": null,
        "value": 1
    },
    "batch_size": {
        "desc": null,
        "value": 16
    },
    "local_rank": {
        "desc": null,
        "value": -1
    },
    "mask_ratio": {
        "desc": null,
        "value": 0.5
    },
    "output_dir": {
        "desc": null,
        "value": "./data/out/pretrain"
    },
    "world_size": {
        "desc": null,
        "value": 1
    },
    "dist_on_itp": {
        "desc": null,
        "value": false
    },
    "distributed": {
        "desc": null,
        "value": false
    },
    "num_workers": {
        "desc": null,
        "value": 10
    },
    "start_epoch": {
        "desc": null,
        "value": 0
    },
    "weight_decay": {
        "desc": null,
        "value": 0.05
    },
    "norm_pix_loss": {
        "desc": null,
        "value": false
    },
    "warmup_epochs": {
        "desc": null,
        "value": 10
    },
    "data_split_file": {
        "desc": null,
        "value": "Train_Test_Val_FSC_147.json"
    }
}

and fine-tuning:

{
    "lr": {
        "desc": null,
        "value": 0.00001
    },
    "blr": {
        "desc": null,
        "value": 0.001
    },
    "seed": {
        "desc": null,
        "value": 0
    },
    "team": {
        "desc": null,
        "value": "wsense"
    },
    "model": {
        "desc": null,
        "value": "mae_vit_base_patch16"
    },
    "title": {
        "desc": null,
        "value": "CounTR_finetuning_paper"
    },
    "wandb": {
        "desc": null,
        "value": "counting"
    },
    "_wandb": {
        "desc": null,
        "value": {
            "t": {
                "1": [
                    1,
                    41,
                    55,
                    63
                ],
                "2": [
                    1,
                    41,
                    55,
                    63
                ],
                "3": [
                    2,
                    13,
                    15,
                    16,
                    23
                ],
                "4": "3.9.15",
                "5": "0.13.9",
                "8": [
                    5
                ]
            },
            "framework": "torch",
            "start_time": 1674944766.966494,
            "cli_version": "0.13.9",
            "is_jupyter_run": false,
            "python_version": "3.9.15",
            "is_kaggle_kernel": false
        }
    },
    "device": {
        "desc": null,
        "value": "cuda"
    },
    "epochs": {
        "desc": null,
        "value": 1000
    },
    "gt_dir": {
        "desc": null,
        "value": "gt_density_map_adaptive_384_VarV2"
    },
    "im_dir": {
        "desc": null,
        "value": "images_384_VarV2"
    },
    "min_lr": {
        "desc": null,
        "value": 0
    },
    "resume": {
        "desc": null,
        "value": "./data/out/pretrain/checkpoint__pretraining_299.pth"
    },
    "log_dir": {
        "desc": null,
        "value": "None"
    },
    "pin_mem": {
        "desc": null,
        "value": true
    },
    "dist_url": {
        "desc": null,
        "value": "env://"
    },
    "wandb_id": {
        "desc": null,
        "value": null
    },
    "anno_file": {
        "desc": null,
        "value": "annotation_FSC147_384.json"
    },
    "data_path": {
        "desc": null,
        "value": "./data/FSC147/"
    },
    "accum_iter": {
        "desc": null,
        "value": 1
    },
    "batch_size": {
        "desc": null,
        "value": 8
    },
    "class_file": {
        "desc": null,
        "value": "ImageClasses_FSC147.txt"
    },
    "local_rank": {
        "desc": null,
        "value": -1
    },
    "mask_ratio": {
        "desc": null,
        "value": 0.5
    },
    "output_dir": {
        "desc": null,
        "value": "./data/out/finetune"
    },
    "world_size": {
        "desc": null,
        "value": 1
    },
    "dist_on_itp": {
        "desc": null,
        "value": false
    },
    "distributed": {
        "desc": null,
        "value": false
    },
    "num_workers": {
        "desc": null,
        "value": 10
    },
    "start_epoch": {
        "desc": null,
        "value": 0
    },
    "weight_decay": {
        "desc": null,
        "value": 0.05
    },
    "norm_pix_loss": {
        "desc": null,
        "value": false
    },
    "warmup_epochs": {
        "desc": null,
        "value": 10
    },
    "data_split_file": {
        "desc": null,
        "value": "Train_Test_Val_FSC_147.json"
    }
}
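As a sanity check on the two configs, the logged `lr` can be compared with what `blr` would imply. The sketch assumes the training scripts follow the MAE reference convention, where `lr = blr * effective_batch_size / 256` is applied only when `lr` is not given explicitly; whether this repository does exactly that is an assumption, and the function names are hypothetical.

```python
def effective_batch_size(batch_size, accum_iter=1, world_size=1):
    """Number of samples contributing to one optimizer step."""
    return batch_size * accum_iter * world_size


def scaled_lr(blr, batch_size, accum_iter=1, world_size=1):
    """Learning rate implied by the base learning rate (MAE-style linear scaling)."""
    return blr * effective_batch_size(batch_size, accum_iter, world_size) / 256


def resolve_lr(lr, blr, batch_size, accum_iter=1, world_size=1):
    """Return the explicit lr if one was given, otherwise the blr-derived value."""
    if lr is not None:
        return lr
    return scaled_lr(blr, batch_size, accum_iter, world_size)


# Values copied from the two configs above.
configs = {
    "pretrain": dict(lr=5e-6, blr=1e-3, batch_size=16, accum_iter=1, world_size=1),
    "finetune": dict(lr=1e-5, blr=1e-3, batch_size=8, accum_iter=1, world_size=1),
}

for name, cfg in configs.items():
    implied = scaled_lr(cfg["blr"], cfg["batch_size"], cfg["accum_iter"], cfg["world_size"])
    # pretrain: 0.001 * 16 / 256 = 6.25e-05; finetune: 0.001 * 8 / 256 = 3.125e-05
    print(f"{name}: logged lr={resolve_lr(**cfg):g}, blr-derived lr={implied:g}")
```

Under that assumption, the logged `lr` values differ from the blr-derived ones, so `lr` would have been passed explicitly and `blr` would have gone unused in both runs.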

Does this sound reasonable? Maybe you ran the fine-tuning with the large MAE?
Thanks in advance.

Verg-Avesta · 2023-02-03T03:07:14Z

Hello, I am also confused about why the checkpoints everyone gets are smaller than the ones I provided. I didn't use mae_vit_large_patch16 in fine-tuning (at least not knowingly). According to issue #7, the author printed value.size() and value.dtype for the model part of his checkpoint and of mine, and found them to be the same. Therefore, I guess that, because of different versions of some libraries, some parameters such as the optimizer state are not being saved. You can try printing and comparing the parameters in the two checkpoints to check whether they are consistent.
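The comparison suggested above could look roughly like the following. This is a minimal sketch, assuming both checkpoints are dicts whose `model` entry is a state dict of tensors; the function names and the `model` key are assumptions, not taken from this repository.

```python
def describe(state_dict):
    """Map each parameter name to (shape, dtype) as plain Python values."""
    return {name: (tuple(t.shape), str(t.dtype)) for name, t in state_dict.items()}


def compare_state_dicts(a, b):
    """Report parameters missing on either side or differing in shape/dtype."""
    da, db = describe(a), describe(b)
    return {
        "only_in_a": sorted(da.keys() - db.keys()),
        "only_in_b": sorted(db.keys() - da.keys()),
        "mismatched": {
            name: (da[name], db[name])
            for name in sorted(da.keys() & db.keys())
            if da[name] != db[name]
        },
    }


def compare_checkpoints(ckpt_a, ckpt_b, key="model"):
    """Compare the top-level entries, then the state dicts stored under `key`."""
    report = {
        "top_level_only_in_a": sorted(set(ckpt_a) - set(ckpt_b)),
        "top_level_only_in_b": sorted(set(ckpt_b) - set(ckpt_a)),
    }
    if key in ckpt_a and key in ckpt_b:
        report.update(compare_state_dicts(ckpt_a[key], ckpt_b[key]))
    return report


# Usage (requires PyTorch; paths are placeholders):
#   import torch
#   a = torch.load("FSC147.pth", map_location="cpu")
#   b = torch.load("my_finetuned_checkpoint.pth", map_location="cpu")
#   print(compare_checkpoints(a, b))
```

An empty `mismatched` with non-empty `top_level_only_in_a` would point to extra saved state (for example an optimizer entry) rather than to a different model.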

As for the results, those in the paper are the best I got, and I think an MAE between 12 and 13 is OK for mae_vit_base_patch16. The reproduction in issue #7 got MAE: 13.89, RMSE: 82.74, and when he ran my checkpoint he got MAE: 12.44, RMSE: 89.86. So the results are not very different.
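For reference, the numbers above show that a lower MAE can come with a higher RMSE, since RMSE is dominated by a few large errors. A toy illustration with made-up counts (not FSC147 data), using the usual definitions of the two metrics:

```python
import math


def mae(pred, gt):
    """Mean absolute error between predicted and ground-truth counts."""
    return sum(abs(p - g) for p, g in zip(pred, gt)) / len(gt)


def rmse(pred, gt):
    """Root mean squared error between predicted and ground-truth counts."""
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(pred, gt)) / len(gt))


# Made-up counts, purely illustrative.
gt = [10, 20, 30, 40]
run_a = [12, 22, 32, 42]  # every image off by 2
run_b = [10, 20, 30, 46]  # three exact, one off by 6

print(mae(run_a, gt), rmse(run_a, gt))  # 2.0 2.0
print(mae(run_b, gt), rmse(run_b, gt))  # 1.5 3.0
```

Here run_b has the better MAE but the worse RMSE, the same pattern as 12.44 / 89.86 versus 13.89 / 82.74.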

Hope my description can help you find the problem.

GioFic95 mentioned this issue Feb 15, 2023

About hyper parameters #27

Closed

GioFic95 closed this as completed Feb 15, 2023
