This repository has been archived by the owner on Oct 31, 2023. It is now read-only.

How to finetune from pretrained detectron models with different number of classes? #15

Closed
wangg12 opened this issue Oct 25, 2018 · 43 comments
Labels
enhancement New feature or request

Comments

@wangg12
Contributor

wangg12 commented Oct 25, 2018

❓ Questions and Help

Is there a config option to load pretrained COCO models for finetuning? The number of classes in the last layers may be different, so those weights should not be loaded.

@fmassa
Contributor

fmassa commented Oct 25, 2018

Hi,

There currently isn't an off-the-shelf option in the config for that.
I see two easy options:
1 - from a python interpreter, load the pre-trained files that you want to use, and delete from the state_dict the keys corresponding to the last layer. The exact naming depends on the model architecture, but for boxes the names will end with cls_score and bbox_pred, and for masks they will end with mask_fcn_logits.
2 - Clone the code-base and rename the two variables that I pointed out to something else, like cls_score_mine etc. This will work out of the box, and you can modify NUM_CLASSES in the config without clashes.
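
For instance, option 1 could look roughly like this (a minimal sketch; "pretrained.pth" is a placeholder path, and the checkpoint is assumed to store its weights under a 'model' key):

import torch

weights = torch.load("pretrained.pth")["model"]
# Drop the class-dependent last layers: box classification/regression and mask logits.
last_layer_keys = [k for k in weights
                   if "cls_score" in k or "bbox_pred" in k or "mask_fcn_logits" in k]
for k in last_layer_keys:
    del weights[k]
torch.save(dict(model=weights), "trimmed.pth")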

I think we could provide functionality to perform 1 for the users, given a cfg file and a path to the model weights. That could be a possible improvement on top of what we currently have.

What do you think?

@fmassa fmassa added the enhancement New feature or request label Oct 25, 2018
@wangg12
Contributor Author

wangg12 commented Oct 25, 2018

@fmassa I think option 1 is more user-friendly. We could add a config option like PRETRAINED_DETECTRON_WEIGHTS, and if it is given, all the weights except those of the last layer would be loaded to initialize the model.

@fmassa
Contributor

fmassa commented Oct 25, 2018

Yeah, option 1 is definitely simpler for the user (even if there are only a few lines to change here and there ;-) )

I'll prepare a PR adding support for this functionality, but I'm not 100% sure of what the API should look like, nor the best fix for it.

API

Should we have a function that acts on the weights and creates a new weights file? Or should we add an extra config argument to make it a single-step process? If we add an argument (which seems simpler for the user), would it be ambiguous?

Implementation

For the possible fixes, we could hard-code the possible names for the layers that shouldn't be loaded (as I mentioned before). But this is not super robust if the user changes their module names (which they can, if they want).

Another possible implementation is to not load the weights for the entire predictor. This is effectively the most robust way, as the predictor was designed to be only the "last layer".
This works nicely for boxes, but for masks we would also lose the initialization of one ConvTranspose2d layer, which might not be that bad in the end.
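
A rough sketch of that second approach, assuming the predictor parameters all live under a ".predictor." prefix (the exact prefix depends on the module layout):

import torch

# model: a freshly built detector (e.g. from build_detection_model(cfg))
# with the new NUM_CLASSES, so its predictor keeps its random initialization.
state_dict = torch.load("pretrained.pth")["model"]
# Keep everything except parameters belonging to a predictor module.
filtered = {k: v for k, v in state_dict.items() if ".predictor." not in k}
# strict=False tolerates the missing predictor weights.
model.load_state_dict(filtered, strict=False)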

Thoughts?

@wangg12
Contributor Author

wangg12 commented Oct 25, 2018

I would prefer the former way. As for possible module name changes by users, I think they should also be careful about weight loading, handling it either by name remapping or by random initialization.

@fmassa
Contributor

fmassa commented Oct 25, 2018

@wangg12 could you expand on why you'd prefer the first approach? I was actually leaning more towards the second one, as it is more robust, and we have a clear contract with the user when we add an option to the config: "load every weight possible, except those in the predictor".

@wangg12
Contributor Author

wangg12 commented Oct 25, 2018

@fmassa There are two situations where the first one may be more suitable.

  1. I just want to finetune the trained COCO model on COCO datasets.
  2. I want to use as many of the pretrained weights as I can, so losing the ConvTranspose2d weights may be unexpected.

For other situations, I think the second way is also OK.

@fmassa
Contributor

fmassa commented Oct 25, 2018

So, I've discussed with a few people here and it seems that the best way of handling this would be to actually perform model surgery on the model files.

For example, the best results on CityScapes come from taking a COCO-trained detector and removing most of the classification and mask weights, but retaining those that correspond to the categories common to both COCO and CityScapes.
Detectron does something like the following: https://github.com/facebookresearch/Detectron/blob/master/tools/convert_coco_model_to_cityscapes.py , so maybe the most generic thing to do is to provide a few helper functions for users to decide which layers to trim.
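
As an illustration of that kind of surgery, here is a hedged sketch (the index mapping, key names and paths below are made up for the example; the real correspondence lives in the Detectron conversion script):

import torch

# Hypothetical mapping from target (CityScapes-style) class index to the
# matching source (COCO) class index for the categories the datasets share.
SOURCE_INDEX = {0: 0, 1: 1, 2: 3, 3: 6}

weights = torch.load("coco_model.pth")["model"]
old_w = weights["cls_score.weight"]  # [num_source_classes, feature_dim]
old_b = weights["cls_score.bias"]
new_w = old_w.new_zeros((len(SOURCE_INDEX), old_w.shape[1]))
new_b = old_b.new_zeros(len(SOURCE_INDEX))
for tgt, src in SOURCE_INDEX.items():
    new_w[tgt] = old_w[src]  # copy the row of the shared category
    new_b[tgt] = old_b[src]
weights["cls_score.weight"] = new_w
weights["cls_score.bias"] = new_b
torch.save(dict(model=weights), "cityscapes_init.pth")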

@wangg12
Contributor Author

wangg12 commented Oct 25, 2018

Yes, this way is more general.

@xuanyuzhou98

"load the pre-trained files that you want to use, and delete from the state_dict"

Hi,

There currently isn't an off-the-shelf option in the config for that.
I see two easy options:
1 - from a python interpreter, load the pre-trained files that you want to use, and delete from the state_dict the keys corresponding to the last layer. The exact naming depends on the model architecture, but for boxes the name will end with a cls_score and bbox_pred, and for masks it will end with mask_fcn_logits.
2 - Clone the code-base and modify the names of the two variables that I pointed out to be something else, like cls_score_mine etc. This will work out of the box, and you can modify the NUM_CLASSES in the config without clashes.

I think we could provide a functionality to perform 1 for the users, given a cfg file and a path to a model weight. That could be a possible improvement on top of what we currently have.

What do you think?

Where are the pretrained files located? For example, if I want to use a net pretrained on ImageNet, where can we find those files and load them?

@fmassa
Contributor

fmassa commented Oct 31, 2018

By default, they are stored in ~/.torch/models. The exact name of the file is printed during training, just before the loaded weights are printed.

@steve-goley

steve-goley commented Oct 31, 2018

I added this function to train_net.py with an additional input arg. Note that the loaded models had an additional "module." prefix that had to be removed. After I removed it, this worked great.

import torch

def _transfer_pretrained_weights(model, pretrained_model_pth):
    # Load the checkpoint and keep everything except the class-dependent layers.
    pretrained_weights = torch.load(pretrained_model_pth)['model']
    # Strip the 'module.' prefix left by DataParallel while filtering.
    new_dict = {k.replace('module.', ''): v for k, v in pretrained_weights.items()
                if 'cls_score' not in k and 'bbox_pred' not in k}
    this_state = model.state_dict()
    this_state.update(new_dict)
    model.load_state_dict(this_state)
    return model

I don't think this is the solution that @fmassa wants to implement but it'll work in a pinch for now.

@cppntn

cppntn commented Nov 5, 2018

Hello @steve-goley @fmassa , I've tried to load the pretrained model in this way:
w = torch.load("X-101-32x8d.pkl")

however, an error occurred: UnicodeDecodeError: 'ascii' codec can't decode byte 0xad in position 2: ordinal not in range(128)
I am able to get past this error by using pickle directly:
with open("X-101-32x8d.pkl", "rb") as f: w = pickle.load(f, encoding='latin1')

But there seems to be no "model" key in the dict, just a "blobs" dict, and I can't find 'cls_score' and 'bbox_pred'.

Could you tell me how to overcome this issue?

Thanks

@fmassa
Contributor

fmassa commented Nov 5, 2018

@antocapp the .pkl files are generally from the Detectron codebase, which is written in Caffe2.

What I'd recommend doing is the following:
1 - create a cfg object similar to what is present in the demo, for that particular model
2 - use the load_c2_format function, which will give you a dict containing the model field. There you can perform the model surgery that you want, by removing fields etc.
3 - save the object using pytorch's torch.save, keeping the dict(model=state_dict) structure.
4 - change MODEL.WEIGHT to point to this saved file.
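
Put together, a minimal sketch of those steps (the config path, model path and key names are placeholders):

import torch
from maskrcnn_benchmark.config import cfg
from maskrcnn_benchmark.utils.c2_model_loading import load_c2_format

# Step 1: config for the particular model.
cfg.merge_from_file("configs/caffe2/e2e_mask_rcnn_X_101_32x8d_FPN_1x_caffe2.yaml")
# Step 2: load the Caffe2 .pkl and perform the surgery.
loaded = load_c2_format(cfg, "model_final.pkl")  # a dict with a 'model' field
for key in list(loaded["model"].keys()):
    if "cls_score" in key or "bbox_pred" in key or "mask_fcn_logits" in key:
        del loaded["model"][key]
# Step 3: save, keeping the dict(model=state_dict) structure.
torch.save(loaded, "trimmed.pth")
# Step 4: point MODEL.WEIGHT in the config at "trimmed.pth".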

Let me know if it doesn't work, I might have missed a step here.

@cppntn

cppntn commented Nov 5, 2018

Hi @fmassa, thanks for your support.
I wrote this:

from maskrcnn_benchmark.config import cfg
from maskrcnn_benchmark.utils.c2_model_loading import load_c2_format

cfg.merge_from_file("configs/caffe2/e2e_mask_rcnn_X_101_32x8d_FPN_1x_caffe2.yaml")
path = '/home/antonio/.torch/models/X-101-32x8d.pkl'
_d = load_c2_format(cfg, path)

keys = [k for k in _d['model'].keys()]
print(sorted(keys))

But I can't find 'cls_score' and 'bbox_pred' in the keys.

@fmassa
Contributor

fmassa commented Nov 5, 2018

@antocapp you are loading the ImageNet-trained model (X-101-32x8d.pkl), not the detection model that has already been trained on COCO (which is probably what you want). The model file that you are looking for has a long name, starts with _, and parts of it are here.

@cppntn

cppntn commented Nov 6, 2018

Thanks @fmassa, so where can I find that model? When I performed inference with that model it worked very well (I just want to fine-tune it on one class on a specific dataset), but in .torch/models/ I see that only "X-101-32x8d.pkl" has been downloaded. Where can I find the detection model?

Thanks for your help, I really appreciate it.

EDIT: I launched inference again and it started downloading the file 36761843/12_2017_baselines/e2e_mask_rcnn_X-101-32x8d-FPN_1x.yaml.06_35_59.RZotkLKI/output/train/coco_2014_train%3Acoco_2014_valminusminival/generalized_rcnn/model_final.pkl once more; maybe I accidentally deleted the previous model from the models/ folder. Thanks again!
I was able to prune the 'cls_score' and 'bbox_pred' layers in the model, then saved it keeping the key 'model' in a .pth with torch.save. Then I changed MODEL.WEIGHT to point to this file and ROI_BOX_HEAD.NUM_CLASSES to 2 (background and the single class that I want to fine-tune the model for). Is this correct?

One last question: how should I organize my dataset in order to fine-tune the model?

@BelhalK

BelhalK commented Nov 23, 2018

Hi @antocapp,
Could you please share the chunk of code that takes the pre-trained Mask R-CNN model (beginning with _) and returns the modified one (pruning the relevant fields)?
I am running into the same issues you mentioned in:

> Hello @steve-goley @fmassa, I've tried to load the pretrained model with torch.load... But there seems to be no "model" key in the dict, just a "blobs" dict, and I can't find 'cls_score' and 'bbox_pred'. [...]

Thank you very much

@fmassa
Contributor

fmassa commented Nov 23, 2018

@BelhalK the weights are inside blobs, but they have some pretty different names.
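
For reference, the blobs are flat Caffe2 arrays whose names typically end in _w / _b, e.g. cls_score_w, cls_score_b, bbox_pred_w, bbox_pred_b (this naming is an assumption based on common Detectron/Caffe2 conventions), so a direct filter on the blobs dict might look like:

import pickle

with open("model_final.pkl", "rb") as f:
    data = pickle.load(f, encoding="latin1")

# Drop the class-dependent blobs by their Caffe2 names.
data["blobs"] = {k: v for k, v in data["blobs"].items()
                 if not k.startswith(("cls_score", "bbox_pred", "mask_fcn_logits"))}

with open("model_trimmed.pkl", "wb") as f:
    pickle.dump(data, f)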

@BelhalK

BelhalK commented Nov 23, 2018

Got it. So the working function should be

def _transfer_pretrained_weights(model, pretrained_model_pth):
    pretrained_weights = torch.load(pretrained_model_pth)['blobs']
    new_dict = {k.replace('module.', ''): v for k, v in pretrained_weights.items()
                if 'somethingelse' not in k and 'somethingelse' not in k}
    this_state = model.state_dict()
    this_state.update(new_dict)
    model.load_state_dict(this_state)
    return model

Where somethingelse should be different from cls_score and bbox_pred, right?

@fmassa
Contributor

fmassa commented Nov 23, 2018

Almost, you'll probably need to plug it somewhere in utils/c2_loading

@BelhalK

BelhalK commented Nov 23, 2018

You may be right.
I initially wanted to insert it in tools/train_net.py, like this:

def _transfer_pretrained_weights(model, pretrained_model_pth):
    pretrained_weights = torch.load(pretrained_model_pth)['model']
    new_dict = {k.replace('module.',''):v for k, v in pretrained_weights.items()
                if 'cls_score' not in k and 'bbox_pred' not in k}
    this_state = model.state_dict()
    this_state.update(new_dict)
    model.load_state_dict(this_state)
    return model


def train(cfg, local_rank, distributed):
    old_model = build_detection_model(cfg)
    pretrained_model_pth = "/home/belhal/.torch/models/_detectron_35858933_12_2017_baselines_e2e_mask_rcnn_R-50-FPN_1x.yaml.01_48_14.DzEQe4wC_output_train_coco_2014_train%3Acoco_2014_valminusminival_generalized_rcnn_model_final.pkl"
    model = _transfer_pretrained_weights(old_model, pretrained_model_pth)
    device = torch.device(cfg.MODEL.DEVICE)
    model.to(device)
    ...

But it may need to go in some other scripts instead.

@BelhalK

BelhalK commented Nov 24, 2018

I have been using the various tips and tricks from this thread to modify a pre-trained model.
I am having an issue saving the modified dict into a new model.
I am using the following code:

from maskrcnn_benchmark.config import cfg
from maskrcnn_benchmark.utils.c2_model_loading import load_c2_format

path = '/Users/belhal/.torch/models/_detectron_35858933_12_2017_baselines_e2e_mask_rcnn_R-50-FPN_1x.yaml.01_48_14.DzEQe4wC_output_train_coco_2014_train%3Acoco_2014_valminusminival_generalized_rcnn_model_final.pkl'

cfg.merge_from_file("../configs/e2e_mask_rcnn_X_101_32x8d_FPN_1x.yaml")
_d = load_c2_format(cfg, path)
newdict = _d

def removekey(d, listofkeys):
    r = dict(d)
    for key in listofkeys:
        del r[key]
    return r

newdict['model'] = removekey(_d['model'], ['cls_score.bias', 'cls_score.weight', 'bbox_pred.bias', 'bbox_pred.weight'])

How should I use torch.save(??, 'mymodel.pkl') to save a new model named mymodel.pkl with the resulting dict newdict?

Thanks a lot for your help!

@fmassa
Contributor

fmassa commented Nov 27, 2018

You can just save it using torch.save(newdict, 'mymodel.pth'). Note the .pth extension, not .pkl.

@fmassa
Contributor

fmassa commented Dec 14, 2018

@jbitton I addressed your question in #273

Also, given that the current issues were not enough to give you full context on how to add new datasets, could you perhaps improve the documentation a bit in https://github.com/facebookresearch/maskrcnn-benchmark/blob/master/maskrcnn_benchmark/data/README.md (maybe adding a link from the main README as well) with the points that were missing, and send a PR?

It would be a very welcome contribution!

@jbitton

jbitton commented Dec 14, 2018

@fmassa For sure! Do you mind if I get the PR out mid-next week? I'd like to first verify that I was able to go through the training/eval scripts successfully.

@fmassa
Contributor

fmassa commented Dec 14, 2018

@jbitton sure, no worries! thanks a lot!

@mattans

mattans commented Dec 14, 2018

What's the meaning of %3A in the saved path? It's the URL percent-encoding for a colon, but why do we want it in a path?

@fmassa
Contributor

fmassa commented Dec 14, 2018

@mattans we don't necessarily want it in the path, but this might be specific to which characters Windows allows in a path.

@wangg12
Contributor Author

wangg12 commented Dec 18, 2018

To summarize, I've created a script tools/trim_detectron_model.py here.
You can decide which keys are removed and which are kept by modifying the script.

Then you can simply point MODEL.WEIGHT in the config file at the converted model.
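
After trimming, the config change can also be done from Python; a small sketch (the config path, weight path and class count are illustrative):

from maskrcnn_benchmark.config import cfg

cfg.merge_from_file("configs/e2e_mask_rcnn_R_50_FPN_1x.yaml")
# Point MODEL.WEIGHT at the trimmed file and set the new number of classes
# (e.g. 2 = background + 1 class).
cfg.merge_from_list(["MODEL.WEIGHT", "trimmed_model.pth",
                     "MODEL.ROI_BOX_HEAD.NUM_CLASSES", 2])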

@wangg12 wangg12 closed this as completed Dec 18, 2018
@fmassa
Contributor

fmassa commented Dec 18, 2018

@wangg12 could you maybe add a section in the TROUBLESHOOTING or in the README pointing to your snippet and send a PR?

Thanks!

@wangg12
Contributor Author

wangg12 commented Dec 18, 2018

@fmassa I've created a PR #286

@xiaohai12

xiaohai12 commented May 29, 2019

I had a question about using trim_detectron_model.py.
If I understand correctly, when we load a model using load_c2_format(cfg, path), the function only works with .pkl files. However, what we save from training is a .pth file, so I get an error when I try to use trim_detectron_model.py on a .pth file.

Is there any solution for this?
Thanks.

@christopherbate
Copy link

@xiaohai12 I believe you can just replace the call to load_c2_format with a simple torch.load, but I have not tested.

@xiaohai12

xiaohai12 commented May 30, 2019

> @xiaohai12 I believe you can just replace the call to load_c2_format with a simple torch.load, but I have not tested.

Thanks. I will try it.

@xiaohai12

> @xiaohai12 I believe you can just replace the call to load_c2_format with a simple torch.load, but I have not tested.

It worked in my case once I replaced load_c2_format with torch.load and changed the parameters in removekey from cls_score to roi_heads.box.predictor.cls_score (and similarly for the other parameters).
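
Putting the thread together, a sketch for trimming a .pth checkpoint saved by this codebase (the full key names below assume the default module layout and no DataParallel "module." prefix):

import torch

checkpoint = torch.load("last_checkpoint.pth", map_location="cpu")
weights = checkpoint["model"]
for key in ["roi_heads.box.predictor.cls_score.weight",
            "roi_heads.box.predictor.cls_score.bias",
            "roi_heads.box.predictor.bbox_pred.weight",
            "roi_heads.box.predictor.bbox_pred.bias"]:
    weights.pop(key, None)  # ignore keys that are not present
torch.save(dict(model=weights), "trimmed_checkpoint.pth")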
