add resnet50 example #3266

Merged: 16 commits merged into keras-team:master from MoyanZitto:add_example on Aug 3, 2016
Conversation

MoyanZitto
Contributor

This is a Keras implementation of Kaiming He's residual network (50 layers).
The layers have been named so that anyone who wants to load the pretrained weights converted from Kaiming He's caffemodel file can do so easily.
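
For illustration, matching a layer name to the Caffe model in the Keras 1.x API looks roughly like this (a minimal sketch; the input shape assumes 'th' dim ordering, and 'conv1' is the first convolution's name in the Caffe model):

from keras.layers import Input, Convolution2D

# Keras 1.x functional API: name each layer after its Caffe counterpart so
# the converted weights can be assigned to the matching layer.
inp = Input(shape=(3, 224, 224))  # 'th' dim ordering: (channels, rows, cols)
x = Convolution2D(64, 7, 7, subsample=(2, 2), name='conv1')(inp)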

@giorgiop
Contributor

Not sure if that's still active, but #2793 is about ResNet as well.

@MoyanZitto
Contributor Author

@giorgiop Yes, I've checked the issue you mentioned, but I think these two scripts are not the same. My commit is exactly what Kaiming He published on his GitHub; actually, I have converted the pretrained caffemodel provided by Kaiming He to a Keras h5 file. Once I finish my tests, I'll update this script so that people can get the pre-trained ResNet-50 model directly from the Keras examples.
I think this would be helpful to many researchers.
Thank you!

This script gets 10 points in the PEP8 tests on my computer, but.....
@fchollet
Collaborator

actually, I have converted the pretrained caffemodel provided by Kaiming He to a Keras h5 file

In that case I think you should provide the Keras weights, and in your script demonstrate how to load the pre-trained weights and run inference on some images. It would be a great addition.

@fchollet
Collaborator

One thing to consider would be to provide two versions of the weights file: one for Theano and one for TensorFlow, since they differ: https://github.com/fchollet/keras/wiki/Converting-convolution-kernels-from-Theano-to-TensorFlow-and-vice-versa

@MoyanZitto
Contributor Author

@fchollet Ok, I'll do it as soon as possible~

@MoyanZitto
Contributor Author

@fchollet Hey, I have finished my tests on this script and it works well. Here is a screenshot from my IPython session:
[screenshot: res50 prediction results]

In the comments at the top of this script, I've included the address where people can download the pretrained h5 file. This weights file is only for the TensorFlow backend for now. I tried to convert it to a Theano-backend version but failed (the weights could be loaded, but the test results were incorrect). Maybe you can just merge this PR and I'll keep trying.

Also, since Gist is blocked by the Great Firewall in China, the converted weights were uploaded to the Baidu cloud drive. Maybe someone could download them from the Baidu cloud drive and then upload them to Gist; I imagine the Chinese text on the Baidu cloud drive is annoying for those who speak English.

Thank you! I'm going to fix the endless PEP8 problems...

@fchollet
Collaborator

This weights file is only for the TensorFlow backend for now. I tried to convert it to a Theano-backend version but failed (the weights could be loaded, but the test results were incorrect). Maybe you can just merge this PR and I'll keep trying.

Can you clarify what you did and what went wrong?

The only difference between Theano and TensorFlow is the fact that TensorFlow uses flipped kernels (because it does correlation, not convolution) in Convolution2D kernels. Simply iterating over all convolution layers and flipping the kernels is enough to convert the weights file. There are no other differences.
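
As a minimal sketch of what "flipping the kernels" means (assuming the 'th' kernel layout (nb_filter, stack_size, rows, cols); this is essentially what convert_kernel in keras.utils.np_utils does):

import numpy as np

def flip_conv_kernel(w):
    # w: (nb_filter, stack_size, rows, cols); reversing the two spatial axes
    # turns a correlation kernel into a convolution kernel and vice versa.
    return np.copy(w[:, :, ::-1, ::-1])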

@fchollet
Collaborator

Btw the link you provided does not work; the page just says "Uh-oh, the page you are visiting no longer exists."

@MoyanZitto
Contributor Author

@fchollet Good afternoon, here's how I converted the TF weights to TH weights:

from keras.utils.np_utils import convert_kernel
import h5py


f_th = h5py.File('thresnet50.h5', 'w')
f_tf = h5py.File('resnet50.h5', 'r')

for k in f_tf.keys():
    grp = f_th.create_group(k)  # create a group for each layer
    if k[:3] == 'res' or k[:4] == 'conv':  # it is a conv layer
        # for conv layers, call convert_kernel to flip the kernels for Theano
        grp.create_dataset('weights', data=convert_kernel(f_tf[k]['weights'][:]))
    else:
        # otherwise just copy the weights unchanged
        grp.create_dataset('weights', data=f_tf[k]['weights'][:])
    grp.create_dataset('bias', data=f_tf[k]['bias'][:])  # store the bias term
f_th.close()
f_tf.close()

Basically the TH weights are just a copy of the TF weights, with the only exception that for conv layers the kernels are converted by convert_kernel.

After this transformation, I switched the backend to Theano and loaded the converted weights, but the prediction results were incorrect: both test images were predicted as "n02443485 black-footed ferret, ferret, Mustela nigripes".

I have to admit that I didn't spend much time on converting the weights; maybe I should be more careful. Perhaps there will be some good news when you wake up tomorrow.

BTW, I can visit the link and download the weights normally. If it doesn't work for you, I'll try uploading it elsewhere; Microsoft OneDrive could be a good choice.

Thank you~

@MoyanZitto
Contributor Author

Hi @fchollet, some good news and some bad news.
The good news is that I used model.save_weights() to re-save the pretrained weights, so now we can just use model.load_weights() to load them.

The bad news is that I used the code from https://github.com/fchollet/keras/wiki/Converting-convolution-kernels-from-Theano-to-TensorFlow-and-vice-versa and then called save_weights like this:

from keras import backend as K
from keras.utils.np_utils import convert_kernel
import res_net50

model = res_net50.get_resnet50()
model.load_weights('tf_resnet50.h5')
for layer in model.layers:
    if layer.__class__.__name__ in ['Convolution1D', 'Convolution2D']:
        # flip the kernels of every convolution layer in place
        original_w = K.get_value(layer.W)
        converted_w = convert_kernel(original_w)
        K.set_value(layer.W, converted_w)
model.save_weights('th_resnet50.h5')

This is the easiest solution I could come up with, but after running this script, switching the backend to Theano, and loading the 'th_resnet50.h5' we just generated, the test results are still incorrect ("n02443485 black-footed ferret, ferret, Mustela nigripes" for both test images).

Perhaps the difference between TH and TF is bigger than we expected, and I think this difference could be the source of many unexpected bugs.
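
One way to localize such a bug would be to compare intermediate activations between the two backends (a hypothetical debugging sketch; the probed layer index and the 'test_batch.npy' input file are assumptions, while the module and weight-file names come from this thread):

import numpy as np
from keras import backend as K
import res_net50  # the module from this PR

BACKEND = 'tf'  # set to 'th' and rerun under Theano for the second dump
model = res_net50.get_resnet50()
model.load_weights('%s_resnet50.h5' % BACKEND)

# Dump one layer's activations for a fixed, preprocessed input batch.
probe = K.function([model.input, K.learning_phase()],
                   [model.layers[5].output])
x = np.load('test_batch.npy')  # same batch for both runs
np.save('act_%s.npy' % BACKEND, probe([x, 0])[0])  # 0 = test mode
# After both runs, comparing the dumps with np.allclose at successive layer
# indices pinpoints where the two backends' outputs first diverge.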

I've updated the links. Now you can download the weights from Google drive.

Review comment on the diff (@@ -0,0 +1,218 @@):

'''This script demonstrates how to build the resnet50 architecture

Collaborator: File should be renamed to resnet_50

@fchollet
Collaborator

General issues with the PR:

  • it's for TensorFlow, yet uses the default dim ordering (dim_ordering='th'). This is inefficient since it results in dimension shuffling back and forth with every layer.
  • the syntax should be made PEP8 compliant.
  • the docstring should be rewritten.

Specifically, for the docstring:

  • fix typos
  • the mention of "go to my Github" should be removed; an author note is fine.
  • if the weights are not convertible to Theano, then the mention of conversion should be removed.
  • not sure why two different download links are necessary. Would Google Drive not be accessible in China?

@fchollet
Collaborator

Also it would be best to understand why the weights are not convertible. Every operation is unit-tested to yield the same result in both Theano and TensorFlow (see the backend tests), modulo the weight conversion operation. It should be impossible for a combination of identical operations to yield different results. Most likely this is an issue with your conversion code.
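
The flip relationship itself is easy to verify in isolation (a standalone sanity check using scipy, not part of the PR):

import numpy as np
from scipy import signal

# Cross-correlation with a kernel (what TensorFlow computes) equals true
# convolution (what Theano computes) with the spatially flipped kernel.
x = np.random.rand(7, 7)
k = np.random.rand(3, 3)
corr = signal.correlate2d(x, k, mode='valid')
conv = signal.convolve2d(x, k[::-1, ::-1], mode='valid')
assert np.allclose(corr, conv)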

@MoyanZitto
Contributor Author

@fchollet Got it, I'll fix the problems you mentioned soon. BTW, Google/Facebook/Twitter and a lot of other websites are not accessible in China because they are blocked by the hateful "Great Firewall". I know it sounds crazy, but it's true.

@fchollet
Collaborator

@MoyanZitto cool, thank you. Ideally we'd have a way to host model files that isn't Google drive or Baidu. Maybe AWS S3.

@MoyanZitto
Contributor Author

@fchollet Fixed some typos, though I'm not sure whether there are still grammar mistakes in the script (really sorry for my limited English). If it's not too much trouble, feel free to modify this script as you like.

I noticed that conv layers get the default dim_ordering from K.image_dim_ordering(), so I simply used K.set_image_dim_ordering('tf') to change the dim ordering. Does that work?

Although we can access AWS S3 in China, the speed is very slow... so it's better to retain the Baidu drive link.

Review comments on the diff:

return out


def conv_block(input_tensor, nb_filter, stage, block, kernel_size=3):

Collaborator: No need to pass "stage" and "block". They are not used.

Collaborator: I'd rather see conv_block(input_tensor, kernel_size, filters, stride=2)

@fchollet
Collaborator

I noticed that conv layers get the default dim_ordering from K.image_dim_ordering(), so I simply used K.set_image_dim_ordering('tf') to change the dim ordering. Does that work?

That's not enough. Image dim ordering is hard-coded in several places of your code, such as when you do merges or when you load input data.
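
For example, any layer that takes an explicit channel axis has to pick it according to the ordering rather than hard-code it (a minimal Keras 1.x sketch):

from keras import backend as K
from keras.layers import BatchNormalization

# The channel axis depends on the image dim ordering.
if K.image_dim_ordering() == 'th':
    bn_axis = 1  # inputs are (channels, rows, cols)
else:
    bn_axis = 3  # inputs are (rows, cols, channels)

bn = BatchNormalization(axis=bn_axis)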

@MoyanZitto
Contributor Author

@fchollet Thank you very much for pointing out these mistakes! It's very kind of you.

These (ugly) names come from Kaiming He's Caffe model; see http://ethereon.github.io/netscope/#/gist/db945b393d40bfa26006
Since our .h5 weights are converted from the .caffemodel file, we have to set the layer names carefully so that the weights are loaded into the corresponding layers. But you are right, it would be better if these names could be removed; I'll have a try tomorrow.

BTW, I don't see anything about dim_ordering in 'merge'. Could you clarify?

@MoyanZitto
Contributor Author

@fchollet
Sorry for not updating this script in the past few days; I've been busy looking for a job.
I'm afraid I have to keep these layer names, because the weights can only be set correctly when the names of the model's layers match the names of the corresponding weights. I've made the code clearer by setting a "basename"; it looks better now.

Also, users can set dim_ordering now. I provide both 'tf' dim_ordering weights for speed and 'th' dim_ordering weights for compatibility (in case people want to use this script together with their own 'th' dim_ordering code). The links are given at the top of the code.

I think "dim_ordering" is just how the input image is organized; it should have nothing to do with the shape of the conv layers' weights. Perhaps we should cut the dependency between the input dim_ordering and the shape of the conv kernels. In that case a single version of the weights could be loaded into the model no matter what the image dim_ordering is.

Hope this script gets merged soon~~~ it feels really good to be a Keras contributor!

@fchollet
Collaborator

fchollet commented Aug 3, 2016

I think "dim_ordering" is just how the input image been organized, it should be nothing to do with the shape of weights of conv layers. Perhaps we should cut off the dependency between input dim_ordering and the shape of conv layers. In this case a single version of weights could be loaded in a model no matter what the image dim_ordering is.

Not quite true. Kernels have to be transposed. Also, the output of the Flatten() layer will differ depending on the dim ordering, so the weights of the first Dense layer after Flatten have to be reshuffled.
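
To illustrate those two adjustments (hypothetical helpers; the kernel layouts follow the Keras 1.x 'th'/'tf' conventions, and the separate Theano/TensorFlow kernel-flip issue discussed above is ignored):

import numpy as np

def th_kernel_to_tf(w):
    # (nb_filter, stack_size, rows, cols) -> (rows, cols, stack_size, nb_filter)
    return np.transpose(w, (2, 3, 1, 0))

def reshuffle_dense_after_flatten(w, channels, rows, cols):
    # Rows of w index the flattened (c, h, w) features under 'th';
    # reorder them to the (h, w, c) flattening used under 'tf'.
    w = w.reshape((channels, rows, cols, -1))
    w = np.transpose(w, (1, 2, 0, 3))
    return w.reshape((channels * rows * cols, -1))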

The reason why your code appears to run properly is actually that you are setting the dim ordering via K.set_image_dim_ordering, which does not reset the default dim ordering of conv layers (and other layers). I intend to fix this, by the way. What it means is that your conv layers are still using th dim ordering.

So it appears to me that your support of tf dim ordering isn't correct. For the sake of merging your PR quickly, let's give it up. Please only support th dim ordering (i.e. what you were doing initially). I'll add tf support later on myself, which will involve converting the weights and isn't quite easy.

Otherwise, the code does look better now, congrats.

@fchollet
Collaborator

fchollet commented Aug 3, 2016

Never mind my previous post, it seems I misread your code. Let me check it out again.

@fchollet
Collaborator

fchollet commented Aug 3, 2016

Ok, LGTM. Thanks for the valuable contribution!

@fchollet fchollet merged commit c725f8d into keras-team:master Aug 3, 2016
@MoyanZitto MoyanZitto deleted the add_example branch August 4, 2016 06:25