Unable to install caffe-future. #1
@aalok1969, the compilation error you're getting is from a conflict in the vision_layers header, specifically the class definitions for the SPPLayer and CropLayer. I recently ran into the same problem as you and tried to resolve the conflict with PR #2 to @shelhamer's crop-layer branch. He hasn't responded yet. The PR is basically a merge of a long list of changes from BVLC:master to bring shelhamer:crop-layer up to speed on changes there, in addition to 2 commits for resolving the SPPLayer-CropLayer class definition conflict (8ebd41b and fa0cbb2). No logical or functional changes. If you plan on checking out my PR, can you comment on whether you were able to reproduce the FCN experiments? Thanks.
Hi @kashefy, thanks a lot for your reply. I tried your version of caffe by running git clone https://github.com/kashefy/caffe
And then I compiled caffe; everything went smoothly.
I am a bit new to caffe and GitHub, hence I didn't understand the earlier part of your reply. Can you elaborate a bit on what steps I should take to be able to install caffe-future?
@aalok1993, thanks for taking the time to check out my changes. I'm still new to caffe myself and still trying to figure out how things are done. I was able to resolve some of the issues but still haven't figured out an end-to-end process for making things work. I have yet to train one of these FCNs successfully myself... On how to upscale the output, I don't think you need to worry about floating-point stride values. The FCN models perform something similar through the Deconvolution layer. It involves bilinear interpolation, but I'm a bit lost on the details. Might be worth looking up related posts in the caffe-users group. The implementation already exists in caffe; it's just a matter of figuring out usage. Re-building caffe-future: my understanding is that the instructions in future.sh are sufficient. The merge conflict that was causing your build error arose because you were merging the PRs into BVLC:master and not longjon:master, which are not in sync at the moment. Did I get that right? I'll try to respond with something more useful when I've figured out more.
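For anyone landing here: the usual FCN recipe for fixed 2x bilinear upsampling is a Deconvolution layer with a bilinear weight filler, frozen so it is never updated. The sketch below is an illustration rather than a verified config; the layer/blob names and the 21-channel num_output (PASCAL classes) are assumptions:

```
layer {
  name: "upsample"            # hypothetical name
  type: "Deconvolution"
  bottom: "score"
  top: "upscore"
  param { lr_mult: 0 }        # freeze: keep the bilinear weights fixed
  convolution_param {
    num_output: 21            # must match the channel count being upsampled
    group: 21                 # one filter per channel
    kernel_size: 4            # 2*factor - factor%2, for factor = 2
    stride: 2                 # the upsampling factor
    weight_filler { type: "bilinear" }
    bias_term: false
  }
}
```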
I'm facing similar issues. Will update you guys if I find a solution myself. My next avenue is to check out other implementations of FCN using caffe. This is what I came up with:
I was able to train the FCN32s model through fine-tuning successfully. My problem was that the weights of some of the layers of my fully conv. VGG-16 variant were not being copied correctly. Please find more details under this topic in the caffe-users group. All-zero weights in these layers will only propagate zeros during training. Don't you think the deconv. layer will upscale your features? If a stride of 1/2 is critical for your algorithm, maybe you can use the deconv. layer to upscale by a factor of 2 using nearest-neighbor interpolation (not sure about the details for this); a stride of 1 in a consecutive conv layer would then be equivalent to a 1/2 stride with only a single conv layer (see the sketch below). Would this work for you? @neurohn, yes please keep us updated. If it's more about concepts and less about the implementation, it may be better to continue that discussion in the caffe-users group.
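A rough, hypothetical sketch of that upscale-then-convolve idea. The layer names and the channel count (64) are made up, and a bilinear filler stands in for the nearest-neighbor interpolation mentioned above, since Caffe's stock fillers don't include a nearest-neighbor type:

```
layer {
  name: "upsample2x"
  type: "Deconvolution"
  bottom: "feat"
  top: "feat_2x"
  param { lr_mult: 0 }        # fixed interpolation, not learned
  convolution_param {
    num_output: 64
    group: 64                 # per-channel upsampling
    kernel_size: 4
    stride: 2
    weight_filler { type: "bilinear" }
    bias_term: false
  }
}
layer {
  name: "conv_halfstride"
  type: "Convolution"
  bottom: "feat_2x"
  top: "out"
  convolution_param {
    num_output: 64
    kernel_size: 3
    stride: 1                 # stride 1 on the 2x map ~ stride 1/2 on the original
    pad: 1
  }
}
```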
Hi @kashefy, thanks a lot for your reply. I was able to upscale using a deconvolution layer with a bilinear weight filler. But I am still facing lots of issues. Initially I was getting a lot of NaNs and Infs in my weight parameters. I tried modifying the learning rates and this problem went away. (I wanted to ask: what parameters should I try modifying to solve this issue?) After that, the issue I am facing is that when I take an image and pass it forward, most of the blob values come out as zeros and the final image I get is filled with zeros. And the weight parameters learned by the network become very large. Below I have described the various outputs in detail. I am working on a regression problem where my input is a 256x256x3 image and the output is also a 256x256x3 image. In order to figure out the issue, I took a very small architecture (a toy example) which consists of a single convolutional layer, a ReLU layer, and a pooling layer followed by a deconvolution layer. Also, to make it simple, initially I am taking (output label = input data), so currently my network works like an autoencoder. All it has to do is learn an approximation of the identity function. But it fails to do even that. Following are the prototxt files: deploy.prototxt, train_val.prototxt and solver.prototxt. I trained the network for 1000 iterations and used the snapshot as my model. Following are the code and output, which describe what I obtain after 1000 iterations. (NOTE: I have done the training in GPU mode as well as CPU mode, but I get the same result in each case.) Initializing caffe and loading the network
The blobs
The parameters
The conv layer weights
The deconv layer weights
The data blob
The conv1_1 blob
The pool1 blob
The upsample2 blob
Some queries: As can be seen above, the outputs of conv1_1, pool1 and upsample2 are all filled with zeros. It seems like the net is learning to output a blank image irrespective of the input. Also, the weights learned by the FCN contain many large values. I am unable to understand what is causing these issues. Should I change some parameters to solve this problem? How should I solve the problem of large weights? Should I include a large weight decay?
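For readers hitting the same NaN and exploding-weight symptoms: the usual levers live in solver.prototxt. Below is a hypothetical sketch with illustrative, untuned values; the field names are standard SolverParameter fields, but every number here is an assumption, not advice from this thread:

```
# Hypothetical solver.prototxt for taming NaNs / exploding weights.
net: "train_val.prototxt"
base_lr: 1e-6          # lower the learning rate first; NaNs usually mean it is too high
lr_policy: "fixed"
momentum: 0.9
weight_decay: 0.0005   # mild regularization keeps weights from growing unchecked
clip_gradients: 10     # cap the gradient norm (supported in recent Caffe)
max_iter: 10000
snapshot: 1000
snapshot_prefix: "snapshots/toy"
```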
Dear @kashefy, could you please advise how to make this PR work?
Dear @kashefy
Dear @aalok1969, could you please explain how you got @kashefy's caffe working using git clone https://github.com/kashefy/caffe? I cloned it and compiled. But then the deconvolution layer was not found by caffe while running the fine-tuning. I didn't run the future.sh script after cloning @kashefy's caffe, as doing so caused PR conflicts. Thanks.
Dear @atique81, I had just performed git clone https://github.com/kashefy/caffe and this worked perfectly fine for me. (NOTE: I didn't run future.sh)
Dear @aalok1969, did you also use the crop layer as mentioned here: https://gist.github.com/shelhamer/80667189b218ad570e82#file-train_val-prototxt-L559 ? If so, then I wonder how you could run the fcn fine-tuning from @kashefy's caffe?
I won't be able to answer that, as I am working on a regression problem and not on segmentation. I didn't require the crop layer for my task.
Thanks a lot @aalok1969. Waiting for @kashefy to reply...
Dear @kashefy, I would highly appreciate it if you would kindly reply.
@atique81, did you do 'git checkout with_crop' after cloning my fork? Or
Hi @kashefy, I was trying to reproduce fcn-8s-pascal-deploy.txt. I followed this thread and checked out the with_crop branch of your fork, but caffe still does not recognize the CROP layer. Part of my error message says:
I'm wondering if you still use the CROP layer, or if you got away with other options. Thanks! Edit: I made it work and I will come back later with more details.
Dear @kashefy, I highly appreciate your feedback. I have just done the following:
git clone https://github.com/kashefy/caffe
But while running the fcn-32s-alexnet prototxt, it generated an error saying something like "reshape not set", which I managed to overcome by following this guideline (BVLC#2834). Now it's running fine, but the loss seems to be jumping around a lot, though only 2000 iterations have passed (I am training on 1112 images from pascal voc2011). I will let you know once more iterations are finished. Thanks again for your wonderful support.
@atique81, glad to hear you're making progress. I didn't run into the
Dear @kashefy, it's now at 6000 iterations, and the loss is jumping between 0.15 and 0.8. I hope it will become more stable once more iterations have passed. Thanks for all your cordial help.
Hi @kashefy and @atique81, are you training with a single image every iteration or in mini-batches? If you are training in mini-batches, since the aspect ratios differ across images, did you guys write some code for data preparation? Right now I pad all images to 500x500 to make sure they are all the same size before processing them in batches, but I am wondering if there are any built-in functions in Caffe for this. I'm kind of new to Caffe and I'm still learning the basics. Thanks!
Hi @Eric-Phu,
Hi @atique81, thanks for your reply! Yeah, right now I'm sticking with the explicit resizing strategy as the imagenet tutorial does. It's weird, though, that I cannot use a batch size like 20 as in the original FCN paper. I posted the memory issue in the Google group. Check it out if you'd like. BTW, what kind of speed do you get when training the FCN-32s?
Dear @Eric-Phu, I am running FCN-32s on an Nvidia GeForce GTX 980 GPU with 4GB memory. It's taking approximately 1.04 sec for one complete forward+backward pass. Just curious to know how you did the resize for your ground truth images: unlike the training images, a simple interpolation method won't work for ground truth labels, as it would create new class numbers. Could you please elaborate on this?
Hi @atique81, thanks for your reply! Your speed is pretty nice. Mine is 5 sec per image on a Tesla K40c, which makes me wonder if I did something wrong. Did you set the 'group' for deconvolution? I saw people's postings about it. But whenever I set the group to be 60 by adding
You mentioned this above; I wonder if it's caused by adding
Actually, you do not need to resize images, since the images in Pascal VOC have the longer side at 500. So all you have to do is pad the other side with mean RGB values, which at least is what I did. For ground truth labels, you may want to pad with zero, which represents the background.
Yes, I did. As far as I know (from this source: http://caffe.berkeleyvision.org/doxygen/classcaffe_1_1BilinearFiller.html), the deconvolution layer weights dimension should be Cx1xKxK, where C is num_output as well as the group value. Just check with that link. I am not sure whether padding parts of images with mean RGB values and the corresponding parts of the labels with background (class 0) will work or not. I once needed to resize the pascal images and labels, and someone advised me to resize the labels based on a voting principle instead of interpolation, which I was not sure of. That's why I went for single-image batch training.
Thanks for pointing out the link. I checked it out. It's just weird that even when I used the exact same protobuf snippet and replaced
About the padding vs. interpolation, I'm actually not sure which is the right way to go, since the original paper did not talk about it either.
Hi @atique81, I found the reason why I cannot make the
Another gotcha from Caffe. Anyway, problem solved. Thanks!
Hi @Eric-Phu, nice to know that you solved it.
Hi @kashefy, now that I have been able to train the fcn 32-stride model on pascal voc2011, I am trying to test the net on the pascal voc2011 validation data. But unfortunately, caffe exits showing insufficient memory. Please note that I trained the model with batch size 1, and during testing the batch size is also fixed at 1. Could you please advise what is going wrong? Thanks
Hi @kashefy, I am facing a problem while training.
@shelhamer Thanks for updating this! Question: does future.sh still need to be run? If so, it is still causing issues with merging of the vision layer.
No, just checkout
@aalok1993 when you modified the model, did you set a weight_filler? (The default is to set the weights to zero - a stupid default if I know one.) Check out gaussian, xavier...
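For instance, an illustrative fragment (the layer name, channel count, and std value are placeholders, not recommendations):

```
layer {
  name: "conv1"               # hypothetical layer
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  convolution_param {
    num_output: 64
    kernel_size: 3
    pad: 1
    weight_filler { type: "xavier" }          # or: { type: "gaussian" std: 0.01 }
    bias_filler { type: "constant" value: 0 } # biases can stay at zero
  }
}
```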
@atique81 @kashefy I really think setting batch size to 1 is a big mistake. Remember, no image contains examples of all the classes, so the gradient will be skewed. If memory is the problem (and it is), you can use iter_size. I'm using iter_size = 20 and batch_size = 1.
Quick question: does the loss layer treat "background" differently, or is it just another class?
@aivision2020 actually we've found batch_size == 1 to be effective when paired with high momentum. See the PASCAL Context FCN in the model zoo: https://gist.github.com/shelhamer/80667189b218ad570e82#file-readme-md Eventually the arxiv will be updated with more comments on this.
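To make both options concrete, here is a hypothetical solver.prototxt fragment; the values are illustrative and not the tuned settings from the model zoo gist:

```
# Sketch of the two batch-size-1 strategies discussed above.
net: "train_val.prototxt"
base_lr: 1e-4        # illustrative; tune for your loss scale
lr_policy: "fixed"
iter_size: 20        # option (a): accumulate gradients so the effective batch is 20
momentum: 0.9        # option (b): raise toward 0.99 and set iter_size: 1 instead
weight_decay: 0.0005
```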
@aalok1993
Hi, I sent you the 3 files by mail.
OK, I got it. Thank you very much @aalok1993
Hi, I am wondering which step I got wrong, since I cannot run eval.py from FCN-32s (https://gist.github.com/shelhamer/80667189b218ad570e82#file-readme-md). I did git clone https://github.com/longjon/caffe/tree/future
@bruceko Hi, I have met the same problem as you. I also checked the
I wonder if I have mis-installed the future release. I only downloaded and unzipped the
Well, have you got your problem fixed? Do you have any suggestions? Thanks!
@Jianchao-ICT I haven't figured out how to solve the problem yet.
@bruceko Thanks! Well, I have noticed that
@bruceko In fact, I wonder whether I have got
@Jianchao-ICT I only tried the steps I posted to install caffe-future.
@bruceko I tried to run
@Jianchao-ICT I'm using Ubuntu and I don't have such a problem.
@bruceko Well, it seems that my Linux is also Ubuntu?
@kashefy Hi, I have read your detailed comments above. Now I am just trying to run the
@bruceko Hi, I have found the reason on my machine why Caffe would report
@Jianchao-ICT Thanks for your information. I do have the same problem.
@bruceko Yes, I change
@longjon @shelhamer Any plan to merge to master with a PR?
Hi all and @aalok1993
@shelhamer Hi, I encountered conflicts when I merged PR BVLC#2016; it says "Automatic merge failed". Should I fix the conflicts manually? Thanks in advance.
Equivalent code is already merged to master in github.com/BVLC/caffe, in case people here weren't aware.
Hey all, check out the fcn.berkeleyvision.org repo for master editions of the reference networks, weights, and code for learning, inference, and scoring. Closing this issue since the
Hi,
I found out about Caffe-future from the paper Fully Convolutional Networks (I found the link in the model zoo). I am trying to work on a regression problem where the input to the CNN is a 256x256 image and the output that the CNN is supposed to produce is also a 256x256 image, so a version of caffe that supports fully convolutional networks would be extremely useful for me. In the original version of caffe I was getting an error when I tried setting the stride of the convolutional layer to a float value (for upsampling). I believe the caffe-future version supports float values for stride.
However, while trying to install caffe-future I am facing some issues. I am not sure if I am missing anything. Following is what I tried for installation:
First I cloned the git repository. After that I followed the instructions mentioned in future.sh
Mentioned below are the commands I ran and the outputs I got. The main issue I faced was with the command hub merge BVLC#1977, which gave the error: fatal: Couldn't find remote ref refs/heads/accum-grad
Already on 'master'
Your branch is up-to-date with 'origin/master'.
error: branch 'future' not found.
Switched to a new branch 'future'
include/caffe/util/benchmark.hpp | 27 +-
include/caffe/util/coords.hpp | 61 +
include/caffe/util/cudnn.hpp | 128 +
include/caffe/util/db.hpp | 190 ++
include/caffe/util/device_alternate.hpp | 102 +
include/caffe/util/im2col.hpp | 22 +-
...
...
...
matlab/caffe/matcaffe_init.m | 11 +-
.../bvlc_alexnet/deploy.prototxt | 248 +-
models/bvlc_alexnet/readme.md | 25 +
.../bvlc_alexnet/solver.prototxt | 6 +-
.../bvlc_alexnet/train_val.prototxt | 296 ++-
models/bvlc_googlenet/deploy.prototxt | 2156 +++++++++++++++++
...
...
...
tools/test_net.cpp | 54 +-
tools/train_net.cpp | 34 +-
tools/upgrade_net_proto_binary.cpp | 17 +-
tools/upgrade_net_proto_text.cpp | 29 +-
430 files changed, 46179 insertions(+), 11932 deletions(-)
create mode 100644 .Doxyfile
create mode 100644 .travis.yml
create mode 100644 CMakeLists.txt
create mode 100644 cmake/ConfigGen.cmake
create mode 100644 cmake/Cuda.cmake
create mode 100644 cmake/Dependencies.cmake
...
...
...
create mode 100644 src/caffe/util/db.cpp
create mode 100644 src/gtest/CMakeLists.txt
create mode 100644 tools/CMakeLists.txt
create mode 100644 tools/caffe.cpp
delete mode 100644 tools/dump_network.cpp
create mode 100755 tools/extra/parse_log.py
fatal: Couldn't find remote ref refs/heads/accum-grad
From git://github.com/longjon/caffe
[new branch] python-net-spec -> longjon/python-net-spec
Auto-merging src/caffe/net.cpp
Removing src/caffe/layers/flatten_layer.cu
Auto-merging matlab/hdf5creation/demo.m
Removing matlab/caffe/read_cell.m
Removing matlab/caffe/print_cell.m
Removing matlab/caffe/prepare_batch.m
Removing matlab/caffe/matcaffe_init.m
Removing matlab/caffe/matcaffe_demo_vgg_mean_pix.m
Removing matlab/caffe/matcaffe_demo_vgg.m
Removing matlab/caffe/matcaffe_demo.m
Removing matlab/caffe/matcaffe_batch.m
Removing matlab/caffe/matcaffe.cpp
Removing matlab/caffe/ilsvrc_2012_mean.mat
Auto-merging include/caffe/vision_layers.hpp
CONFLICT (content): Merge conflict in include/caffe/vision_layers.hpp
Auto-merging include/caffe/neuron_layers.hpp
Auto-merging include/caffe/layer.hpp
Auto-merging include/caffe/common_layers.hpp
Auto-merging examples/net_surgery/bvlc_caffenet_full_conv.prototxt
Automatic merge failed; fix conflicts and then commit the result.
I am unable to compile caffe. Can someone please help me with this issue?