Caffe - inconsistency in the activation feature values - GPU mode #2783
Comments
I believe you can't hold onto references the way you are doing right now. Caffe copies data to and from the GPU, which makes old pointers to memory invalid after any calls to … Another note: whenever you grab results from a forward pass, make sure you make a copy of the data with …
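A minimal sketch of why the copy matters. It assumes, as in pycaffe, that a blob's data attribute returns a NumPy array backed by the blob's own buffer, and that the next forward pass overwrites that buffer. FakeBlob and fake_forward are hypothetical stand-ins for a real caffe.Net, so the example runs without Caffe:

```python
import numpy as np


class FakeBlob:
    """Hypothetical stand-in for a Caffe blob: one buffer reused across passes."""

    def __init__(self, size):
        self._buffer = np.zeros(size, dtype=np.float32)

    @property
    def data(self):
        # Like pycaffe, hand back the internal buffer itself, not a copy.
        return self._buffer


def fake_forward(blob, value):
    """Hypothetical forward pass: writes new activations in place."""
    blob._buffer[:] = value


blob = FakeBlob(4)

fake_forward(blob, 1.0)
kept_reference = blob.data          # aliases the shared buffer
kept_copy = blob.data.copy()        # independent snapshot of this pass

fake_forward(blob, 2.0)             # the next pass overwrites the buffer

print(kept_reference)  # [2. 2. 2. 2.]  (silently changed)
print(kept_copy)       # [1. 1. 1. 1.]  (what the first pass produced)
```

With a real network the same pattern applies: take something like net.blobs['fc7'].data.copy() right after each forward call instead of keeping the bare array.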
Hi, I made the changes you suggested above, but the issue still persists. I even ran the Python code from "caffe/examples/00-classification.ipynb", but in GPU mode the output changes arbitrarily on every run. The outputs I get look like "Predicted class is 49855." How can it give a value like 49462? There aren't even that many classes. Following is the code for "caffe/examples/00-classification.ipynb":
Have you tried running without cuDNN? I vaguely remember seeing somewhere that it's not always deterministic, but I could be wrong.
The last line of your code is wrong: when you call argmax, you need to give it the correct axis (axis=1). Otherwise, it computes the argmax over a flattened version of the array, which is only meaningful if your batch size is 1; in your case the batch size is 50. If you're processing just one image at a time (a single cat image), you should also set the batch size to 1. Right now you're making 50 copies of the input image and classifying all of them (since the assignment to …
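The effect of the missing axis can be reproduced with NumPy alone. The sketch below uses made-up scores for a batch of 50 images and 1000 classes: the flattened argmax returns row * 1000 + class, which is how an index like 49855 (image 49, class 855) can appear, while argmax(axis=1) returns one label per image:

```python
import numpy as np

BATCH, NUM_CLASSES = 50, 1000

# Made-up scores for illustration: every row favours class 281, except
# the last row, where class 855 gets a higher score.
probs = np.zeros((BATCH, NUM_CLASSES), dtype=np.float32)
probs[:, 281] = 0.5
probs[BATCH - 1, 855] = 0.9

flat_index = int(probs.argmax())      # argmax over the flattened array
per_image = probs.argmax(axis=1)      # one class label per image
row, cls = np.unravel_index(flat_index, probs.shape)

print(flat_index)                     # 49855
print(int(row), int(cls))             # 49 855
print(per_image[0], per_image[-1])    # 281 855
```

With a batch size of 1 the two forms agree, because the flattened index of a (1, 1000) array is just the column index.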
I tried making the changes you suggested, but I still get the same problem. When I run the above code in CPU mode, I get the same output every time. But when I run it in GPU mode, I get different values every time. The problem seems to be related to the GPU.
You should at least be getting predicted class labels in the range [0, 1000) this time. Also, does it work on the GPU without cuDNN?
Yes, the predicted classes are within [0, 1000). I didn't quite understand what you meant about running without cuDNN. Do you mean I'll need to recompile Caffe without cuDNN, or is there a faster way to test that?
You could either recompile without cuDNN (by disabling USE_CUDNN in Makefile.config), or you could insert "engine: CAFFE" inside the prototxt params of any layer that has a cuDNN version. For example: https://gist.github.com/longjon/ac410cad48a088710872#file-fcn-32s-pascal-deploy-prototxt
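If editing a deploy prototxt by hand is tedious, the second option can be scripted. The sketch below is a hypothetical text-rewriting helper (force_caffe_engine is not part of Caffe) that assumes each convolution_param or pooling_param block opens on its own line as "name_param {". It only touches those two blocks; layers such as ReLU or Softmax, which also have cuDNN versions, would need a relu_param or softmax_param block with the engine field added separately:

```python
import re

# Parameter blocks that get an "engine" field in this sketch.
ENGINE_BLOCKS = ("convolution", "pooling")

_PATTERN = re.compile(
    r"^([ \t]*)((?:%s)_param\s*\{)" % "|".join(ENGINE_BLOCKS), re.MULTILINE
)


def force_caffe_engine(prototxt: str) -> str:
    """Hypothetical helper: add 'engine: CAFFE' to conv/pooling params."""
    # Keep the block's indentation and put the new field on its own line,
    # indented two spaces deeper than the block header.
    return _PATTERN.sub(r"\1\2\n\1  engine: CAFFE", prototxt)


layer = """layer {
  name: "conv1"
  type: "Convolution"
  convolution_param {
    num_output: 96
    kernel_size: 11
  }
}"""

print(force_caffe_engine(layer))
```

Recompiling without cuDNN remains the more thorough test, since it rules out every cuDNN layer at once.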
Hi @seanbell.
Please ask usage and system configuration questions on the mailing list. This looks like the result of a cuDNN installation gone wrong. From https://github.com/BVLC/caffe/blob/master/CONTRIBUTING.md:
Hi, I am using Caffe on Ubuntu 14.04 with:
CUDA version 7.0
cuDNN version 2
GPU: NVIDIA GT 730
In Caffe, I first do the initialization and then load the ImageNet model (AlexNet). I also switch Caffe to the GPU using set_mode_gpu().
After that, I take an image; let's call it x.
I copy this image into the Caffe source blob. Then I perform a forward pass for this image using net.forward(end='fc7').
Then I extract the 4096-dimensional fc7 output (the activation features of the fc7 layer).
The problem I am facing is that when I run the same code multiple times, I obtain a different result every time. That is, in GPU mode the activation features for the same image differ on every run. A forward pass is supposed to be deterministic, right? So I should get the same output every time for the same image.
On the other hand, when I run Caffe on the CPU using set_mode_cpu(), everything works perfectly, i.e., I get the same output each time.
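One way to quantify "different every time" is to run the same forward pass several times and report the largest element-wise difference from the first run. In the sketch below, max_run_to_run_diff is a hypothetical helper; run would be a zero-argument function that, for example, calls net.forward(end='fc7') and returns the fc7 data. The two stand-in functions exist only so the example runs without Caffe:

```python
import numpy as np


def max_run_to_run_diff(run, n_runs=3):
    """Call run() n_runs times (n_runs >= 2) and return the largest
    element-wise absolute difference from the first result."""
    # Copy each result so a reused output buffer cannot alias earlier runs.
    results = [np.array(run(), copy=True) for _ in range(n_runs)]
    first = results[0]
    return max(float(np.abs(r - first).max()) for r in results[1:])


# Stand-ins for illustration only (no Caffe needed).
def deterministic_run():
    return np.linspace(0.0, 1.0, 4096, dtype=np.float32)


_rng = np.random.default_rng(0)


def noisy_run():
    return np.linspace(0.0, 1.0, 4096, dtype=np.float32) + _rng.normal(size=4096)


print(max_run_to_run_diff(deterministic_run))  # 0.0
print(max_run_to_run_diff(noisy_run) > 1e-3)   # True
```

Running this once with set_mode_cpu() and once with set_mode_gpu() separates small floating-point noise from genuinely broken outputs.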
The code I used and the outputs I obtained are shown below. I am not able to understand what the problem is. Is it caused by GPU rounding? The differences are very large, though. Or is it due to some issue with the latest cuDNN version? Or is it something else altogether?
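To gauge whether rounding alone could explain large differences: float32 results do change when the same operations are performed in a different order (parallel GPU reductions may sum in a different order than the CPU), but only in the last few significant digits. The sketch below sums 4096 made-up non-negative values, roughly the size of an fc7 vector, forwards and backwards in float32 and compares the results:

```python
import numpy as np

rng = np.random.default_rng(0)
# 4096 made-up non-negative activations, like an fc7 vector after ReLU.
x = rng.random(4096).astype(np.float32)


def sequential_sum(values):
    """Add float32 values one by one, in the given order."""
    total = np.float32(0.0)
    for v in values:
        total = np.float32(total + v)
    return total


forward = sequential_sum(x)
backward = sequential_sum(x[::-1])
rel_diff = abs(float(forward) - float(backward)) / abs(float(forward))
print(rel_diff < 1e-3)  # True: reordering only perturbs the last digits
```

Differences far larger than this are more consistent with a broken installation or a real bug than with rounding.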
Following is the code:
#1) IMPORT libraries
#2) IMPORT Caffe Models and define utility functions
#3) LOADING Image and setting constants
#4) Setting the source image and making the forward pass to obtain fc7 activation features
Following is the output I obtained for 'print dst.data' when I ran the above code multiple times:
output on 1st execution of code
output on 2nd execution of code
output on 3rd execution of code
output on 4th execution of code