Floating point exception in solver.cpp #5976

Closed
nitinsinghgit opened this issue Oct 13, 2017 · 5 comments

Comments

@nitinsinghgit

Floating point exception in caffe.
I1012 21:53:53.901605 11666 net.cpp:761] Ignoring source layer argmax
I1012 21:53:53.902276 11666 solver.cpp:279] Solving Squeezenet_4
I1012 21:53:53.902292 11666 solver.cpp:280] Learning Rate Policy: step
I1012 21:53:54.131595 11666 solver.cpp:228] Iteration 0, loss = 0.0893308
I1012 21:53:54.166373 11666 solver.cpp:244] Train net output #0: label = 0
I1012 21:53:54.166404 11666 solver.cpp:244] Train net output #1: label = 0
I1012 21:53:54.166419 11666 solver.cpp:244] Train net output #2: label = 0
I1012 21:53:54.166430 11666 solver.cpp:244] Train net output #3: label = 0
I1012 21:53:54.166443 11666 solver.cpp:244] Train net output #4: label = 0
I1012 21:53:54.166523 11666 solver.cpp:244] Train net output #5: label_result = 0
I1012 21:53:54.166545 11666 solver.cpp:244] Train net output #6: label_result = 0
I1012 21:53:54.166559 11666 solver.cpp:244] Train net output #7: label_result = 0
I1012 21:53:54.166574 11666 solver.cpp:244] Train net output #8: label_result = 0
I1012 21:53:54.166586 11666 solver.cpp:244] Train net output #9: label_result = 0
I1012 21:53:54.166604 11666 solver.cpp:244] Train net output #10: loss = 0.0893308 (* 1 = 0.0893308 loss)
Floating point exception

Steps to reproduce

The crash occurs when running the train network and is reproducible even when the prototxt contains only the image data layers.
It does not occur when the same model is run in test mode.

Your system configuration

Operating system: Ubuntu 14.04
Compiler: g++ 4.8.4
CUDA version (if applicable): 8.0
CUDNN version (if applicable):
BLAS: atlas
Python or MATLAB version (for pycaffe and matcaffe respectively): python 2.7

@shaibagon
Member

Please attach the train.prototxt and solver.prototxt, as well as the input data you are using.
Is it possible that your training data is faulty?
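
One way to rule out faulty input is to scan the list file that the ImageData layer reads through `source`, where each line holds an image path followed by an integer label. The following is a minimal sketch of such a check, not part of Caffe; the function name `check_image_list` and the `root_folder` argument are hypothetical names chosen for illustration.

```python
import os


def check_image_list(list_path, root_folder=""):
    """Scan an ImageData-style list file and report suspicious lines.

    Returns a list of (line_number, problem) tuples. An empty list means
    every line had the form '<image path> <integer label>' and pointed
    at an existing, non-empty file.
    """
    problems = []
    with open(list_path) as handle:
        for number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # skip blank lines
            # Split on the last run of whitespace so paths may contain spaces.
            parts = line.rsplit(None, 1)
            if len(parts) != 2:
                problems.append((number, "expected '<image path> <label>'"))
                continue
            image_path, label = parts
            if not label.lstrip("-").isdigit():
                problems.append((number, "label is not an integer: %r" % label))
                continue
            full_path = os.path.join(root_folder, image_path)
            if not os.path.isfile(full_path):
                problems.append((number, "file not found: %s" % full_path))
            elif os.path.getsize(full_path) == 0:
                problems.append((number, "file is empty: %s" % full_path))
    return problems
```

Calling `check_image_list("data_path")` on each list file named in the prototxt and printing the result would show which lines, if any, need attention. This only checks the list and the files on disk; it does not decode the images.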

@nitinsinghgit
Author

train.prototxt

name: "Squeezenet_4"
layer {
  name: "train_data"
  type: "ImageData"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  image_data_param {
    source: "data_path"
    batch_size: 5
    scale: 0.0039215684
    new_height: 224
    new_width: 224
  }
  transform_param {
    mean_value: 104
    mean_value: 117
    mean_value: 123
  }
}
layer {
  name: "result_data"
  type: "ImageData"
  top: "result"
  top: "label_result"
  include {
    phase: TRAIN
  }
  image_data_param {
    source: "label_path"
    batch_size: 5
    is_color: false
    new_height: 224
    new_width: 224
  }
}
layer {
  name: "silence"
  bottom: "data"
  bottom: "result"
  type: "Silence"
}

solver.prototxt

net: "train.prototxt"
base_lr: 0.1
max_iter: 20000
lr_policy: "step"
gamma: 0.1
display: 20
weight_decay: 1.0001
solver_mode: GPU
random_seed: 831486
stepvalue: 8000
stepvalue: 13000
snapshot_prefix: "snapshot_path"
snapshot: 5000
type: "AdaGrad"

The training data is fine: the network runs without error with phase: TEST. This appears to be the same issue as https://groups.google.com/forum/#!topic/caffe-users/9dmLlIeihnU
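
A side note on the solver settings above: they combine `lr_policy: "step"` with `stepvalue` entries and no `stepsize`. To my understanding, Caffe's `step` policy reads `stepsize` and computes `base_lr * gamma ^ floor(iter / stepsize)`, while `stepvalue` entries are read by the `multistep` policy. The sketch below is a plain-Python rendering of those two formulas for comparison, not Caffe code; the function names are hypothetical. Whether this mismatch is related to the crash is not established in this thread.

```python
def step_lr(base_lr, gamma, stepsize, iteration):
    """Learning rate under the "step" policy.

    The rate drops by a factor of gamma every `stepsize` iterations.
    Assumption: an unset stepsize is treated as 0, which makes the
    integer division below undefined, so it is rejected here.
    """
    if stepsize <= 0:
        raise ValueError("lr_policy 'step' needs a positive stepsize")
    return base_lr * gamma ** (iteration // stepsize)


def multistep_lr(base_lr, gamma, stepvalues, iteration):
    """Learning rate under the "multistep" policy.

    The rate drops by a factor of gamma each time the iteration count
    reaches one of the listed stepvalues.
    """
    steps_passed = sum(1 for value in stepvalues if iteration >= value)
    return base_lr * gamma ** steps_passed
```

With the values from this solver, `multistep_lr(0.1, 0.1, [8000, 13000], 9000)` gives roughly 0.01, one drop after iteration 8000, whereas `step_lr` cannot be evaluated at all until a positive `stepsize` is supplied.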

@Noiredd
Member

Noiredd commented Oct 20, 2017

Does the problem persist when you change your dataset to some other file?
Does the problem persist when you change the solver type, let's say to standard SGD?

Noiredd changed the title from "Flaoting point exception" to "Floating point exception in solver.cpp" on Oct 20, 2017
@nitinsinghgit
Author

Yes, the problem persists when the dataset is changed, and also with type: "SGD". However, if I read the same dataset in TEST mode, it works.

@nitinsinghgit
Author

The cause turned out to be low storage space on disk.
The error message "Floating point exception" gives no indication that low disk space might be the problem.
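
Since the error message does not point at the disk, a pre-flight check before launching training can surface the condition early. The following is a minimal sketch; the function names and the 1 GiB threshold are hypothetical, and the thread does not say which path ran out of space, so the path to check (for example the snapshot directory) is an assumption. It uses `shutil.disk_usage`, which requires Python 3.3 or later; on Python 2.7, `os.statvfs` provides similar information.

```python
import shutil


def free_gib(path="."):
    """Return the free space, in GiB, on the filesystem containing `path`."""
    return shutil.disk_usage(path).free / float(1024 ** 3)


def require_free_space(path=".", min_gib=1.0):
    """Raise RuntimeError if less than `min_gib` GiB is free at `path`.

    Returns the available space in GiB when the check passes.
    """
    available = free_gib(path)
    if available < min_gib:
        raise RuntimeError(
            "only %.2f GiB free at %r, need at least %.2f GiB"
            % (available, path, min_gib)
        )
    return available
```

Calling `require_free_space("/path/to/snapshots", min_gib=5.0)` before starting the solver would fail with a readable message instead of an unexplained crash later on; the path and threshold here are placeholders.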
