diff --git a/README.md b/README.md index cb531d9..19c8dc9 100644 --- a/README.md +++ b/README.md @@ -16,15 +16,15 @@ The CuDNN benchmarks are done using Torch bindings. One can also do the same via | Library | Class | Time (ms) | forward (ms) | backward (ms) | |:------------------------:|:-----------------------------------------------------------------------------------------------------------:| ----------:| ------------:| -------------:| -| **Nervana-fp16** | [ConvLayer](https://github.com/soumith/convnet-benchmarks/blob/master/nervana/README.md) | **92** | **29** | **62** | -| CuDNN[R3]-fp16 (Torch) | [cudnn.SpatialConvolution](https://github.com/soumith/cudnn.torch/blob/master/SpatialConvolution.lua) | 96 | 30 | 66 | -| CuDNN[R3]-fp32 (Torch) | [cudnn.SpatialConvolution](https://github.com/soumith/cudnn.torch/blob/master/SpatialConvolution.lua) | 96 | 32 | 64 | -| Nervana-fp32 | [ConvLayer](https://github.com/soumith/convnet-benchmarks/blob/master/nervana/README.md) | 101 | 32 | 69 | -| fbfft (Torch) | [fbnn.SpatialConvolution](https://github.com/facebook/fbcunn/tree/master/src/fft) | 104 | 31 | 72 | -| Chainer | [Convolution2D](https://github.com/pfnet/chainer/blob/master/chainer/links/connection/convolution_2d.py) | 177 | 40 | 136 | +| CuDNN[R4]-fp16 (Torch) | [cudnn.SpatialConvolution](https://github.com/soumith/cudnn.torch/blob/master/SpatialConvolution.lua) | **71** | **25** | **46** | +| CuDNN[R4]-fp32 (Torch) | [cudnn.SpatialConvolution](https://github.com/soumith/cudnn.torch/blob/master/SpatialConvolution.lua) | 81 | 27 | 53 | +| **Nervana-fp16** | [ConvLayer](https://github.com/soumith/convnet-benchmarks/blob/master/nervana/README.md) | 92 | 29 | 62 | +| Nervana-fp32 | [ConvLayer](https://github.com/soumith/convnet-benchmarks/blob/master/nervana/README.md) | 101 | 32 | 69 | +| fbfft (Torch) | [fbnn.SpatialConvolution](https://github.com/facebook/fbcunn/tree/master/src/fft) | 104 | 31 | 72 | +| TensorFlow | 
[conv2d](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/ops/nn.py) | 151 | 34 | 117 | +| Chainer | [Convolution2D](https://github.com/pfnet/chainer/blob/master/chainer/links/connection/convolution_2d.py) | 177 | 40 | 136 | | cudaconvnet2* | [ConvLayer](https://github.com/soumith/cuda-convnet2.torch/blob/master/cudaconv3/src/filter_acts.cu) | 177 | 42 | 135 | -| CuDNN[R2] * | [cudnn.SpatialConvolution](https://github.com/soumith/cudnn.torch/blob/master/SpatialConvolution.lua) | 231 | 70 | 161 | -| TensorFlow | [conv2d](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/ops/nn.py) | 277 | 70 | 207 | +| CuDNN[R2] * | [cudnn.SpatialConvolution](https://github.com/soumith/cudnn.torch/blob/master/SpatialConvolution.lua) | 231 | 70 | 161 | | Caffe (native) | [ConvolutionLayer](https://github.com/BVLC/caffe/blob/master/src/caffe/layers/conv_layer.cu) | 324 | 121 | 203 | | Torch-7 (native) | [SpatialConvolutionMM](https://github.com/torch/cunn/blob/master/SpatialConvolutionMM.cu) | 342 | 132 | 210 | | CL-nn (Torch) | [SpatialConvolutionMM](https://github.com/hughperkins/clnn/blob/master/SpatialConvolutionMM.cl) | 963 | 388 | 574 | @@ -34,16 +34,16 @@ The CuDNN benchmarks are done using Torch bindings. 
One can also do the same via | Library | Class | Time (ms) | forward (ms) | backward (ms) | |:------------------------:|:------------------------------------------------------------------------------------------------------------------------:| -----------------:| -----------------------:| ------------------------:| -| **CuDNN[R3]-fp16** (Torch) | [cudnn.SpatialConvolution](https://github.com/soumith/cudnn.torch/blob/master/SpatialConvolution.lua) | **313** | **107** | **206** | -| CuDNN[R3]-fp32 (Torch) | [cudnn.SpatialConvolution](https://github.com/soumith/cudnn.torch/blob/master/SpatialConvolution.lua) | 326 | 113 | 213 | -| fbfft (Torch) | [SpatialConvolutionCuFFT](https://github.com/facebook/fbcunn/tree/master/src/fft) | 342 | 114 | 227 | +| **CuDNN[R4]-fp16** (Torch) | [cudnn.SpatialConvolution](https://github.com/soumith/cudnn.torch/blob/master/SpatialConvolution.lua) | **242** | **86** | **156** | +| CuDNN[R4]-fp32 (Torch) | [cudnn.SpatialConvolution](https://github.com/soumith/cudnn.torch/blob/master/SpatialConvolution.lua) | 268 | 94 | 174 | +| fbfft (Torch) | [SpatialConvolutionCuFFT](https://github.com/facebook/fbcunn/tree/master/src/fft) | 342 | 114 | 227 | +| TensorFlow | [conv2d](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/ops/nn.py) | 349 | 101 | 248 | | Nervana-fp16 | [ConvLayer](https://github.com/soumith/convnet-benchmarks/blob/master/nervana/README.md) | 355 | 112 | 242 | -| Nervana-fp32 | [ConvLayer](https://github.com/soumith/convnet-benchmarks/blob/master/nervana/README.md) | 398 | 124 | 273 | -| Chainer | [Convolution2D](https://github.com/pfnet/chainer/blob/master/chainer/links/connection/convolution_2d.py) | 620 | 135 | 484 | -| cudaconvnet2* | [ConvLayer](https://github.com/soumith/cuda-convnet2.torch/blob/master/cudaconv3/src/filter_acts.cu) | 723 | 176 | 547 | -| CuDNN[R2] * | [cudnn.SpatialConvolution](https://github.com/soumith/cudnn.torch/blob/master/SpatialConvolution.lua) | 810 | 234 | 576 | +| 
Nervana-fp32 | [ConvLayer](https://github.com/soumith/convnet-benchmarks/blob/master/nervana/README.md) | 398 | 124 | 273 | +| Chainer | [Convolution2D](https://github.com/pfnet/chainer/blob/master/chainer/links/connection/convolution_2d.py) | 620 | 135 | 484 | +| cudaconvnet2* | [ConvLayer](https://github.com/soumith/cuda-convnet2.torch/blob/master/cudaconv3/src/filter_acts.cu) | 723 | 176 | 547 | +| CuDNN[R2] * | [cudnn.SpatialConvolution](https://github.com/soumith/cudnn.torch/blob/master/SpatialConvolution.lua) | 810 | 234 | 576 | | Caffe | [ConvolutionLayer](https://github.com/BVLC/caffe/blob/master/src/caffe/layers/conv_layer.cu) | 823 | 355 | 468 | -| TensorFlow | [conv2d](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/ops/nn.py) | 842 | 216 | 626 | | Torch-7 (native) | [SpatialConvolutionMM](https://github.com/torch/cunn/blob/master/SpatialConvolutionMM.cu) | 878 | 379 | 499 | | CL-nn (Torch) | [SpatialConvolutionMM](https://github.com/hughperkins/clnn/blob/master/SpatialConvolutionMM.cl) | 963 | 388 | 574 | | Caffe-CLGreenTea | [ConvolutionLayer](https://github.com/naibaf7/caffe) | 2857 | 616 | 2240 | @@ -54,15 +54,15 @@ The CuDNN benchmarks are done using Torch bindings. 
One can also do the same via |:------------------------:|:------------------------------------------------------------------------------------------------------------------------:| -----------------:| -----------------------:| ------------------------:| | **Nervana-fp16** | [ConvLayer](https://github.com/soumith/convnet-benchmarks/blob/master/nervana/README.md) | **529** | **167** | **362** | | Nervana-fp32 | [ConvLayer](https://github.com/soumith/convnet-benchmarks/blob/master/nervana/README.md) | 590 | 180 | 410 | -| CuDNN[R3]-fp16 (Torch) | [cudnn.SpatialConvolution](https://github.com/soumith/cudnn.torch/blob/master/SpatialConvolution.lua) | 615 | 179 | 436 | -| CuDNN[R3]-fp32 (Torch) | [cudnn.SpatialConvolution](https://github.com/soumith/cudnn.torch/blob/master/SpatialConvolution.lua) | 615 | 196 | 418 | -| Chainer | [Convolution2D](https://github.com/pfnet/chainer/blob/master/chainer/links/connection/convolution_2d.py) | 885 | 251 | 632 | +| CuDNN[R4]-fp16 (Torch) | [cudnn.SpatialConvolution](https://github.com/soumith/cudnn.torch/blob/master/SpatialConvolution.lua) | 471 | 140 | 331 | +| CuDNN[R4]-fp32 (Torch) | [cudnn.SpatialConvolution](https://github.com/soumith/cudnn.torch/blob/master/SpatialConvolution.lua) | 529 | 162 | 366 | +| Chainer | [Convolution2D](https://github.com/pfnet/chainer/blob/master/chainer/links/connection/convolution_2d.py) | 885 | 251 | 632 | +| TensorFlow | [conv2d](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/ops/nn.py) | 982 | 191 | 791 | | fbfft (Torch) | [SpatialConvolutionCuFFT](https://github.com/facebook/fbcunn/tree/master/src/fft) | 1092 | 355 | 737 | | cudaconvnet2* | [ConvLayer](https://github.com/soumith/cuda-convnet2.torch/blob/master/cudaconv3/src/filter_acts.cu) | 1229 | 408 | 821 | | CuDNN[R2] * | [cudnn.SpatialConvolution](https://github.com/soumith/cudnn.torch/blob/master/SpatialConvolution.lua) | 1099 | 342 | 757 | | Caffe | 
[ConvolutionLayer](https://github.com/BVLC/caffe/blob/master/src/caffe/layers/conv_layer.cu) | 1068 | 323 | 745 | | Torch-7 (native) | [SpatialConvolutionMM](https://github.com/torch/cunn/blob/master/SpatialConvolutionMM.cu) | 1105 | 350 | 755 | -| TensorFlow | [conv2d](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/ops/nn.py) | 1510 | 315 | 1195 | | CL-nn (Torch) | [SpatialConvolutionMM](https://github.com/hughperkins/clnn/blob/master/SpatialConvolutionMM.cl) | 3437 | 875 | 2562 | | Caffe-CLGreenTea | [ConvolutionLayer](https://github.com/naibaf7/caffe) | 5620 | 988 | 4632 | @@ -73,10 +73,10 @@ The CuDNN benchmarks are done using Torch bindings. One can also do the same via |:------------------------:|:------------------------------------------------------------------------------------------------------------------------:| -----------------:| -----------------------:| ------------------------:| | **Nervana-fp16** | [ConvLayer](https://github.com/soumith/convnet-benchmarks/blob/master/nervana/README.md) | **283** | **85** | **197** | | Nervana-fp32 | [ConvLayer](https://github.com/soumith/convnet-benchmarks/blob/master/nervana/README.md) | 322 | 90 | 232 | -| CuDNN[R3]-fp32 (Torch) | [cudnn.SpatialConvolution](https://github.com/soumith/cudnn.torch/blob/master/SpatialConvolution.lua) | 431 | 117 | 313 | -| CuDNN[R3]-fp16 (Torch) | [cudnn.SpatialConvolution](https://github.com/soumith/cudnn.torch/blob/master/SpatialConvolution.lua) | 501 | 109 | 392 | +| CuDNN[R4]-fp16 (Torch) | [cudnn.SpatialConvolution](https://github.com/soumith/cudnn.torch/blob/master/SpatialConvolution.lua) | 462 | 112 | 349 | +| CuDNN[R4]-fp32 (Torch) | [cudnn.SpatialConvolution](https://github.com/soumith/cudnn.torch/blob/master/SpatialConvolution.lua) | 470 | 130 | 340 | | Chainer | [Convolution2D](https://github.com/pfnet/chainer/blob/master/chainer/links/connection/convolution_2d.py) | 687 | 189 | 497 | -| TensorFlow | 
[conv2d](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/ops/nn.py) | 1084 | 246 | 838 | +| TensorFlow | [conv2d](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/ops/nn.py) | 905 | 187 | 718 | | Caffe | [ConvolutionLayer](https://github.com/BVLC/caffe/blob/master/src/caffe/layers/conv_layer.cu) | 1935 | 786 | 1148 | | CL-nn (Torch) | [SpatialConvolutionMM](https://github.com/hughperkins/clnn/blob/master/SpatialConvolutionMM.cl) | 7016 | 3027 | 3988 | | Caffe-CLGreenTea | [ConvolutionLayer](https://github.com/naibaf7/caffe) | 9462 | 746 | 8716 | diff --git a/tensorflow/output_alexnet.log b/tensorflow/output_alexnet.log index 7d8e1f3..bbcdb23 100644 --- a/tensorflow/output_alexnet.log +++ b/tensorflow/output_alexnet.log @@ -1,8 +1,8 @@ -I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcublas.so.7.0 locally -I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcudnn.so.6.5 locally -I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcufft.so.7.0 locally -I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcuda.so locally -I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcurand.so.7.0 locally +I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcublas.so locally +I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcudnn.so locally +I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcufft.so locally +I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcuda.so.1 locally +I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcurand.so locally I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties: name: GeForce GTX TITAN X major: 5 minor: 2 memoryClockRate (GHz) 1.076 @@ -11,62 
+11,49 @@ Total memory: 12.00GiB Free memory: 11.87GiB I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0 I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0: Y -I tensorflow/core/common_runtime/gpu/gpu_device.cc:680] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:06:00.0) -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:42] Allocating 11.27GiB bytes. -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:52] GPU 0 memory begins at 0x1306c80000 extends to 0x15d8553a67 -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 1.0KiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 2.0KiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 4.0KiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 8.0KiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 16.0KiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 32.0KiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 64.0KiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 128.0KiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 256.0KiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 512.0KiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 1.00MiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 2.00MiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 4.00MiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 8.00MiB -I 
tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 16.00MiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 32.00MiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 64.00MiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 128.00MiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 256.00MiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 512.00MiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 1.00GiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 2.00GiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 4.00GiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 8.00GiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 16.00GiB -W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:120] Ran out of memory trying to allocate 285.19MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available. -W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:120] Ran out of memory trying to allocate 190.12MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available. -W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:120] Ran out of memory trying to allocate 285.19MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available. -W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:120] Ran out of memory trying to allocate 190.12MiB. 
The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available. -W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:120] Ran out of memory trying to allocate 6.2KiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available. -W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:120] Ran out of memory trying to allocate 285.19MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available. -W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:120] Ran out of memory trying to allocate 190.12MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available. -W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:120] Ran out of memory trying to allocate 285.19MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available. -W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:120] Ran out of memory trying to allocate 190.12MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available. -W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:120] Ran out of memory trying to allocate 6.2KiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available. 
-E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:624] Deallocating stream with pending work -2016-01-25 04:59:05.992691: step 10, duration = 0.069 -2016-01-25 04:59:06.653383: step 20, duration = 0.071 -2016-01-25 04:59:07.363218: step 30, duration = 0.070 -2016-01-25 04:59:08.098247: step 40, duration = 0.095 -2016-01-25 04:59:08.787165: step 50, duration = 0.070 -2016-01-25 04:59:09.501239: step 60, duration = 0.071 -2016-01-25 04:59:10.213252: step 70, duration = 0.071 -2016-01-25 04:59:10.926621: step 80, duration = 0.071 -2016-01-25 04:59:11.638691: step 90, duration = 0.072 -2016-01-25 04:59:12.282361: Forward across 100 steps, 0.070 +/- 0.010 sec / batch -2016-01-25 04:59:18.491643: step 10, duration = 0.276 -2016-01-25 04:59:21.286711: step 20, duration = 0.280 -2016-01-25 04:59:24.070488: step 30, duration = 0.275 -2016-01-25 04:59:26.882084: step 40, duration = 0.282 -2016-01-25 04:59:29.683638: step 50, duration = 0.282 -2016-01-25 04:59:32.469597: step 60, duration = 0.278 -2016-01-25 04:59:35.280004: step 70, duration = 0.283 -2016-01-25 04:59:38.092115: step 80, duration = 0.278 -2016-01-25 04:59:40.890455: step 90, duration = 0.283 -2016-01-25 04:59:43.426946: Forward-backward across 100 steps, 0.277 +/- 0.028 sec / batch +I tensorflow/core/common_runtime/gpu/gpu_device.cc:718] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:06:00.0) +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 256B +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 512B +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 1.0KiB +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 2.0KiB +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 4.0KiB +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating 
bin of max chunk size 8.0KiB +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 16.0KiB +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 32.0KiB +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 64.0KiB +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 128.0KiB +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 256.0KiB +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 512.0KiB +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 1.00MiB +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 2.00MiB +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 4.00MiB +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 8.00MiB +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 16.00MiB +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 32.00MiB +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 64.00MiB +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 128.00MiB +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 256.00MiB +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:107] Allocating 11.27GiB bytes. 
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:118] GPU 0 memory begins at 0x1306c80000 extends to 0x15d8553a67 +I tensorflow/core/common_runtime/gpu/pool_allocator.cc:244] PoolAllocator: After 1778 get requests, put_count=1323 evicted_count=1000 eviction_rate=0.755858 and unsatisfied allocation rate=0.874578 +I tensorflow/core/common_runtime/gpu/pool_allocator.cc:256] Raising pool_size_limit_ from 100 to 110 +2016-02-28 18:34:03.103654: step 10, duration = 0.034 +2016-02-28 18:34:03.447840: step 20, duration = 0.034 +2016-02-28 18:34:03.791359: step 30, duration = 0.034 +2016-02-28 18:34:04.131884: step 40, duration = 0.034 +2016-02-28 18:34:04.475419: step 50, duration = 0.034 +2016-02-28 18:34:04.818749: step 60, duration = 0.034 +2016-02-28 18:34:05.160298: step 70, duration = 0.034 +2016-02-28 18:34:05.501820: step 80, duration = 0.034 +2016-02-28 18:34:05.844729: step 90, duration = 0.034 +2016-02-28 18:34:06.157673: Forward across 100 steps, 0.034 +/- 0.003 sec / batch +2016-02-28 18:34:09.438178: step 10, duration = 0.151 +2016-02-28 18:34:10.955183: step 20, duration = 0.151 +2016-02-28 18:34:12.473529: step 30, duration = 0.151 +2016-02-28 18:34:14.009753: step 40, duration = 0.151 +2016-02-28 18:34:15.513419: step 50, duration = 0.146 +2016-02-28 18:34:17.031282: step 60, duration = 0.152 +2016-02-28 18:34:18.554324: step 70, duration = 0.152 +2016-02-28 18:34:20.066692: step 80, duration = 0.146 +2016-02-28 18:34:21.592371: step 90, duration = 0.154 +2016-02-28 18:34:22.970480: Forward-backward across 100 steps, 0.150 +/- 0.015 sec / batch diff --git a/tensorflow/output_googlenet.log b/tensorflow/output_googlenet.log index 0b9a73d..a101693 100644 --- a/tensorflow/output_googlenet.log +++ b/tensorflow/output_googlenet.log @@ -1,8 +1,8 @@ -I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcublas.so.7.0 locally -I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcudnn.so.6.5 
locally -I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcufft.so.7.0 locally -I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcuda.so locally -I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcurand.so.7.0 locally +I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcublas.so locally +I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcudnn.so locally +I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcufft.so locally +I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcuda.so.1 locally +I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcurand.so locally I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties: name: GeForce GTX TITAN X major: 5 minor: 2 memoryClockRate (GHz) 1.076 @@ -11,56 +11,50 @@ Total memory: 12.00GiB Free memory: 11.87GiB I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0 I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0: Y -I tensorflow/core/common_runtime/gpu/gpu_device.cc:680] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:06:00.0) -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:42] Allocating 11.27GiB bytes. 
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:52] GPU 0 memory begins at 0x1306c80000 extends to 0x15d8553a67 -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 1.0KiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 2.0KiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 4.0KiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 8.0KiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 16.0KiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 32.0KiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 64.0KiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 128.0KiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 256.0KiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 512.0KiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 1.00MiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 2.00MiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 4.00MiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 8.00MiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 16.00MiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 32.00MiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 64.00MiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 128.00MiB -I 
tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 256.00MiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 512.00MiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 1.00GiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 2.00GiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 4.00GiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 8.00GiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 16.00GiB -I tensorflow/core/common_runtime/gpu/pool_allocator.cc:244] PoolAllocator: After 2567 get requests, put_count=2042 evicted_count=1000 eviction_rate=0.489716 and unsatisfied allocation rate=0.633035 +I tensorflow/core/common_runtime/gpu/gpu_device.cc:718] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:06:00.0) +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 256B +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 512B +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 1.0KiB +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 2.0KiB +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 4.0KiB +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 8.0KiB +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 16.0KiB +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 32.0KiB +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 64.0KiB +I 
tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 128.0KiB +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 256.0KiB +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 512.0KiB +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 1.00MiB +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 2.00MiB +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 4.00MiB +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 8.00MiB +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 16.00MiB +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 32.00MiB +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 64.00MiB +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 128.00MiB +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 256.00MiB +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:107] Allocating 11.27GiB bytes. 
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:118] GPU 0 memory begins at 0x1306c80000 extends to 0x15d8553a67
+I tensorflow/core/common_runtime/gpu/pool_allocator.cc:244] PoolAllocator: After 3247 get requests, put_count=3143 evicted_count=1000 eviction_rate=0.318167 and unsatisfied allocation rate=0.370804
 I tensorflow/core/common_runtime/gpu/pool_allocator.cc:256] Raising pool_size_limit_ from 100 to 110
-I tensorflow/core/common_runtime/gpu/pool_allocator.cc:244] PoolAllocator: After 5411 get requests, put_count=5459 evicted_count=1000 eviction_rate=0.183184 and unsatisfied allocation rate=0.180189
-I tensorflow/core/common_runtime/gpu/pool_allocator.cc:256] Raising pool_size_limit_ from 256 to 281
-E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:624] Deallocating stream with pending work
-2016-01-25 05:05:15.580990: step 10, duration = 0.263
-2016-01-25 05:05:18.038548: step 20, duration = 0.247
-2016-01-25 05:05:20.551790: step 30, duration = 0.263
-2016-01-25 05:05:23.006944: step 40, duration = 0.241
-2016-01-25 05:05:25.500528: step 50, duration = 0.261
-2016-01-25 05:05:27.978199: step 60, duration = 0.251
-2016-01-25 05:05:30.461532: step 70, duration = 0.235
-2016-01-25 05:05:32.941043: step 80, duration = 0.242
-2016-01-25 05:05:35.458825: step 90, duration = 0.256
-2016-01-25 05:05:37.666623: Forward across 100 steps, 0.246 +/- 0.029 sec / batch
-2016-01-25 05:06:00.530894: step 10, duration = 1.095
-2016-01-25 05:06:11.485838: step 20, duration = 1.116
-2016-01-25 05:06:22.309278: step 30, duration = 0.979
-2016-01-25 05:06:33.288320: step 40, duration = 0.972
-2016-01-25 05:06:44.164229: step 50, duration = 0.980
-2016-01-25 05:06:55.231582: step 60, duration = 1.203
-2016-01-25 05:07:06.160001: step 70, duration = 1.216
-2016-01-25 05:07:17.169251: step 80, duration = 1.227
-2016-01-25 05:07:28.099824: step 90, duration = 1.203
-2016-01-25 05:07:37.977609: Forward-backward across 100 steps, 1.084 +/- 0.131 sec / batch
+I tensorflow/core/common_runtime/gpu/pool_allocator.cc:244] PoolAllocator: After 0 get requests, put_count=2010 evicted_count=2000 eviction_rate=0.995025 and unsatisfied allocation rate=0
+2016-02-28 18:37:32.148810: step 10, duration = 0.188
+2016-02-28 18:37:34.032619: step 20, duration = 0.188
+2016-02-28 18:37:35.921020: step 30, duration = 0.188
+2016-02-28 18:37:37.813859: step 40, duration = 0.189
+2016-02-28 18:37:39.700296: step 50, duration = 0.188
+2016-02-28 18:37:41.584602: step 60, duration = 0.188
+2016-02-28 18:37:43.473643: step 70, duration = 0.188
+2016-02-28 18:37:45.370821: step 80, duration = 0.201
+2016-02-28 18:37:47.255847: step 90, duration = 0.189
+2016-02-28 18:37:48.963790: Forward across 100 steps, 0.187 +/- 0.019 sec / batch
+2016-02-28 18:38:08.733669: step 10, duration = 0.925
+2016-02-28 18:38:17.842521: step 20, duration = 0.926
+2016-02-28 18:38:26.965862: step 30, duration = 0.925
+2016-02-28 18:38:36.084211: step 40, duration = 0.922
+2016-02-28 18:38:45.226841: step 50, duration = 0.927
+2016-02-28 18:38:54.355223: step 60, duration = 0.934
+2016-02-28 18:39:03.472584: step 70, duration = 0.905
+2016-02-28 18:39:12.626487: step 80, duration = 0.907
+2016-02-28 18:39:21.813921: step 90, duration = 0.895
+2016-02-28 18:39:30.085788: Forward-backward across 100 steps, 0.905 +/- 0.092 sec / batch
diff --git a/tensorflow/output_overfeat.log b/tensorflow/output_overfeat.log
index 2a7a4a1..9dcedc4 100644
--- a/tensorflow/output_overfeat.log
+++ b/tensorflow/output_overfeat.log
@@ -1,8 +1,8 @@
-I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcublas.so.7.0 locally
-I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcudnn.so.6.5 locally
-I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcufft.so.7.0 locally
-I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcuda.so locally
-I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcurand.so.7.0 locally
+I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcublas.so locally
+I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcudnn.so locally
+I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcufft.so locally
+I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcuda.so.1 locally
+I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcurand.so locally
 I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties:
 name: GeForce GTX TITAN X
 major: 5 minor: 2 memoryClockRate (GHz) 1.076
@@ -11,62 +11,49 @@ Total memory: 12.00GiB
 Free memory: 11.87GiB
 I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0
 I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0: Y
-I tensorflow/core/common_runtime/gpu/gpu_device.cc:680] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:06:00.0)
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:42] Allocating 11.27GiB bytes.
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:52] GPU 0 memory begins at 0x1306c80000 extends to 0x15d8553a67
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 1.0KiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 2.0KiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 4.0KiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 8.0KiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 16.0KiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 32.0KiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 64.0KiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 128.0KiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 256.0KiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 512.0KiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 1.00MiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 2.00MiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 4.00MiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 8.00MiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 16.00MiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 32.00MiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 64.00MiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 128.00MiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 256.00MiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 512.00MiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 1.00GiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 2.00GiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 4.00GiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 8.00GiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 16.00GiB
-W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:120] Ran out of memory trying to allocate 648.00MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
-W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:120] Ran out of memory trying to allocate 675.00MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
-W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:120] Ran out of memory trying to allocate 324.00MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
-W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:120] Ran out of memory trying to allocate 324.00MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
-W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:120] Ran out of memory trying to allocate 648.00MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
-W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:120] Ran out of memory trying to allocate 675.00MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
-W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:120] Ran out of memory trying to allocate 162.00MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
-W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:120] Ran out of memory trying to allocate 324.00MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
-W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:120] Ran out of memory trying to allocate 648.00MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
-W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:120] Ran out of memory trying to allocate 675.00MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
-E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:624] Deallocating stream with pending work
-2016-01-25 04:59:47.954455: step 10, duration = 0.216
-2016-01-25 04:59:50.125522: step 20, duration = 0.218
-2016-01-25 04:59:52.302775: step 30, duration = 0.220
-2016-01-25 04:59:54.491975: step 40, duration = 0.206
-2016-01-25 04:59:56.741014: step 50, duration = 0.220
-2016-01-25 04:59:59.031490: step 60, duration = 0.321
-2016-01-25 05:00:01.219937: step 70, duration = 0.218
-2016-01-25 05:00:03.262796: step 80, duration = 0.220
-2016-01-25 05:00:05.516300: step 90, duration = 0.221
-2016-01-25 05:00:07.601365: Forward across 100 steps, 0.216 +/- 0.045 sec / batch
-2016-01-25 05:00:25.654317: step 10, duration = 0.852
-2016-01-25 05:00:34.149322: step 20, duration = 0.850
-2016-01-25 05:00:42.656984: step 30, duration = 0.844
-2016-01-25 05:00:51.156913: step 40, duration = 0.852
-2016-01-25 05:00:59.670852: step 50, duration = 0.853
-2016-01-25 05:01:08.162712: step 60, duration = 0.852
-2016-01-25 05:01:16.680141: step 70, duration = 0.855
-2016-01-25 05:01:25.189005: step 80, duration = 0.858
-2016-01-25 05:01:33.721154: step 90, duration = 0.850
-2016-01-25 05:01:41.404763: Forward-backward across 100 steps, 0.842 +/- 0.085 sec / batch
+I tensorflow/core/common_runtime/gpu/gpu_device.cc:718] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:06:00.0)
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 256B
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 512B
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 1.0KiB
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 2.0KiB
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 4.0KiB
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 8.0KiB
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 16.0KiB
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 32.0KiB
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 64.0KiB
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 128.0KiB
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 256.0KiB
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 512.0KiB
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 1.00MiB
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 2.00MiB
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 4.00MiB
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 8.00MiB
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 16.00MiB
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 32.00MiB
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 64.00MiB
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 128.00MiB
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 256.00MiB
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:107] Allocating 11.27GiB bytes.
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:118] GPU 0 memory begins at 0x1306c80000 extends to 0x15d8553a67
+I tensorflow/core/common_runtime/gpu/pool_allocator.cc:244] PoolAllocator: After 1779 get requests, put_count=1323 evicted_count=1000 eviction_rate=0.755858 and unsatisfied allocation rate=0.874649
+I tensorflow/core/common_runtime/gpu/pool_allocator.cc:256] Raising pool_size_limit_ from 100 to 110
+2016-02-28 18:34:26.396518: step 10, duration = 0.103
+2016-02-28 18:34:27.413668: step 20, duration = 0.102
+2016-02-28 18:34:28.441604: step 30, duration = 0.103
+2016-02-28 18:34:29.470384: step 40, duration = 0.101
+2016-02-28 18:34:30.484839: step 50, duration = 0.102
+2016-02-28 18:34:31.509178: step 60, duration = 0.102
+2016-02-28 18:34:32.537476: step 70, duration = 0.104
+2016-02-28 18:34:33.573302: step 80, duration = 0.103
+2016-02-28 18:34:34.597368: step 90, duration = 0.103
+2016-02-28 18:34:35.519101: Forward across 100 steps, 0.101 +/- 0.010 sec / batch
+2016-02-28 18:34:43.001105: step 10, duration = 0.354
+2016-02-28 18:34:46.511634: step 20, duration = 0.355
+2016-02-28 18:34:50.048593: step 30, duration = 0.356
+2016-02-28 18:34:53.560119: step 40, duration = 0.353
+2016-02-28 18:34:57.091679: step 50, duration = 0.360
+2016-02-28 18:35:00.619250: step 60, duration = 0.354
+2016-02-28 18:35:04.147982: step 70, duration = 0.355
+2016-02-28 18:35:07.678886: step 80, duration = 0.357
+2016-02-28 18:35:11.211387: step 90, duration = 0.359
+2016-02-28 18:35:14.393544: Forward-backward across 100 steps, 0.349 +/- 0.035 sec / batch
diff --git a/tensorflow/output_vgga.log b/tensorflow/output_vgga.log
index 7f89baf..6442b73 100644
--- a/tensorflow/output_vgga.log
+++ b/tensorflow/output_vgga.log
@@ -1,8 +1,8 @@
-I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcublas.so.7.0 locally
-I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcudnn.so.6.5 locally
-I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcufft.so.7.0 locally
-I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcuda.so locally
-I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcurand.so.7.0 locally
+I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcublas.so locally
+I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcudnn.so locally
+I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcufft.so locally
+I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcuda.so.1 locally
+I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcurand.so locally
 I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties:
 name: GeForce GTX TITAN X
 major: 5 minor: 2 memoryClockRate (GHz) 1.076
@@ -11,62 +11,49 @@ Total memory: 12.00GiB
 Free memory: 11.87GiB
 I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0
 I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0: Y
-I tensorflow/core/common_runtime/gpu/gpu_device.cc:680] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:06:00.0)
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:42] Allocating 11.27GiB bytes.
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:52] GPU 0 memory begins at 0x1306c80000 extends to 0x15d8553a67
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 1.0KiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 2.0KiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 4.0KiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 8.0KiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 16.0KiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 32.0KiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 64.0KiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 128.0KiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 256.0KiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 512.0KiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 1.00MiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 2.00MiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 4.00MiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 8.00MiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 16.00MiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 32.00MiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 64.00MiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 128.00MiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 256.00MiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 512.00MiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 1.00GiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 2.00GiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 4.00GiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 8.00GiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 16.00GiB
-W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:120] Ran out of memory trying to allocate 882.00MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
-W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:120] Ran out of memory trying to allocate 882.00MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
-W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:120] Ran out of memory trying to allocate 220.50MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
-W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:120] Ran out of memory trying to allocate 220.50MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
-W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:120] Ran out of memory trying to allocate 2.2KiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
-W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:120] Ran out of memory trying to allocate 1.72GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
-W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:120] Ran out of memory trying to allocate 882.00MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
-W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:120] Ran out of memory trying to allocate 220.50MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
-W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:120] Ran out of memory trying to allocate 1.72GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
-W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:120] Ran out of memory trying to allocate 441.00MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
-E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:624] Deallocating stream with pending work
-2016-01-25 05:01:49.963409: step 10, duration = 0.353
-2016-01-25 05:01:53.541880: step 20, duration = 0.352
-2016-01-25 05:01:57.079947: step 30, duration = 0.354
-2016-01-25 05:02:00.598771: step 40, duration = 0.374
-2016-01-25 05:02:04.116766: step 50, duration = 0.355
-2016-01-25 05:02:07.661613: step 60, duration = 0.356
-2016-01-25 05:02:11.205755: step 70, duration = 0.354
-2016-01-25 05:02:14.772028: step 80, duration = 0.354
-2016-01-25 05:02:18.337085: step 90, duration = 0.352
-2016-01-25 05:02:21.513479: Forward across 100 steps, 0.351 +/- 0.044 sec / batch
-2016-01-25 05:02:53.384239: step 10, duration = 1.511
-2016-01-25 05:03:08.630315: step 20, duration = 1.512
-2016-01-25 05:03:23.868209: step 30, duration = 1.530
-2016-01-25 05:03:39.143444: step 40, duration = 1.534
-2016-01-25 05:03:54.422246: step 50, duration = 1.530
-2016-01-25 05:04:09.696207: step 60, duration = 1.511
-2016-01-25 05:04:24.942348: step 70, duration = 1.529
-2016-01-25 05:04:40.157606: step 80, duration = 1.536
-2016-01-25 05:04:55.395796: step 90, duration = 1.530
-2016-01-25 05:05:09.152744: Forward-backward across 100 steps, 1.510 +/- 0.159 sec / batch
+I tensorflow/core/common_runtime/gpu/gpu_device.cc:718] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:06:00.0)
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 256B
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 512B
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 1.0KiB
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 2.0KiB
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 4.0KiB
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 8.0KiB
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 16.0KiB
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 32.0KiB
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 64.0KiB
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 128.0KiB
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 256.0KiB
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 512.0KiB
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 1.00MiB
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 2.00MiB
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 4.00MiB
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 8.00MiB
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 16.00MiB
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 32.00MiB
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 64.00MiB
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 128.00MiB
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 256.00MiB
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:107] Allocating 11.27GiB bytes.
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:118] GPU 0 memory begins at 0x1306c80000 extends to 0x15d8553a67
+I tensorflow/core/common_runtime/gpu/pool_allocator.cc:244] PoolAllocator: After 1999 get requests, put_count=1323 evicted_count=1000 eviction_rate=0.755858 and unsatisfied allocation rate=0.888444
+I tensorflow/core/common_runtime/gpu/pool_allocator.cc:256] Raising pool_size_limit_ from 100 to 110
+2016-02-28 18:35:19.786187: step 10, duration = 0.192
+2016-02-28 18:35:21.721211: step 20, duration = 0.191
+2016-02-28 18:35:23.657017: step 30, duration = 0.198
+2016-02-28 18:35:25.593985: step 40, duration = 0.193
+2016-02-28 18:35:27.527923: step 50, duration = 0.194
+2016-02-28 18:35:29.457957: step 60, duration = 0.194
+2016-02-28 18:35:31.390243: step 70, duration = 0.193
+2016-02-28 18:35:33.324563: step 80, duration = 0.194
+2016-02-28 18:35:35.256393: step 90, duration = 0.194
+2016-02-28 18:35:36.995732: Forward across 100 steps, 0.191 +/- 0.019 sec / batch
+2016-02-28 18:35:57.968941: step 10, duration = 0.982
+2016-02-28 18:36:07.865760: step 20, duration = 0.976
+2016-02-28 18:36:17.768448: step 30, duration = 0.977
+2016-02-28 18:36:27.673898: step 40, duration = 0.976
+2016-02-28 18:36:37.590995: step 50, duration = 0.980
+2016-02-28 18:36:47.517216: step 60, duration = 0.988
+2016-02-28 18:36:57.443127: step 70, duration = 0.981
+2016-02-28 18:37:07.363943: step 80, duration = 0.987
+2016-02-28 18:37:17.264746: step 90, duration = 0.976
+2016-02-28 18:37:26.214138: Forward-backward across 100 steps, 0.982 +/- 0.099 sec / batch
diff --git a/torch7/imagenet_winners/output.log b/torch7/imagenet_winners/output.log
index 089dd41..683bb2e 100644
--- a/torch7/imagenet_winners/output.log
+++ b/torch7/imagenet_winners/output.log
@@ -1,34 +1,30 @@
-Running on device: GeForce GTX TITAN X
-ModelType: AlexNet Kernels: cudnn Input shape: 128x3x224x224
-cudnn :updateOutput(): 32.35
-cudnn :updateGradInput(): 30.63
-cudnn :accGradParameters(): 33.62
-cudnn :Forward: 32.35
-cudnn :Backward: 64.25
-cudnn :TOTAL: 96.60
-
-ModelType: VGG Model-A Kernels: cudnn Input shape: 64x3x224x224
-cudnn :updateOutput(): 196.52
-cudnn :updateGradInput(): 212.04
-cudnn :accGradParameters(): 206.72
-cudnn :Forward: 196.52
-cudnn :Backward: 418.76
-cudnn :TOTAL: 615.27
-
-ModelType: OverFeat[fast] Kernels: cudnn Input shape: 128x3x231x231
-cudnn :updateOutput(): 113.28
-cudnn :updateGradInput(): 101.09
-cudnn :accGradParameters(): 112.32
-cudnn :Forward: 113.28
-cudnn :Backward: 213.42
-cudnn :TOTAL: 326.70
-
-ModelType: GoogleNet Kernels: cudnn Input shape: 128x3x224x224
-cudnn :updateOutput(): 117.82
-cudnn :updateGradInput(): 213.58
-cudnn :accGradParameters(): 100.16
-cudnn :Forward: 117.82
-cudnn :Backward: 313.74
-cudnn :TOTAL: 431.56
-
-
+Running on device: GeForce GTX TITAN X
+ModelType: AlexNet Kernels: cudnn Input shape: 128x3x224x224
+cudnn :updateOutput(): 27.65
+cudnn :updateGradInput(): 24.32
+cudnn :accGradParameters(): 28.99
+cudnn :Forward: 27.65
+cudnn :Backward: 53.31
+cudnn :TOTAL: 80.96
+ModelType: OverFeat[fast] Kernels: cudnn Input shape: 128x3x231x231
+cudnn :updateOutput(): 94.28
+cudnn :updateGradInput(): 81.17
+cudnn :accGradParameters(): 93.07
+cudnn :Forward: 94.28
+cudnn :Backward: 174.24
+cudnn :TOTAL: 268.52
+ModelType: VGG Model-A Kernels: cudnn Input shape: 64x3x224x224
+cudnn :updateOutput(): 162.74
+cudnn :updateGradInput(): 167.05
+cudnn :accGradParameters(): 199.49
+cudnn :Forward: 162.74
+cudnn :Backward: 366.54
+cudnn :TOTAL: 529.29
+ModelType: GoogleNet Kernels: cudnn Input shape: 128x3x224x224
+cudnn :updateOutput(): 130.76
+cudnn :updateGradInput(): 197.86
+cudnn :accGradParameters(): 142.15
+cudnn :Forward: 130.76
+cudnn :Backward: 340.01
+cudnn :TOTAL: 470.77
+
diff --git a/torch7/imagenet_winners/output_cudnn_fp16.log b/torch7/imagenet_winners/output_cudnn_fp16.log
index 7f9d95b..525227e 100644
--- a/torch7/imagenet_winners/output_cudnn_fp16.log
+++ b/torch7/imagenet_winners/output_cudnn_fp16.log
@@ -1,34 +1,30 @@
-Running on device: GeForce GTX TITAN X
-ModelType: AlexNet Kernels: cudnn Input shape: 128x3x224x224
-cudnn :updateOutput(): 30.08
-cudnn :updateGradInput(): 26.93
-cudnn :accGradParameters(): 39.70
-cudnn :Forward: 30.08
-cudnn :Backward: 66.63
-cudnn :TOTAL: 96.71
-
-ModelType: VGG Model-A Kernels: cudnn Input shape: 64x3x224x224
-cudnn :updateOutput(): 179.19
-cudnn :updateGradInput(): 185.43
-cudnn :accGradParameters(): 251.16
-cudnn :Forward: 179.19
-cudnn :Backward: 436.59
-cudnn :TOTAL: 615.78
-
-ModelType: OverFeat[fast] Kernels: cudnn Input shape: 128x3x231x231
-cudnn :updateOutput(): 107.09
-cudnn :updateGradInput(): 93.60
-cudnn :accGradParameters(): 112.81
-cudnn :Forward: 107.09
-cudnn :Backward: 206.42
-cudnn :TOTAL: 313.51
-
-ModelType: GoogleNet Kernels: cudnn Input shape: 128x3x224x224
-cudnn :updateOutput(): 109.21
-cudnn :updateGradInput(): 231.47
-cudnn :accGradParameters(): 161.08
-cudnn :Forward: 109.21
-cudnn :Backward: 392.55
-cudnn :TOTAL: 501.76
-
-
+Running on device: GeForce GTX TITAN X
+ModelType: AlexNet Kernels: cudnn Input shape: 128x3x224x224
+cudnn :updateOutput(): 24.87
+cudnn :updateGradInput(): 21.15
+cudnn :accGradParameters(): 25.64
+cudnn :Forward: 24.87
+cudnn :Backward: 46.79
+cudnn :TOTAL: 71.66
+ModelType: OverFeat[fast] Kernels: cudnn Input shape: 128x3x231x231
+cudnn :updateOutput(): 86.15
+cudnn :updateGradInput(): 73.20
+cudnn :accGradParameters(): 83.29
+cudnn :Forward: 86.15
+cudnn :Backward: 156.50
+cudnn :TOTAL: 242.64
+ModelType: VGG Model-A Kernels: cudnn Input shape: 64x3x224x224
+cudnn :updateOutput(): 140.33
+cudnn :updateGradInput(): 144.58
+cudnn :accGradParameters(): 186.94
+cudnn :Forward: 140.33
+cudnn :Backward: 331.52
+cudnn :TOTAL: 471.85
+ModelType: GoogleNet Kernels: cudnn Input shape: 128x3x224x224
+cudnn :updateOutput(): 112.51
+cudnn :updateGradInput(): 223.20
+cudnn :accGradParameters(): 126.51
+cudnn :Forward: 112.51
+cudnn :Backward: 349.71
+cudnn :TOTAL: 462.22
+