diff --git a/README.md b/README.md index cb531d9..19c8dc9 100644 --- a/README.md +++ b/README.md @@ -16,15 +16,15 @@ The CuDNN benchmarks are done using Torch bindings. One can also do the same via | Library | Class | Time (ms) | forward (ms) | backward (ms) | |:------------------------:|:-----------------------------------------------------------------------------------------------------------:| ----------:| ------------:| -------------:| -| **Nervana-fp16** | [ConvLayer](https://github.com/soumith/convnet-benchmarks/blob/master/nervana/README.md) | **92** | **29** | **62** | -| CuDNN[R3]-fp16 (Torch) | [cudnn.SpatialConvolution](https://github.com/soumith/cudnn.torch/blob/master/SpatialConvolution.lua) | 96 | 30 | 66 | -| CuDNN[R3]-fp32 (Torch) | [cudnn.SpatialConvolution](https://github.com/soumith/cudnn.torch/blob/master/SpatialConvolution.lua) | 96 | 32 | 64 | -| Nervana-fp32 | [ConvLayer](https://github.com/soumith/convnet-benchmarks/blob/master/nervana/README.md) | 101 | 32 | 69 | -| fbfft (Torch) | [fbnn.SpatialConvolution](https://github.com/facebook/fbcunn/tree/master/src/fft) | 104 | 31 | 72 | -| Chainer | [Convolution2D](https://github.com/pfnet/chainer/blob/master/chainer/links/connection/convolution_2d.py) | 177 | 40 | 136 | +| CuDNN[R4]-fp16 (Torch) | [cudnn.SpatialConvolution](https://github.com/soumith/cudnn.torch/blob/master/SpatialConvolution.lua) | **71** | **25** | **46** | +| CuDNN[R4]-fp32 (Torch) | [cudnn.SpatialConvolution](https://github.com/soumith/cudnn.torch/blob/master/SpatialConvolution.lua) | 81 | 27 | 53 | +| **Nervana-fp16** | [ConvLayer](https://github.com/soumith/convnet-benchmarks/blob/master/nervana/README.md) | 92 | 29 | 62 | +| Nervana-fp32 | [ConvLayer](https://github.com/soumith/convnet-benchmarks/blob/master/nervana/README.md) | 101 | 32 | 69 | +| fbfft (Torch) | [fbnn.SpatialConvolution](https://github.com/facebook/fbcunn/tree/master/src/fft) | 104 | 31 | 72 | +| TensorFlow | 
[conv2d](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/ops/nn.py) | 151 | 34 | 117 | +| Chainer | [Convolution2D](https://github.com/pfnet/chainer/blob/master/chainer/links/connection/convolution_2d.py) | 177 | 40 | 136 | | cudaconvnet2* | [ConvLayer](https://github.com/soumith/cuda-convnet2.torch/blob/master/cudaconv3/src/filter_acts.cu) | 177 | 42 | 135 | -| CuDNN[R2] * | [cudnn.SpatialConvolution](https://github.com/soumith/cudnn.torch/blob/master/SpatialConvolution.lua) | 231 | 70 | 161 | -| TensorFlow | [conv2d](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/ops/nn.py) | 277 | 70 | 207 | +| CuDNN[R2] * | [cudnn.SpatialConvolution](https://github.com/soumith/cudnn.torch/blob/master/SpatialConvolution.lua) | 231 | 70 | 161 | | Caffe (native) | [ConvolutionLayer](https://github.com/BVLC/caffe/blob/master/src/caffe/layers/conv_layer.cu) | 324 | 121 | 203 | | Torch-7 (native) | [SpatialConvolutionMM](https://github.com/torch/cunn/blob/master/SpatialConvolutionMM.cu) | 342 | 132 | 210 | | CL-nn (Torch) | [SpatialConvolutionMM](https://github.com/hughperkins/clnn/blob/master/SpatialConvolutionMM.cl) | 963 | 388 | 574 | @@ -34,16 +34,16 @@ The CuDNN benchmarks are done using Torch bindings. 
One can also do the same via | Library | Class | Time (ms) | forward (ms) | backward (ms) | |:------------------------:|:------------------------------------------------------------------------------------------------------------------------:| -----------------:| -----------------------:| ------------------------:| -| **CuDNN[R3]-fp16** (Torch) | [cudnn.SpatialConvolution](https://github.com/soumith/cudnn.torch/blob/master/SpatialConvolution.lua) | **313** | **107** | **206** | -| CuDNN[R3]-fp32 (Torch) | [cudnn.SpatialConvolution](https://github.com/soumith/cudnn.torch/blob/master/SpatialConvolution.lua) | 326 | 113 | 213 | -| fbfft (Torch) | [SpatialConvolutionCuFFT](https://github.com/facebook/fbcunn/tree/master/src/fft) | 342 | 114 | 227 | +| **CuDNN[R4]-fp16** (Torch) | [cudnn.SpatialConvolution](https://github.com/soumith/cudnn.torch/blob/master/SpatialConvolution.lua) | **242** | **86** | **156** | +| CuDNN[R4]-fp32 (Torch) | [cudnn.SpatialConvolution](https://github.com/soumith/cudnn.torch/blob/master/SpatialConvolution.lua) | 268 | 94 | 174 | +| fbfft (Torch) | [SpatialConvolutionCuFFT](https://github.com/facebook/fbcunn/tree/master/src/fft) | 342 | 114 | 227 | +| TensorFlow | [conv2d](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/ops/nn.py) | 349 | 101 | 248 | | Nervana-fp16 | [ConvLayer](https://github.com/soumith/convnet-benchmarks/blob/master/nervana/README.md) | 355 | 112 | 242 | -| Nervana-fp32 | [ConvLayer](https://github.com/soumith/convnet-benchmarks/blob/master/nervana/README.md) | 398 | 124 | 273 | -| Chainer | [Convolution2D](https://github.com/pfnet/chainer/blob/master/chainer/links/connection/convolution_2d.py) | 620 | 135 | 484 | -| cudaconvnet2* | [ConvLayer](https://github.com/soumith/cuda-convnet2.torch/blob/master/cudaconv3/src/filter_acts.cu) | 723 | 176 | 547 | -| CuDNN[R2] * | [cudnn.SpatialConvolution](https://github.com/soumith/cudnn.torch/blob/master/SpatialConvolution.lua) | 810 | 234 | 576 | +| 
Nervana-fp32 | [ConvLayer](https://github.com/soumith/convnet-benchmarks/blob/master/nervana/README.md) | 398 | 124 | 273 | +| Chainer | [Convolution2D](https://github.com/pfnet/chainer/blob/master/chainer/links/connection/convolution_2d.py) | 620 | 135 | 484 | +| cudaconvnet2* | [ConvLayer](https://github.com/soumith/cuda-convnet2.torch/blob/master/cudaconv3/src/filter_acts.cu) | 723 | 176 | 547 | +| CuDNN[R2] * | [cudnn.SpatialConvolution](https://github.com/soumith/cudnn.torch/blob/master/SpatialConvolution.lua) | 810 | 234 | 576 | | Caffe | [ConvolutionLayer](https://github.com/BVLC/caffe/blob/master/src/caffe/layers/conv_layer.cu) | 823 | 355 | 468 | -| TensorFlow | [conv2d](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/ops/nn.py) | 842 | 216 | 626 | | Torch-7 (native) | [SpatialConvolutionMM](https://github.com/torch/cunn/blob/master/SpatialConvolutionMM.cu) | 878 | 379 | 499 | | CL-nn (Torch) | [SpatialConvolutionMM](https://github.com/hughperkins/clnn/blob/master/SpatialConvolutionMM.cl) | 963 | 388 | 574 | | Caffe-CLGreenTea | [ConvolutionLayer](https://github.com/naibaf7/caffe) | 2857 | 616 | 2240 | @@ -54,15 +54,15 @@ The CuDNN benchmarks are done using Torch bindings. 
One can also do the same via |:------------------------:|:------------------------------------------------------------------------------------------------------------------------:| -----------------:| -----------------------:| ------------------------:| | **Nervana-fp16** | [ConvLayer](https://github.com/soumith/convnet-benchmarks/blob/master/nervana/README.md) | **529** | **167** | **362** | | Nervana-fp32 | [ConvLayer](https://github.com/soumith/convnet-benchmarks/blob/master/nervana/README.md) | 590 | 180 | 410 | -| CuDNN[R3]-fp16 (Torch) | [cudnn.SpatialConvolution](https://github.com/soumith/cudnn.torch/blob/master/SpatialConvolution.lua) | 615 | 179 | 436 | -| CuDNN[R3]-fp32 (Torch) | [cudnn.SpatialConvolution](https://github.com/soumith/cudnn.torch/blob/master/SpatialConvolution.lua) | 615 | 196 | 418 | -| Chainer | [Convolution2D](https://github.com/pfnet/chainer/blob/master/chainer/links/connection/convolution_2d.py) | 885 | 251 | 632 | +| CuDNN[R4]-fp16 (Torch) | [cudnn.SpatialConvolution](https://github.com/soumith/cudnn.torch/blob/master/SpatialConvolution.lua) | 471 | 140 | 331 | +| CuDNN[R4]-fp32 (Torch) | [cudnn.SpatialConvolution](https://github.com/soumith/cudnn.torch/blob/master/SpatialConvolution.lua) | 529 | 162 | 366 | +| Chainer | [Convolution2D](https://github.com/pfnet/chainer/blob/master/chainer/links/connection/convolution_2d.py) | 885 | 251 | 632 | +| TensorFlow | [conv2d](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/ops/nn.py) | 982 | 191 | 791 | | fbfft (Torch) | [SpatialConvolutionCuFFT](https://github.com/facebook/fbcunn/tree/master/src/fft) | 1092 | 355 | 737 | | cudaconvnet2* | [ConvLayer](https://github.com/soumith/cuda-convnet2.torch/blob/master/cudaconv3/src/filter_acts.cu) | 1229 | 408 | 821 | | CuDNN[R2] * | [cudnn.SpatialConvolution](https://github.com/soumith/cudnn.torch/blob/master/SpatialConvolution.lua) | 1099 | 342 | 757 | | Caffe | 
[ConvolutionLayer](https://github.com/BVLC/caffe/blob/master/src/caffe/layers/conv_layer.cu) | 1068 | 323 | 745 | | Torch-7 (native) | [SpatialConvolutionMM](https://github.com/torch/cunn/blob/master/SpatialConvolutionMM.cu) | 1105 | 350 | 755 | -| TensorFlow | [conv2d](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/ops/nn.py) | 1510 | 315 | 1195 | | CL-nn (Torch) | [SpatialConvolutionMM](https://github.com/hughperkins/clnn/blob/master/SpatialConvolutionMM.cl) | 3437 | 875 | 2562 | | Caffe-CLGreenTea | [ConvolutionLayer](https://github.com/naibaf7/caffe) | 5620 | 988 | 4632 | @@ -73,10 +73,10 @@ The CuDNN benchmarks are done using Torch bindings. One can also do the same via |:------------------------:|:------------------------------------------------------------------------------------------------------------------------:| -----------------:| -----------------------:| ------------------------:| | **Nervana-fp16** | [ConvLayer](https://github.com/soumith/convnet-benchmarks/blob/master/nervana/README.md) | **283** | **85** | **197** | | Nervana-fp32 | [ConvLayer](https://github.com/soumith/convnet-benchmarks/blob/master/nervana/README.md) | 322 | 90 | 232 | -| CuDNN[R3]-fp32 (Torch) | [cudnn.SpatialConvolution](https://github.com/soumith/cudnn.torch/blob/master/SpatialConvolution.lua) | 431 | 117 | 313 | -| CuDNN[R3]-fp16 (Torch) | [cudnn.SpatialConvolution](https://github.com/soumith/cudnn.torch/blob/master/SpatialConvolution.lua) | 501 | 109 | 392 | +| CuDNN[R4]-fp16 (Torch) | [cudnn.SpatialConvolution](https://github.com/soumith/cudnn.torch/blob/master/SpatialConvolution.lua) | 462 | 112 | 349 | +| CuDNN[R4]-fp32 (Torch) | [cudnn.SpatialConvolution](https://github.com/soumith/cudnn.torch/blob/master/SpatialConvolution.lua) | 470 | 130 | 340 | | Chainer | [Convolution2D](https://github.com/pfnet/chainer/blob/master/chainer/links/connection/convolution_2d.py) | 687 | 189 | 497 | -| TensorFlow | 
[conv2d](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/ops/nn.py) | 1084 | 246 | 838 | +| TensorFlow | [conv2d](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/ops/nn.py) | 905 | 187 | 718 | | Caffe | [ConvolutionLayer](https://github.com/BVLC/caffe/blob/master/src/caffe/layers/conv_layer.cu) | 1935 | 786 | 1148 | | CL-nn (Torch) | [SpatialConvolutionMM](https://github.com/hughperkins/clnn/blob/master/SpatialConvolutionMM.cl) | 7016 | 3027 | 3988 | | Caffe-CLGreenTea | [ConvolutionLayer](https://github.com/naibaf7/caffe) | 9462 | 746 | 8716 | diff --git a/tensorflow/output_alexnet.log b/tensorflow/output_alexnet.log index 7d8e1f3..bbcdb23 100644 --- a/tensorflow/output_alexnet.log +++ b/tensorflow/output_alexnet.log @@ -1,8 +1,8 @@ -I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcublas.so.7.0 locally -I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcudnn.so.6.5 locally -I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcufft.so.7.0 locally -I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcuda.so locally -I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcurand.so.7.0 locally +I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcublas.so locally +I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcudnn.so locally +I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcufft.so locally +I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcuda.so.1 locally +I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcurand.so locally I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties: name: GeForce GTX TITAN X major: 5 minor: 2 memoryClockRate (GHz) 1.076 @@ -11,62 
+11,49 @@ Total memory: 12.00GiB Free memory: 11.87GiB I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0 I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0: Y -I tensorflow/core/common_runtime/gpu/gpu_device.cc:680] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:06:00.0) -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:42] Allocating 11.27GiB bytes. -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:52] GPU 0 memory begins at 0x1306c80000 extends to 0x15d8553a67 -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 1.0KiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 2.0KiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 4.0KiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 8.0KiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 16.0KiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 32.0KiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 64.0KiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 128.0KiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 256.0KiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 512.0KiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 1.00MiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 2.00MiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 4.00MiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 8.00MiB -I 
tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 16.00MiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 32.00MiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 64.00MiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 128.00MiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 256.00MiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 512.00MiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 1.00GiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 2.00GiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 4.00GiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 8.00GiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 16.00GiB -W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:120] Ran out of memory trying to allocate 285.19MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available. -W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:120] Ran out of memory trying to allocate 190.12MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available. -W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:120] Ran out of memory trying to allocate 285.19MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available. -W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:120] Ran out of memory trying to allocate 190.12MiB. 
The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available. -W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:120] Ran out of memory trying to allocate 6.2KiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available. -W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:120] Ran out of memory trying to allocate 285.19MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available. -W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:120] Ran out of memory trying to allocate 190.12MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available. -W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:120] Ran out of memory trying to allocate 285.19MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available. -W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:120] Ran out of memory trying to allocate 190.12MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available. -W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:120] Ran out of memory trying to allocate 6.2KiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available. 
-E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:624] Deallocating stream with pending work -2016-01-25 04:59:05.992691: step 10, duration = 0.069 -2016-01-25 04:59:06.653383: step 20, duration = 0.071 -2016-01-25 04:59:07.363218: step 30, duration = 0.070 -2016-01-25 04:59:08.098247: step 40, duration = 0.095 -2016-01-25 04:59:08.787165: step 50, duration = 0.070 -2016-01-25 04:59:09.501239: step 60, duration = 0.071 -2016-01-25 04:59:10.213252: step 70, duration = 0.071 -2016-01-25 04:59:10.926621: step 80, duration = 0.071 -2016-01-25 04:59:11.638691: step 90, duration = 0.072 -2016-01-25 04:59:12.282361: Forward across 100 steps, 0.070 +/- 0.010 sec / batch -2016-01-25 04:59:18.491643: step 10, duration = 0.276 -2016-01-25 04:59:21.286711: step 20, duration = 0.280 -2016-01-25 04:59:24.070488: step 30, duration = 0.275 -2016-01-25 04:59:26.882084: step 40, duration = 0.282 -2016-01-25 04:59:29.683638: step 50, duration = 0.282 -2016-01-25 04:59:32.469597: step 60, duration = 0.278 -2016-01-25 04:59:35.280004: step 70, duration = 0.283 -2016-01-25 04:59:38.092115: step 80, duration = 0.278 -2016-01-25 04:59:40.890455: step 90, duration = 0.283 -2016-01-25 04:59:43.426946: Forward-backward across 100 steps, 0.277 +/- 0.028 sec / batch +I tensorflow/core/common_runtime/gpu/gpu_device.cc:718] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:06:00.0) +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 256B +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 512B +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 1.0KiB +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 2.0KiB +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 4.0KiB +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating 
bin of max chunk size 8.0KiB +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 16.0KiB +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 32.0KiB +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 64.0KiB +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 128.0KiB +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 256.0KiB +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 512.0KiB +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 1.00MiB +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 2.00MiB +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 4.00MiB +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 8.00MiB +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 16.00MiB +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 32.00MiB +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 64.00MiB +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 128.00MiB +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 256.00MiB +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:107] Allocating 11.27GiB bytes. 
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:118] GPU 0 memory begins at 0x1306c80000 extends to 0x15d8553a67 +I tensorflow/core/common_runtime/gpu/pool_allocator.cc:244] PoolAllocator: After 1778 get requests, put_count=1323 evicted_count=1000 eviction_rate=0.755858 and unsatisfied allocation rate=0.874578 +I tensorflow/core/common_runtime/gpu/pool_allocator.cc:256] Raising pool_size_limit_ from 100 to 110 +2016-02-28 18:34:03.103654: step 10, duration = 0.034 +2016-02-28 18:34:03.447840: step 20, duration = 0.034 +2016-02-28 18:34:03.791359: step 30, duration = 0.034 +2016-02-28 18:34:04.131884: step 40, duration = 0.034 +2016-02-28 18:34:04.475419: step 50, duration = 0.034 +2016-02-28 18:34:04.818749: step 60, duration = 0.034 +2016-02-28 18:34:05.160298: step 70, duration = 0.034 +2016-02-28 18:34:05.501820: step 80, duration = 0.034 +2016-02-28 18:34:05.844729: step 90, duration = 0.034 +2016-02-28 18:34:06.157673: Forward across 100 steps, 0.034 +/- 0.003 sec / batch +2016-02-28 18:34:09.438178: step 10, duration = 0.151 +2016-02-28 18:34:10.955183: step 20, duration = 0.151 +2016-02-28 18:34:12.473529: step 30, duration = 0.151 +2016-02-28 18:34:14.009753: step 40, duration = 0.151 +2016-02-28 18:34:15.513419: step 50, duration = 0.146 +2016-02-28 18:34:17.031282: step 60, duration = 0.152 +2016-02-28 18:34:18.554324: step 70, duration = 0.152 +2016-02-28 18:34:20.066692: step 80, duration = 0.146 +2016-02-28 18:34:21.592371: step 90, duration = 0.154 +2016-02-28 18:34:22.970480: Forward-backward across 100 steps, 0.150 +/- 0.015 sec / batch diff --git a/tensorflow/output_googlenet.log b/tensorflow/output_googlenet.log index 0b9a73d..a101693 100644 --- a/tensorflow/output_googlenet.log +++ b/tensorflow/output_googlenet.log @@ -1,8 +1,8 @@ -I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcublas.so.7.0 locally -I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcudnn.so.6.5 
locally -I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcufft.so.7.0 locally -I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcuda.so locally -I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcurand.so.7.0 locally +I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcublas.so locally +I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcudnn.so locally +I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcufft.so locally +I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcuda.so.1 locally +I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcurand.so locally I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties: name: GeForce GTX TITAN X major: 5 minor: 2 memoryClockRate (GHz) 1.076 @@ -11,56 +11,50 @@ Total memory: 12.00GiB Free memory: 11.87GiB I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0 I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0: Y -I tensorflow/core/common_runtime/gpu/gpu_device.cc:680] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:06:00.0) -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:42] Allocating 11.27GiB bytes. 
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:52] GPU 0 memory begins at 0x1306c80000 extends to 0x15d8553a67 -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 1.0KiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 2.0KiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 4.0KiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 8.0KiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 16.0KiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 32.0KiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 64.0KiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 128.0KiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 256.0KiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 512.0KiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 1.00MiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 2.00MiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 4.00MiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 8.00MiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 16.00MiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 32.00MiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 64.00MiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 128.00MiB -I 
tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 256.00MiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 512.00MiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 1.00GiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 2.00GiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 4.00GiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 8.00GiB -I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 16.00GiB -I tensorflow/core/common_runtime/gpu/pool_allocator.cc:244] PoolAllocator: After 2567 get requests, put_count=2042 evicted_count=1000 eviction_rate=0.489716 and unsatisfied allocation rate=0.633035 +I tensorflow/core/common_runtime/gpu/gpu_device.cc:718] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:06:00.0) +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 256B +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 512B +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 1.0KiB +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 2.0KiB +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 4.0KiB +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 8.0KiB +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 16.0KiB +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 32.0KiB +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 64.0KiB +I 
tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 128.0KiB +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 256.0KiB +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 512.0KiB +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 1.00MiB +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 2.00MiB +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 4.00MiB +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 8.00MiB +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 16.00MiB +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 32.00MiB +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 64.00MiB +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 128.00MiB +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 256.00MiB +I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:107] Allocating 11.27GiB bytes. 
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:118] GPU 0 memory begins at 0x1306c80000 extends to 0x15d8553a67
+I tensorflow/core/common_runtime/gpu/pool_allocator.cc:244] PoolAllocator: After 3247 get requests, put_count=3143 evicted_count=1000 eviction_rate=0.318167 and unsatisfied allocation rate=0.370804
 I tensorflow/core/common_runtime/gpu/pool_allocator.cc:256] Raising pool_size_limit_ from 100 to 110
-I tensorflow/core/common_runtime/gpu/pool_allocator.cc:244] PoolAllocator: After 5411 get requests, put_count=5459 evicted_count=1000 eviction_rate=0.183184 and unsatisfied allocation rate=0.180189
-I tensorflow/core/common_runtime/gpu/pool_allocator.cc:256] Raising pool_size_limit_ from 256 to 281
-E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:624] Deallocating stream with pending work
-2016-01-25 05:05:15.580990: step 10, duration = 0.263
-2016-01-25 05:05:18.038548: step 20, duration = 0.247
-2016-01-25 05:05:20.551790: step 30, duration = 0.263
-2016-01-25 05:05:23.006944: step 40, duration = 0.241
-2016-01-25 05:05:25.500528: step 50, duration = 0.261
-2016-01-25 05:05:27.978199: step 60, duration = 0.251
-2016-01-25 05:05:30.461532: step 70, duration = 0.235
-2016-01-25 05:05:32.941043: step 80, duration = 0.242
-2016-01-25 05:05:35.458825: step 90, duration = 0.256
-2016-01-25 05:05:37.666623: Forward across 100 steps, 0.246 +/- 0.029 sec / batch
-2016-01-25 05:06:00.530894: step 10, duration = 1.095
-2016-01-25 05:06:11.485838: step 20, duration = 1.116
-2016-01-25 05:06:22.309278: step 30, duration = 0.979
-2016-01-25 05:06:33.288320: step 40, duration = 0.972
-2016-01-25 05:06:44.164229: step 50, duration = 0.980
-2016-01-25 05:06:55.231582: step 60, duration = 1.203
-2016-01-25 05:07:06.160001: step 70, duration = 1.216
-2016-01-25 05:07:17.169251: step 80, duration = 1.227
-2016-01-25 05:07:28.099824: step 90, duration = 1.203
-2016-01-25 05:07:37.977609: Forward-backward across 100 steps, 1.084 +/- 0.131 sec / batch
+I tensorflow/core/common_runtime/gpu/pool_allocator.cc:244] PoolAllocator: After 0 get requests, put_count=2010 evicted_count=2000 eviction_rate=0.995025 and unsatisfied allocation rate=0
+2016-02-28 18:37:32.148810: step 10, duration = 0.188
+2016-02-28 18:37:34.032619: step 20, duration = 0.188
+2016-02-28 18:37:35.921020: step 30, duration = 0.188
+2016-02-28 18:37:37.813859: step 40, duration = 0.189
+2016-02-28 18:37:39.700296: step 50, duration = 0.188
+2016-02-28 18:37:41.584602: step 60, duration = 0.188
+2016-02-28 18:37:43.473643: step 70, duration = 0.188
+2016-02-28 18:37:45.370821: step 80, duration = 0.201
+2016-02-28 18:37:47.255847: step 90, duration = 0.189
+2016-02-28 18:37:48.963790: Forward across 100 steps, 0.187 +/- 0.019 sec / batch
+2016-02-28 18:38:08.733669: step 10, duration = 0.925
+2016-02-28 18:38:17.842521: step 20, duration = 0.926
+2016-02-28 18:38:26.965862: step 30, duration = 0.925
+2016-02-28 18:38:36.084211: step 40, duration = 0.922
+2016-02-28 18:38:45.226841: step 50, duration = 0.927
+2016-02-28 18:38:54.355223: step 60, duration = 0.934
+2016-02-28 18:39:03.472584: step 70, duration = 0.905
+2016-02-28 18:39:12.626487: step 80, duration = 0.907
+2016-02-28 18:39:21.813921: step 90, duration = 0.895
+2016-02-28 18:39:30.085788: Forward-backward across 100 steps, 0.905 +/- 0.092 sec / batch
diff --git a/tensorflow/output_overfeat.log b/tensorflow/output_overfeat.log
index 2a7a4a1..9dcedc4 100644
--- a/tensorflow/output_overfeat.log
+++ b/tensorflow/output_overfeat.log
@@ -1,8 +1,8 @@
-I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcublas.so.7.0 locally
-I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcudnn.so.6.5 locally
-I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcufft.so.7.0 locally
-I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcuda.so locally
-I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcurand.so.7.0 locally
+I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcublas.so locally
+I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcudnn.so locally
+I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcufft.so locally
+I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcuda.so.1 locally
+I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcurand.so locally
 I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties:
 name: GeForce GTX TITAN X
 major: 5 minor: 2 memoryClockRate (GHz) 1.076
@@ -11,62 +11,49 @@ Total memory: 12.00GiB
 Free memory: 11.87GiB
 I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0
 I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0: Y
-I tensorflow/core/common_runtime/gpu/gpu_device.cc:680] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:06:00.0)
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:42] Allocating 11.27GiB bytes.
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:52] GPU 0 memory begins at 0x1306c80000 extends to 0x15d8553a67
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 1.0KiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 2.0KiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 4.0KiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 8.0KiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 16.0KiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 32.0KiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 64.0KiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 128.0KiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 256.0KiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 512.0KiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 1.00MiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 2.00MiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 4.00MiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 8.00MiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 16.00MiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 32.00MiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 64.00MiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 128.00MiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 256.00MiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 512.00MiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 1.00GiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 2.00GiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 4.00GiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 8.00GiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 16.00GiB
-W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:120] Ran out of memory trying to allocate 648.00MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
-W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:120] Ran out of memory trying to allocate 675.00MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
-W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:120] Ran out of memory trying to allocate 324.00MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
-W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:120] Ran out of memory trying to allocate 324.00MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
-W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:120] Ran out of memory trying to allocate 648.00MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
-W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:120] Ran out of memory trying to allocate 675.00MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
-W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:120] Ran out of memory trying to allocate 162.00MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
-W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:120] Ran out of memory trying to allocate 324.00MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
-W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:120] Ran out of memory trying to allocate 648.00MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
-W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:120] Ran out of memory trying to allocate 675.00MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
-E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:624] Deallocating stream with pending work
-2016-01-25 04:59:47.954455: step 10, duration = 0.216
-2016-01-25 04:59:50.125522: step 20, duration = 0.218
-2016-01-25 04:59:52.302775: step 30, duration = 0.220
-2016-01-25 04:59:54.491975: step 40, duration = 0.206
-2016-01-25 04:59:56.741014: step 50, duration = 0.220
-2016-01-25 04:59:59.031490: step 60, duration = 0.321
-2016-01-25 05:00:01.219937: step 70, duration = 0.218
-2016-01-25 05:00:03.262796: step 80, duration = 0.220
-2016-01-25 05:00:05.516300: step 90, duration = 0.221
-2016-01-25 05:00:07.601365: Forward across 100 steps, 0.216 +/- 0.045 sec / batch
-2016-01-25 05:00:25.654317: step 10, duration = 0.852
-2016-01-25 05:00:34.149322: step 20, duration = 0.850
-2016-01-25 05:00:42.656984: step 30, duration = 0.844
-2016-01-25 05:00:51.156913: step 40, duration = 0.852
-2016-01-25 05:00:59.670852: step 50, duration = 0.853
-2016-01-25 05:01:08.162712: step 60, duration = 0.852
-2016-01-25 05:01:16.680141: step 70, duration = 0.855
-2016-01-25 05:01:25.189005: step 80, duration = 0.858
-2016-01-25 05:01:33.721154: step 90, duration = 0.850
-2016-01-25 05:01:41.404763: Forward-backward across 100 steps, 0.842 +/- 0.085 sec / batch
+I tensorflow/core/common_runtime/gpu/gpu_device.cc:718] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:06:00.0)
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 256B
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 512B
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 1.0KiB
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 2.0KiB
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 4.0KiB
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 8.0KiB
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 16.0KiB
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 32.0KiB
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 64.0KiB
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 128.0KiB
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 256.0KiB
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 512.0KiB
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 1.00MiB
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 2.00MiB
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 4.00MiB
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 8.00MiB
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 16.00MiB
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 32.00MiB
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 64.00MiB
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 128.00MiB
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 256.00MiB
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:107] Allocating 11.27GiB bytes.
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:118] GPU 0 memory begins at 0x1306c80000 extends to 0x15d8553a67
+I tensorflow/core/common_runtime/gpu/pool_allocator.cc:244] PoolAllocator: After 1779 get requests, put_count=1323 evicted_count=1000 eviction_rate=0.755858 and unsatisfied allocation rate=0.874649
+I tensorflow/core/common_runtime/gpu/pool_allocator.cc:256] Raising pool_size_limit_ from 100 to 110
+2016-02-28 18:34:26.396518: step 10, duration = 0.103
+2016-02-28 18:34:27.413668: step 20, duration = 0.102
+2016-02-28 18:34:28.441604: step 30, duration = 0.103
+2016-02-28 18:34:29.470384: step 40, duration = 0.101
+2016-02-28 18:34:30.484839: step 50, duration = 0.102
+2016-02-28 18:34:31.509178: step 60, duration = 0.102
+2016-02-28 18:34:32.537476: step 70, duration = 0.104
+2016-02-28 18:34:33.573302: step 80, duration = 0.103
+2016-02-28 18:34:34.597368: step 90, duration = 0.103
+2016-02-28 18:34:35.519101: Forward across 100 steps, 0.101 +/- 0.010 sec / batch
+2016-02-28 18:34:43.001105: step 10, duration = 0.354
+2016-02-28 18:34:46.511634: step 20, duration = 0.355
+2016-02-28 18:34:50.048593: step 30, duration = 0.356
+2016-02-28 18:34:53.560119: step 40, duration = 0.353
+2016-02-28 18:34:57.091679: step 50, duration = 0.360
+2016-02-28 18:35:00.619250: step 60, duration = 0.354
+2016-02-28 18:35:04.147982: step 70, duration = 0.355
+2016-02-28 18:35:07.678886: step 80, duration = 0.357
+2016-02-28 18:35:11.211387: step 90, duration = 0.359
+2016-02-28 18:35:14.393544: Forward-backward across 100 steps, 0.349 +/- 0.035 sec / batch
diff --git a/tensorflow/output_vgga.log b/tensorflow/output_vgga.log
index 7f89baf..6442b73 100644
--- a/tensorflow/output_vgga.log
+++ b/tensorflow/output_vgga.log
@@ -1,8 +1,8 @@
-I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcublas.so.7.0 locally
-I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcudnn.so.6.5 locally
-I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcufft.so.7.0 locally
-I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcuda.so locally
-I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcurand.so.7.0 locally
+I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcublas.so locally
+I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcudnn.so locally
+I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcufft.so locally
+I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcuda.so.1 locally
+I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcurand.so locally
 I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties:
 name: GeForce GTX TITAN X
 major: 5 minor: 2 memoryClockRate (GHz) 1.076
@@ -11,62 +11,49 @@ Total memory: 12.00GiB
 Free memory: 11.87GiB
 I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0
 I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0: Y
-I tensorflow/core/common_runtime/gpu/gpu_device.cc:680] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:06:00.0)
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:42] Allocating 11.27GiB bytes.
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:52] GPU 0 memory begins at 0x1306c80000 extends to 0x15d8553a67
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 1.0KiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 2.0KiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 4.0KiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 8.0KiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 16.0KiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 32.0KiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 64.0KiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 128.0KiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 256.0KiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 512.0KiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 1.00MiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 2.00MiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 4.00MiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 8.00MiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 16.00MiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 32.00MiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 64.00MiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 128.00MiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 256.00MiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 512.00MiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 1.00GiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 2.00GiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 4.00GiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 8.00GiB
-I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 16.00GiB
-W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:120] Ran out of memory trying to allocate 882.00MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
-W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:120] Ran out of memory trying to allocate 882.00MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
-W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:120] Ran out of memory trying to allocate 220.50MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
-W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:120] Ran out of memory trying to allocate 220.50MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
-W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:120] Ran out of memory trying to allocate 2.2KiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
-W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:120] Ran out of memory trying to allocate 1.72GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
-W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:120] Ran out of memory trying to allocate 882.00MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
-W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:120] Ran out of memory trying to allocate 220.50MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
-W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:120] Ran out of memory trying to allocate 1.72GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
-W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:120] Ran out of memory trying to allocate 441.00MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
-E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:624] Deallocating stream with pending work
-2016-01-25 05:01:49.963409: step 10, duration = 0.353
-2016-01-25 05:01:53.541880: step 20, duration = 0.352
-2016-01-25 05:01:57.079947: step 30, duration = 0.354
-2016-01-25 05:02:00.598771: step 40, duration = 0.374
-2016-01-25 05:02:04.116766: step 50, duration = 0.355
-2016-01-25 05:02:07.661613: step 60, duration = 0.356
-2016-01-25 05:02:11.205755: step 70, duration = 0.354
-2016-01-25 05:02:14.772028: step 80, duration = 0.354
-2016-01-25 05:02:18.337085: step 90, duration = 0.352
-2016-01-25 05:02:21.513479: Forward across 100 steps, 0.351 +/- 0.044 sec / batch
-2016-01-25 05:02:53.384239: step 10, duration = 1.511
-2016-01-25 05:03:08.630315: step 20, duration = 1.512
-2016-01-25 05:03:23.868209: step 30, duration = 1.530
-2016-01-25 05:03:39.143444: step 40, duration = 1.534
-2016-01-25 05:03:54.422246: step 50, duration = 1.530
-2016-01-25 05:04:09.696207: step 60, duration = 1.511
-2016-01-25 05:04:24.942348: step 70, duration = 1.529
-2016-01-25 05:04:40.157606: step 80, duration = 1.536
-2016-01-25 05:04:55.395796: step 90, duration = 1.530
-2016-01-25 05:05:09.152744: Forward-backward across 100 steps, 1.510 +/- 0.159 sec / batch
+I tensorflow/core/common_runtime/gpu/gpu_device.cc:718] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:06:00.0)
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 256B
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 512B
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 1.0KiB
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 2.0KiB
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 4.0KiB
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 8.0KiB
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 16.0KiB
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 32.0KiB
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 64.0KiB
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 128.0KiB
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 256.0KiB
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 512.0KiB
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 1.00MiB
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 2.00MiB
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 4.00MiB
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 8.00MiB
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 16.00MiB
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 32.00MiB
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 64.00MiB
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 128.00MiB
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:53] Creating bin of max chunk size 256.00MiB
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:107] Allocating 11.27GiB bytes.
+I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:118] GPU 0 memory begins at 0x1306c80000 extends to 0x15d8553a67
+I tensorflow/core/common_runtime/gpu/pool_allocator.cc:244] PoolAllocator: After 1999 get requests, put_count=1323 evicted_count=1000 eviction_rate=0.755858 and unsatisfied allocation rate=0.888444
+I tensorflow/core/common_runtime/gpu/pool_allocator.cc:256] Raising pool_size_limit_ from 100 to 110
+2016-02-28 18:35:19.786187: step 10, duration = 0.192
+2016-02-28 18:35:21.721211: step 20, duration = 0.191
+2016-02-28 18:35:23.657017: step 30, duration = 0.198
+2016-02-28 18:35:25.593985: step 40, duration = 0.193
+2016-02-28 18:35:27.527923: step 50, duration = 0.194
+2016-02-28 18:35:29.457957: step 60, duration = 0.194
+2016-02-28 18:35:31.390243: step 70, duration = 0.193
+2016-02-28 18:35:33.324563: step 80, duration = 0.194
+2016-02-28 18:35:35.256393: step 90, duration = 0.194
+2016-02-28 18:35:36.995732: Forward across 100 steps, 0.191 +/- 0.019 sec / batch
+2016-02-28 18:35:57.968941: step 10, duration = 0.982
+2016-02-28 18:36:07.865760: step 20, duration = 0.976
+2016-02-28 18:36:17.768448: step 30, duration = 0.977
+2016-02-28 18:36:27.673898: step 40, duration = 0.976
+2016-02-28 18:36:37.590995: step 50, duration = 0.980
+2016-02-28 18:36:47.517216: step 60, duration = 0.988
+2016-02-28 18:36:57.443127: step 70, duration = 0.981
+2016-02-28 18:37:07.363943: step 80, duration = 0.987
+2016-02-28 18:37:17.264746: step 90, duration = 0.976
+2016-02-28 18:37:26.214138: Forward-backward across 100 steps, 0.982 +/- 0.099 sec / batch
diff --git a/torch7/imagenet_winners/output.log b/torch7/imagenet_winners/output.log
index 089dd41..683bb2e 100644
--- a/torch7/imagenet_winners/output.log
+++ b/torch7/imagenet_winners/output.log
@@ -1,34 +1,30 @@
-Running on device: GeForce GTX TITAN X
-ModelType: AlexNet Kernels: cudnn Input shape: 128x3x224x224
-cudnn :updateOutput(): 32.35
-cudnn :updateGradInput(): 30.63
-cudnn :accGradParameters(): 33.62
-cudnn :Forward: 32.35
-cudnn :Backward: 64.25
-cudnn :TOTAL: 96.60
-
-ModelType: VGG Model-A Kernels: cudnn Input shape: 64x3x224x224
-cudnn :updateOutput(): 196.52
-cudnn :updateGradInput(): 212.04
-cudnn :accGradParameters(): 206.72
-cudnn :Forward: 196.52
-cudnn :Backward: 418.76
-cudnn :TOTAL: 615.27
-
-ModelType: OverFeat[fast] Kernels: cudnn Input shape: 128x3x231x231
-cudnn :updateOutput(): 113.28
-cudnn :updateGradInput(): 101.09
-cudnn :accGradParameters(): 112.32
-cudnn :Forward: 113.28
-cudnn :Backward: 213.42
-cudnn :TOTAL: 326.70
-
-ModelType: GoogleNet Kernels: cudnn Input shape: 128x3x224x224
-cudnn :updateOutput(): 117.82
-cudnn :updateGradInput(): 213.58
-cudnn :accGradParameters(): 100.16
-cudnn :Forward: 117.82
-cudnn :Backward: 313.74
-cudnn :TOTAL: 431.56
-
-
+Running on device: GeForce GTX TITAN X
+ModelType: AlexNet Kernels: cudnn Input shape: 128x3x224x224
+cudnn :updateOutput(): 27.65
+cudnn :updateGradInput(): 24.32
+cudnn :accGradParameters(): 28.99
+cudnn :Forward: 27.65
+cudnn :Backward: 53.31
+cudnn :TOTAL: 80.96
+ModelType: OverFeat[fast] Kernels: cudnn Input shape: 128x3x231x231
+cudnn :updateOutput(): 94.28
+cudnn :updateGradInput(): 81.17
+cudnn :accGradParameters(): 93.07
+cudnn :Forward: 94.28
+cudnn :Backward: 174.24
+cudnn :TOTAL: 268.52
+ModelType: VGG Model-A Kernels: cudnn Input shape: 64x3x224x224
+cudnn :updateOutput(): 162.74
+cudnn :updateGradInput(): 167.05
+cudnn :accGradParameters(): 199.49
+cudnn :Forward: 162.74
+cudnn :Backward: 366.54
+cudnn :TOTAL: 529.29
+ModelType: GoogleNet Kernels: cudnn Input shape: 128x3x224x224
+cudnn :updateOutput(): 130.76
+cudnn :updateGradInput(): 197.86
+cudnn :accGradParameters(): 142.15
+cudnn :Forward: 130.76
+cudnn :Backward: 340.01
+cudnn :TOTAL: 470.77
+
diff --git a/torch7/imagenet_winners/output_cudnn_fp16.log b/torch7/imagenet_winners/output_cudnn_fp16.log
index 7f9d95b..525227e 100644
--- a/torch7/imagenet_winners/output_cudnn_fp16.log
+++ b/torch7/imagenet_winners/output_cudnn_fp16.log
@@ -1,34 +1,30 @@
-Running on device: GeForce GTX TITAN X
-ModelType: AlexNet Kernels: cudnn Input shape: 128x3x224x224
-cudnn :updateOutput(): 30.08
-cudnn :updateGradInput(): 26.93
-cudnn :accGradParameters(): 39.70
-cudnn :Forward: 30.08
-cudnn :Backward: 66.63
-cudnn :TOTAL: 96.71
-
-ModelType: VGG Model-A Kernels: cudnn Input shape: 64x3x224x224
-cudnn :updateOutput(): 179.19
-cudnn :updateGradInput(): 185.43
-cudnn :accGradParameters(): 251.16
-cudnn :Forward: 179.19
-cudnn :Backward: 436.59
-cudnn :TOTAL: 615.78
-
-ModelType: OverFeat[fast] Kernels: cudnn Input shape: 128x3x231x231
-cudnn :updateOutput(): 107.09
-cudnn :updateGradInput(): 93.60
-cudnn :accGradParameters(): 112.81
-cudnn :Forward: 107.09
-cudnn :Backward: 206.42
-cudnn :TOTAL: 313.51
-
-ModelType: GoogleNet Kernels: cudnn Input shape: 128x3x224x224
-cudnn :updateOutput(): 109.21
-cudnn :updateGradInput(): 231.47
-cudnn :accGradParameters(): 161.08
-cudnn :Forward: 109.21
-cudnn :Backward: 392.55
-cudnn :TOTAL: 501.76
-
-
+Running on device: GeForce GTX TITAN X
+ModelType: AlexNet Kernels: cudnn Input shape: 128x3x224x224
+cudnn :updateOutput(): 24.87
+cudnn :updateGradInput(): 21.15
+cudnn :accGradParameters(): 25.64
+cudnn :Forward: 24.87
+cudnn :Backward: 46.79
+cudnn :TOTAL: 71.66
+ModelType: OverFeat[fast] Kernels: cudnn Input shape: 128x3x231x231
+cudnn :updateOutput(): 86.15
+cudnn :updateGradInput(): 73.20
+cudnn :accGradParameters(): 83.29
+cudnn :Forward: 86.15
+cudnn :Backward: 156.50
+cudnn :TOTAL: 242.64
+ModelType: VGG Model-A Kernels: cudnn Input shape: 64x3x224x224
+cudnn :updateOutput(): 140.33
+cudnn :updateGradInput(): 144.58
+cudnn :accGradParameters(): 186.94
+cudnn :Forward: 140.33
+cudnn :Backward: 331.52
+cudnn :TOTAL: 471.85
+ModelType: GoogleNet Kernels: cudnn Input shape: 128x3x224x224
+cudnn :updateOutput(): 112.51
+cudnn :updateGradInput(): 223.20
+cudnn :accGradParameters(): 126.51
+cudnn :Forward: 112.51
+cudnn :Backward: 349.71
+cudnn :TOTAL: 462.22
+