Degraded performance after updating from Cuda 8.0 to Cuda 9.0, and cuDNN v5 to cuDNN v7? #429
After using `th neural_style.lua -gpu 0 -backend cudnn` to compare an older install of Neural-Style to a newer one, I noticed that performance seems to have gotten worse. Is this something that's fixable? Or are the new versions of Cuda and cuDNN just not as efficient as the previous versions?

The older setup:
`cuda-repo-ubuntu1604_8.0.44-1_amd64.deb`, `cudnn-8.0-linux-x64-v5.0-ga.tgz`
Ubuntu 16.04.1 LTS (GNU/Linux 4.4.0-38-generic x86_64)

Compared to:
`cudnn-9.0-linux-x64-v7.tgz`, `libcudnn7_7.0.3.11-1+cuda9.0_amd64.deb`
Ubuntu 16.04.3 LTS (GNU/Linux 4.4.0-1038-aws x86_64)

More memory is also used before the network loads the model. This is a pretty large change in terms of resource usage, and it's definitely not for the better.

Comments
Testing Cuda 8.0:

This makes me wonder if CUDA Toolkit 8.0 GA2 (Feb 2017) added something new that Neural-Style doesn't require, and that we could therefore disable or remove in order to lower memory use.
Another setup for comparison:
Cuda 7.5 and cuDNN v4 setup memory usage:
Running
It looks like memory usage increased significantly in the Cuda 8.0 (Feb 2017) update; earlier versions of Cuda were around 23.5% more memory-efficient, unless the increase scales with the image size value. If it does scale with image size, that would significantly lower the largest possible output image size.
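For anyone reproducing these comparisons, here's a minimal sketch of how the numbers can be gathered from inside a Torch session using `cutorch.getMemoryUsage`; the device index and measurement points are just assumptions for illustration:

```lua
require 'cutorch'

-- Report free/total memory on GPU 1, in MiB. Run this at matching points
-- (e.g. before and after the VGG model is loaded) under each CUDA/cuDNN
-- install so the numbers are directly comparable.
local freeBytes, totalBytes = cutorch.getMemoryUsage(1)
print(string.format('GPU 1: %.0f MiB free of %.0f MiB total',
                    freeBytes / 1024^2, totalBytes / 1024^2))
```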
An interesting side effect of changing the Cuda, cuDNN, and Torch7 versions (and maybe even the Ubuntu version) is that the effect of the seed value seems to change. So if you use
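For reference, here is a minimal sketch of how both RNGs can be pinned in Torch7 so that version-to-version comparisons start from identical noise; whether neural_style.lua seeds the GPU generator as well as the CPU one is an assumption to verify:

```lua
require 'torch'
require 'cutorch'

-- Fix the CPU and GPU random number generators before the random init
-- image is generated, so that remaining output differences come from the
-- library versions rather than from the initialization.
local seed = 123
torch.manualSeed(seed)
cutorch.manualSeedAll(seed)
```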
Another way to slightly lower memory usage seems to be stripping layers from a VGG model, as sketched below.
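This is a minimal sketch of that idea, assuming `loadcaffe` and the standard VGG-19 files; the cutoff index of 37 is a hypothetical value, since the right one depends on which content/style layers you actually request:

```lua
require 'nn'
require 'loadcaffe'

-- Load VGG-19, then drop everything after the convolutional stack so the
-- large fully connected layers (fc6/fc7/fc8) never occupy GPU memory.
local cnn = loadcaffe.load('models/VGG_ILSVRC_19_layers_deploy.prototxt',
                           'models/VGG_ILSVRC_19_layers.caffemodel', 'nn')

local keep = 37  -- hypothetical cutoff: keep only the conv/ReLU/pool layers
while #cnn.modules > keep do
  cnn:remove(#cnn.modules)
end

print(cnn)  -- inspect what's left before using or re-saving the model
```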
Have you tried instance normalization with your setups? I encounter problems here...
@flaushi Are you referring to instance normalization from fast-neural-style?
Yes!
I'm confused, @ProGamerGov: the GPU memory usage definitely scales with the output image size. Were these CUDA benchmarks taken with the same output image size? One of my EC2 snapshots somehow has a corrupted CUDA install, so I'm deciding which CUDA version to go with, and I'd rather use the most memory-efficient one.
@ajhool I believe that I was using
That's really interesting. I sensed a significant speed increase on the newer versions of Cuda and cuDNN, but "sensed" is the operative word because I didn't run a controlled benchmark. I also sensed the memory usage increase. The latest Cuda supports the Volta GPUs, too, so I'm excited to see whether there are significant rendering gains to be made there. What I'm actually finding (I think) is that for smaller renders the main bottleneck is some process that luajit is executing on the CPU. When I run multiple renders on the same GPU (~60% GPU memory usage), I see significant (2x-3x) rendering slowdowns, and all 4 CPU cores get locked at 100% by a luajit process. I believe I posted an issue about this in the past, but I might be misremembering. Frustratingly, concurrent rendering seems to be constrained by the CPU before the GPU. A quick way to check this is sketched below.
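A minimal sketch of such a check (the `cutorch.synchronize()` call matters, because CUDA kernels are queued asynchronously and timing without it mostly measures launch overhead):

```lua
require 'cutorch'

local timer = torch.Timer()

-- ... run one optimization step of the render here ...

cutorch.synchronize()  -- wait for all queued GPU kernels to finish
print(string.format('step took %.3f s of wall-clock time', timer:time().real))
```

Comparing this wall-clock time against the GPU utilization reported by nvidia-smi over the same window should show which side is the bottleneck: pegged CPU cores with a partly idle GPU point at the luajit process.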