
Implement GPU kernels #118

Closed
7 tasks done
seanpmorgan opened this issue Mar 30, 2019 · 13 comments
@seanpmorgan
Member

seanpmorgan commented Mar 30, 2019

Currently we've omitted the GPU kernels from Addons while we got everything set up. Now that CI testing is set up for GPU, we should begin adding these to Addons. There's a lot required, so I wanted to start this issue so we can track / discuss.

Some of the TODOs:

@seanpmorgan
Member Author

@gunan @martinwicke Is there any way we can see how tensorflow/community#2 is progressing?

We've been discussing how we would like to package Addons with GPU kernels, and I believe we'll be depending on this in order to have a single pip package that loads either GPU or non-GPU kernels. Is that correct? Publishing an addons-gpu package does not seem like a good idea if this is on the horizon.

@gunan

gunan commented Apr 8, 2019

@yifeif and I are working towards this. We are hoping to deliver it in Q2.

@seanpmorgan
Member Author

Small update: dynamic kernels are now implemented on the tensorflow master branch so we can begin work on this.

@Squadrick
Member

@seanpmorgan Finally got myself an NVIDIA GPU. I can work on this.

@Squadrick
Member

@gunan Hey, do you have some sample code on how this is supposed to be used?

In the RFC, you mention a new API, tf.load_kernel_library, but I can't seem to find it on the master branch of TensorFlow; I did find tf.load_library, though.

@gunan

gunan commented Jun 12, 2019

During implementation, we decided to make it more general and name it "load_library" instead. Please use that function.
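(For readers following along, here is a minimal sketch of loading a compiled custom-op library at runtime with the renamed API. The filename `_my_ops.so` is hypothetical; substitute the shared object produced by your own build, e.g. from the tensorflow/custom-op template.)

```python
import os
import tensorflow as tf

# Hypothetical path to a compiled custom-op shared library.
SO_PATH = "./_my_ops.so"

if os.path.exists(SO_PATH):
    # tf.load_library registers any ops/kernels the plugin contains;
    # tf.load_op_library additionally returns a module wrapping the ops.
    my_ops = tf.load_op_library(SO_PATH)
else:
    print("custom-op library not found; build it first")
```

With dynamic kernel loading, the same mechanism is expected to pick up GPU kernels from the plugin when a GPU-capable TensorFlow runtime is present.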

@facaiy
Member

facaiy commented Jun 17, 2019

@av8ramit Hi, Amit. Can you check the GPU test failures in #294? Which Docker image do we use for GPU tests? Is it the tensorflow/tensorflow:custom-op-gpu image required by https://github.com/tensorflow/custom-op? Thank you.

@av8ramit
Contributor

Hi @facaiy we use an internal GCP image, but it's similar to the Docker image you listed.

@seanpmorgan
Member Author

@yifeif @gunan We're in the process of setting up our GPU env, but it looks as though tf-nightly-2.0-preview returns False when running tf.test.is_gpu_available(). If we run with tf-nightly-gpu-2.0-preview, it correctly finds the device. As I understood it, dynamic kernels have been merged into master, so is this just a bug in device_lib, or is the standard nightly package not able to run GPU kernels?

More information: I'm running the custom-op-gpu container and running:

from tensorflow.python.client import device_lib
device_lib.list_local_devices()

I can file an issue on tensorflow/tensorflow if it's just something that needs to be updated in that library.

@seanpmorgan
Member Author

seanpmorgan commented Jul 29, 2019

@karmel @yifeif @gunan Friendly bump on the above question.

We're currently dealing with a failing test case that only occurs on Python 2 tf-nightly-2.0-gpu. Python 3 GPU is passing, but only because there hasn't been a released pip package for py34 GPU in 10 days. This makes it very difficult for us to work around these new errors when tf-core nightly builds are failing frequently and we're not sure how to build our tests around them. See #373.

Why is there still a tf-nightly-2.0-gpu package, and is it going to stay around?

@av8ramit
Contributor

We have dropped support for Python 3.4 packages as it is fairly old. @karmel, could this issue be solved by upgrading the Python 3 version to Python 3.5?

@seanpmorgan
Member Author

Good to know, thanks. We will upgrade our CI scripts to install py36, but for local testing the custom-op Docker image we use only has py34 installed (still waiting on a new custom-op image, which looks like it's coming soon).

Is there a reason for the 2.0-nightly-gpu pip package, though? We were under the impression there would be a unified pip package now that dynamic kernels are in. We need to know because we have a failure on the GPU pip package but not the CPU one, and it's difficult to troubleshoot without knowing the difference between the two.

@seanpmorgan seanpmorgan added build and removed help wanted Needs help as a contribution labels Aug 11, 2019
@seanpmorgan
Member Author

seanpmorgan commented Aug 11, 2019

Closing this as all kernels are now implemented. I'll create a new issue for discussing how we will package GPU kernels as there is some discussion needed.

For record keeping... here are the notes from the SIG Build meeting regarding the two tf-core packages:

* ETA for tensorflow and tensorflow-gpu package to consolidate into a single pip package with dynamic kernels?
	* Ubuntu pip package available as tf-nightly-gpu, fixing windows gpu pip package now. If not 2.0, will be in 2.1 (will update this after confirming).
	* Dynamic kernels may take some more time to land
		* Packages are already available as tf-nightly-gpu but Windows is being fixed
		* Uncertain if this will happen in 2.0, because we’re not sure about breaking it
	* Jonathan: Will dynamic kernel support MKL/DNN?
		* Gunhan: Seems like no, haven’t figured this out yet
