[GUI] Use Taichi kernels instead of NumPy operations to reduce GUI.set_image overhead #1132

Merged
merged 24 commits into from
Jun 6, 2020

Conversation

archibate
Collaborator

@archibate archibate commented Jun 4, 2020

@archibate archibate added this to the v0.6.8 milestone Jun 4, 2020
@archibate archibate added GAMES201 GAMES 201 students' wishlist rendering labels Jun 4, 2020
@archibate archibate marked this pull request as ready for review June 4, 2020 06:52
@archibate archibate requested review from k-ye, taichi-gardener and yuanming-hu and removed request for k-ye June 4, 2020 06:52
@archibate archibate changed the title [GUI] less copy overhead, fix low FPS for small shaders with high resolution [GUI] less copy overhead for simple shader kernels with high resolution Jun 4, 2020
@archibate
Collaborator Author

Before:

x64 Profiler: 21.1 fps
[  9.23%] paint                                        min   1.605 ms   avg   2.164 ms    max   6.565 ms   total   0.433 s [    200x]
[ 90.77%] set_image                                    min  20.121 ms   avg  21.288 ms    max  23.373 ms   total   4.258 s [    200x]
CUDA Profiler: 23.9 fps
[  3.10%] paint                                        min   0.882 ms   avg   0.892 ms    max   0.947 ms   total   0.178 s [    200x]
[ 96.90%] set_image                                    min  26.985 ms   avg  27.858 ms    max  29.705 ms   total   5.572 s [    200x]

After:

x64 Profiler: 50.4 fps
[ 33.59%] paint                                        min   1.590 ms   avg   2.092 ms    max   3.523 ms   total   0.418 s [    200x]
[ 66.41%] set_image                                    min   3.711 ms   avg   4.137 ms    max   5.504 ms   total   0.827 s [    200x]

CUDA Profiler: 36.1 fps
[  6.06%] paint                                        min   0.874 ms   avg   0.886 ms    max   1.040 ms   total   0.177 s [    200x]
[ 93.94%] set_image                                    min  13.323 ms   avg  13.739 ms    max  14.824 ms   total   2.748 s [    200x]
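
For reference, the per-call speedup of set_image implied by these averages can be worked out directly. This is only arithmetic on the numbers in the profiler output above; the variable names are illustrative.

```python
# Average set_image time per call in ms, copied from the profiler output above.
before = {"x64": 21.288, "cuda": 27.858}
after = {"x64": 4.137, "cuda": 13.739}

# Ratio of old to new average time for each backend.
speedup = {backend: before[backend] / after[backend] for backend in before}

for backend, ratio in speedup.items():
    print(f"{backend}: set_image is about {ratio:.1f}x faster per call")
```

So set_image itself is roughly 5x faster per call on x64 and roughly 2x faster on CUDA, while paint is essentially unchanged.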

Member

@k-ye k-ye left a comment

Thx. I have left a few questions, since I'm not very familiar with the GUI part. Also, could you briefly explain what the optimization is in the PR description?

examples/simple_uv.py Show resolved Hide resolved
python/taichi/lang/matrix.py Outdated Show resolved Hide resolved
@@ -497,11 +497,17 @@ def assign_renamed(x, y):
        from .meta import fill_matrix
        fill_matrix(self, val)

    def shape_ext(self, as_vector=None):
        if as_vector is None:
Member

Why None instead of making it a bool?

This comment was marked as outdated.

Member

I don't get it. Why not just

def shape_ext(self, as_vector=True):
  shape_ext = (self.n, ) if as_vector else (self.n, self.m)

How is this off the topic?

Member

@k-ye k-ye Jun 5, 2020

self.m is ignored even if self.m != 1?

I don't see why this is a problem when as_vector is made a bool? For example, I can ask the question about what happens if as_vector is not None, but self.m != 1? According to the current logic, it also returns (self.n, ).

How about this: when as_vector is passed in by users, either raise an exception if self.m != 1, or return (self.n, ) only when as_vector and self.m == 1?
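
A minimal sketch of the second option in this proposal, assuming a boolean as_vector. The Mat class is a hypothetical stand-in that only carries n and m; it is not the Matrix class touched by this PR.

```python
class Mat:
    """Hypothetical stand-in for a matrix type; only n and m matter here."""

    def __init__(self, n, m):
        self.n = n
        self.m = m

    def shape_ext(self, as_vector=True):
        # Collapse to (n,) only for a genuine column vector, so that m is
        # never silently dropped when m != 1.
        if as_vector and self.m == 1:
            return (self.n,)
        return (self.n, self.m)


print(Mat(3, 1).shape_ext())                 # vector form
print(Mat(3, 1).shape_ext(as_vector=False))  # explicit matrix form
print(Mat(3, 2).shape_ext())                 # m is kept because m != 1
```

With this rule a default call on a 3x2 matrix keeps both dimensions, which addresses the concern that self.m would be ignored.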

taichi/python/export_visual.cpp Outdated Show resolved Hide resolved
python/taichi/misc/gui.py Show resolved Hide resolved
python/taichi/misc/gui.py Outdated Show resolved Hide resolved
@archibate archibate self-assigned this Jun 4, 2020
@archibate
Collaborator Author

Also, could you briefly explain what the optimization is in the PR description?

See the [GUI] tag: I decided to expose this message to end users. We may consider separating the end-user message from the developer message, like:

[GUI] less copy overhead for simple shader kernels with high resolution
[gui] use taichi kernels instead of numpy functions to reduce set_image overhead

@archibate archibate changed the title [GUI] less copy overhead for simple shader kernels with high resolution [GUI] Use Taichi kernels instead of NumPy operations to reduce GUI.set_image overhead Jun 4, 2020
Member

@k-ye k-ye left a comment

Use Taichi kernels instead of NumPy operations to reduce GUI.set_image overhead

Sorry, this is still not clear to me. Wasn't the old code also using a Taichi kernel to copy out the data, e.g. to_numpy()? Is it because vector_to_image is faster than matrix_to_ext_arr? Or did we avoid some unnecessary copy somewhere?

@archibate
Collaborator Author

archibate commented Jun 4, 2020

Use Taichi kernels instead of NumPy operations to reduce GUI.set_image overhead

Sorry, this is still not clear to me. Wasn't the old code also using a Taichi kernel to copy out the data, e.g. to_numpy()? Is it because vector_to_image is faster than matrix_to_ext_arr? Or did we avoid some unnecessary copy somewhere?

It's a kernel fusion.

Before:
ti.meta.matrix_to_ext_arr
np.astype
np.__div__
np.concatenate
np.clip

After:
ti.meta.vector_to_image

If you don't understand, just remember the result of these "mystery" works is: 21 fps -> 50 fps.
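
The fusion listed above can be sketched in plain Python. This is an illustration of the idea only, not the code in this PR: the function names are hypothetical, and the specific per-channel steps (scaling by 255, appending an alpha channel) are assumptions chosen to mirror the astype / divide / concatenate / clip chain.

```python
def convert_multipass(pixels, scale=255.0):
    """One full pass and one temporary list per step, like a chain of array ops."""
    as_float = [[float(c) for c in px] for px in pixels]            # cf. astype
    scaled = [[c / scale for c in px] for px in as_float]           # cf. division
    padded = [px + [1.0] for px in scaled]                          # cf. concatenate
    return [[min(max(c, 0.0), 1.0) for c in px] for px in padded]   # cf. clip


def convert_fused(pixels, scale=255.0):
    """Same result in a single pass, with no intermediate lists."""
    out = []
    for px in pixels:
        row = [min(max(float(c) / scale, 0.0), 1.0) for c in px]
        row.append(1.0)  # alpha is already within [0, 1], so no clip is needed
        out.append(row)
    return out


pixels = [[0, 128, 255], [300, -5, 64]]
assert convert_multipass(pixels) == convert_fused(pixels)
```

The multi-pass version reads and writes the whole image once per step, while the fused version touches each pixel once. For a high-resolution image with a cheap shader, that memory traffic is plausibly where most of the set_image time went.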

@archibate
Collaborator Author

I don't get it. Why not just
def shape_ext(self, as_vector=True):
  shape_ext = (self.n, ) if as_vector else (self.n, self.m)
How is this off the topic?

So, if I call mat.shape_ext(), self.m is ignored even if self.m != 1?

@archibate archibate requested a review from k-ye June 5, 2020 04:59
Member

@k-ye k-ye left a comment

It's a kernel fusion.

I see. That's the kind of thing I was looking for. Can you put that in the PR description (it's empty) so that we know how this has been improved?

If you don't understand, just remember the result of these "mystery" works is: 21 fps -> 50 fps.

While I'm glad about the result, this is really not how code review works. If you list me as a reviewer, then I will need to care about how it works just as much as the outcome, so that I can give useful feedback.

Member

@k-ye k-ye left a comment

I think the as_vector is a tiny detail. TBH given the current usage pattern, maybe we should just completely remove it and always return (self.n, ) when self.m == 1? Otherwise LGTM, thx!

@archibate archibate requested a review from zhai-xiao June 5, 2020 14:31
Contributor

@zhai-xiao zhai-xiao left a comment

Great work, thanks!

python/taichi/misc/gui.py Outdated Show resolved Hide resolved
python/taichi/misc/image.py Outdated Show resolved Hide resolved
Member

@yuanming-hu yuanming-hu left a comment

LGTM in general! Please fix a few minor issues.

If you don't understand, just remember the result of these "mystery" works is: 21 fps -> 50 fps.

Although I really like the performance boost, this sentence sounds really aggressive and unfriendly. Please always be polite. As a core developer, it's important for you to lead by example.

python/taichi/lang/meta.py Outdated Show resolved Hide resolved
python/taichi/misc/image.py Outdated Show resolved Hide resolved
python/taichi/misc/image.py Outdated Show resolved Hide resolved
python/taichi/misc/gui.py Outdated Show resolved Hide resolved
python/taichi/lang/meta.py Outdated Show resolved Hide resolved
python/taichi/misc/gui.py Outdated Show resolved Hide resolved
python/taichi/misc/gui.py Outdated Show resolved Hide resolved
@archibate archibate requested a review from yuanming-hu June 6, 2020 04:49
python/taichi/misc/gui.py Outdated Show resolved Hide resolved
Member

@yuanming-hu yuanming-hu left a comment

Thanks! LGTM.

The fact that we don't have unit tests for these APIs may lead to bad code quality/stability control here. We should consider adding tests for these functionalities.

@archibate
Collaborator Author

Yes, you're right. It wasn't until now that I realized ti.imwrite has a bug with RGBA images. Maybe we should add test_gui, but it's hard to test, isn't it? E.g., how can we simulate a key press in a test? Worth considering in a separate PR/issue. I'll merge this for now before v0.6.8.

@archibate archibate merged commit d7663b7 into taichi-dev:master Jun 6, 2020
Labels
GAMES201 GAMES 201 students' wishlist
Development

Successfully merging this pull request may close these issues.

[benchmark] A simple shader takes too long time in set_image & to_numpy
5 participants