
[Lang] [IR] Kernel scalar return support (ArgStoreStmt -> KernelReturnStmt) #917

Merged: archibate merged 25 commits into taichi-dev:master on May 8, 2020

Conversation

@archibate (Collaborator) commented May 3, 2020

Related issue = #909

Currently I'm still using arg[0] to store the return value; ideally we want an extra buffer in the context for return values (which would help multi-return).
That's fine for other backends, but I'd rather not do it for OpenGL: GL limits the number of buffers to 10, and we already use up to 13~17 if all buffers are enabled.
So if a single kernel uses an external array, arguments, global temporaries, and 64-bit integers, GL fails. We rely on kernels being simple, and fortunately no test covers such a complicated combination yet.
I will try to combine them into one buffer so the arg buffer stays r/w: the first part write-only (ret) and the second part read-only (arg).

To test:

import taichi as ti

ti.init(arch=ti.opengl, print_preprocessed=True)

@ti.kernel
def func(t: ti.i32):
    return 233


res = func(666)
print(res)


@archibate (Collaborator Author) commented May 3, 2020

@yuanming-hu Things to consider:

  1. Do we type-hint the return type, or is it determined by the rhs of the last return?
  2. What about diff kernels? I'm not very knowledgeable about difftaichi... sorry.

@archibate (Collaborator Author) commented May 4, 2020

template <typename T, typename G>
T taichi_union_cast_with_different_sizes(G g) {
  union {
    T t;
    G g;
  } u;
  u.g = g;
  return u.t;
}

What if sizeof(G) < sizeof(T)? Will the higher part be left uninitialized rather than zero-initialized?
What if it's a signed type? Will its sign bit be extended correctly? (e.g. 1011 -> 11111011)
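
For what it's worth, here is a small sketch (in Python, with struct) of the sign-extension worry above; the values and little-endian layout are only for illustration:

import struct

# Mimic a naive union copy of a 32-bit signed value into a 64-bit slot:
# the low 4 bytes hold the value, the high 4 bytes are left as zero.
g = -5
raw = struct.pack('<iI', g, 0)   # low word = g, high word = 0
t = struct.unpack('<q', raw)[0]  # reinterpret as a signed 64-bit integer
print(t)  # 4294967291, not -5 -- the sign bit was not extended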

@archibate archibate marked this pull request as ready for review May 4, 2020 04:33
@archibate (Collaborator Author)

How do I transfer ret_type from transformer.py to kernel.py?

@yuanming-hu (Member)

Good questions.

  1. Do we type-hint the return type, or is it determined by the rhs of the last return?

We should type-hint @ti.kernels.

  2. What about diff kernels? I'm not very knowledgeable about difftaichi... sorry.

Let's simply disallow return types for diffTaichi kernels.

How do I transfer ret_type from transformer.py to kernel.py?

Not sure if I clearly understand your question but I believe you can figure out a solution on your own.

@archibate (Collaborator Author) left a comment

NEXT: separate ret and arg.

@@ -145,10 +169,42 @@ void Kernel::set_arg_int(int i, int64 d) {
}
}

int64 Kernel::get_arg_int(int i) { // TODO: will consider uint?
@archibate (Collaborator Author):

Does unsigned/signed extension matter for get? I just copied your set_arg_int.

Member:

That should be fine.

@@ -1646,6 +1646,22 @@ class FuncCallStmt : public Stmt {
DEFINE_ACCEPT
};

class KernelReturnStmt : public Stmt {
public:
Stmt *value;
@archibate (Collaborator Author):

Let's put multi-return in another PR.

Comment on lines 505 to 507
emit("_args_{}_[0] = {};", // TD: correct idx, another buf
"i64",//data_type_short_name(stmt->element_type()),
stmt->value->short_name());
@archibate (Collaborator Author):

Too bad, still using arg[0] for the return value. We want ret[0] instead.
Also, please update the context and get_ret_int.

@@ -179,6 +179,9 @@ def reset(self):
self.compiled_functions = self.runtime.compiled_grad_functions

def extract_arguments(self):
self.arguments.append(i64) # TODO: rettype
@archibate (Collaborator Author):

Suggestion: rename self.arguments -> self.argument_types (fixing a broken window).

@archibate (Collaborator Author)

Dirty-hacked args/ret into one buffer for now; to test:

import taichi as ti

ti.init(arch=ti.opengl, print_preprocessed=True)

@ti.kernel
def func(a: ti.i64) -> ti.i64:
    return a * 2


res = func(233)
print(res) # 466

@archibate (Collaborator Author)

How do I inspect a function's return annotation with the inspect module?
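
One possible way, just as a sketch using the standard inspect module (the example function here is made up):

import inspect

def func(a: int) -> float:
    return a * 2.0

sig = inspect.signature(func)
ret = sig.return_annotation          # <class 'float'>
if ret is inspect.Signature.empty:   # no '-> ...' annotation was given
    ret = None
print(ret)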

@archibate (Collaborator Author)

NEXT: figure out how to define element_type() for KernelReturnStmt. Sleeping now so I don't miss class; continuing tomorrow.

@archibate archibate requested a review from yuanming-hu May 6, 2020 07:59
@archibate (Collaborator Author)

Done with LLVM here, now check out:

import taichi as ti

ti.init(arch=ti.x64, print_preprocessed=True, print_ir=True)

@ti.kernel
def func(a: ti.i64, b: ti.f64) -> ti.f64:
    return a * b

res = func(100, 2.333)
print(res)

@yuanming-hu yuanming-hu requested a review from xumingkuan May 7, 2020 03:38
@k-ye (Member) left a comment

I think I need to add Metal support for this once it's in?

BTW, am I correct in saying that this PR is using args[0] as the return value? Could this be problematic if the kernel takes in arguments?

@archibate (Collaborator Author)

I think I need to add Metal support for this once it's in?

Yes, would you like to do it in this PR or not?

BTW, am I correct in saying that this PR is using args[0] as the return value?

Yes, it's args[0] for metal and opengl, and result_buffer[0] for llvm.

problematic if the kernel takes in arguments?

Not a problem since KernelReturnStmt will always be the last statement.

@k-ye (Member) commented May 7, 2020

Yes, would you like to do it in this PR or not?

I can take care of that in another PR, but thanks!

it's args[0] for metal and opengl, and result_buffer[0] for llvm. Not a problem since KernelReturnStmt will always be the last statement.

OK. I see that you have a TODO to consider using a result buffer for OpenGL as well. I guess it's probably cleaner if we either:

  1. Follow the TODO and separate return args from input args; or
  2. Keep something similar to the existing design, where args and return values share the same buffer but use different indices.

WDYT? (I'm not suggesting to do this here, just a goal in the future)

@archibate archibate changed the title [IR] [Refactor] Add KernelReturnStmt [Lang] [IR] [Refactor] Add KernelReturnStmt to support return in kernel May 7, 2020
@archibate archibate changed the title [Lang] [IR] [Refactor] Add KernelReturnStmt to support return in kernel [Lang] [IR] Kernel return support: remove ArgStoreStmt, add KernelReturnStmt May 7, 2020
@archibate (Collaborator Author)

I prefer 2, although LLVM is already on 1.
First, implementing 1 can be hard due to the OpenGL API limitations mentioned in the PR description, no matter our philosophy.
Second, having one buffer read-only and another write-only sounds like a waste. Also, they won't influence each other, as demonstrated. Why not just combine them? Having different indices/typings is already good enough. So I suggest LLVM & Metal use 2 too.
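
A rough sketch of what option 2 could look like (purely illustrative Python, not the actual Taichi layout; the slot count and helper names are made up):

# One shared buffer: return slots first (write-only), argument slots after
# (read-only), distinguished only by index ranges.
NUM_RET_SLOTS = 1  # assumed: slot 0 reserved for the scalar return value

def ret_slot(i):
    return i                  # return value i sits at the front

def arg_slot(i):
    return NUM_RET_SLOTS + i  # argument i follows the return slots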

@yuanming-hu (Member) left a comment

Finalized my pass. Looks great in general except for a few nits...

} else if (dt == DataType::i64) {
return (float64)fetch_result<int64>(i);
} else if (dt == DataType::i8) {
return (float64)fetch_result<int8>(i);
Member:

Sounds good. I trust your decision.

}

template <typename T>
static T fetch_result(int i) // TODO: move to Program::fetch_result
Member:

It would be great to move fetch_result_uint64 and this to Program in this PR. The modification would not be too big I guess, since you already need a few get_current_program calls in fetch_result_uint64.

taichi/ir/snode.cpp
uint64 SNode::fetch_reader_result() {
uint64 ret;
auto arch = get_current_program().config.arch;
if (arch == Arch::cuda) {
// TODO: refactor
// XXX: what about unified memory?
Member:

Suggested change
// XXX: what about unified memory?
// We use a `memcpy_device_to_host` call here even if we have unified memory. This simplifies code. Also note that a unified memory (4KB) page fault is rather expensive for reading 4-8 bytes.

}
auto &rets = current_kernel->rets;
TI_ASSERT(rets.size() >= 1);
auto ret = rets[0]; // TODO: stmt->ret_id?
Member:

Suggested change
auto ret = rets[0]; // TODO: stmt->ret_id?
// TODO: Support cases when stmt->ret_id other than 0
TI_ASSERT(stmt->ret_id == 0);
auto ret = rets[0];

Member:

It's better to let unsupported cases fail loudly.

@archibate (Collaborator Author):

Sorry but we don't have ret_id yet. Will add in another PR.

@yuanming-hu (Member)

I prefer 2, although LLVM is already on 1.
First, implementing 1 can be hard due to the OpenGL API limitations mentioned in the PR description, no matter our philosophy.
Second, having one buffer read-only and another write-only sounds like a waste. Also, they won't influence each other, as demonstrated. Why not just combine them? Having different indices/typings is already good enough. So I suggest LLVM & Metal use 2 too.

I think either is fine, and for each backend it's better to choose the most suitable design. I understand that on OpenGL the number of buffers is limited, so it makes sense to go with 2. For LLVM, going with 1 will make things easier.

@k-ye (Member) commented May 7, 2020

Yeah, I think this is a per-backend design choice. Metal is already mixing args and return values in one buffer, so I think it's easier to do 2. Thanks for the confirmation!

@archibate (Collaborator Author)

Thanks for the valuable discussion! Nice to know we can have flexibility for each backend.
Also, I'd add three further considerations for after this PR is merged: ti.ext_arr()? ti.template()? ti.Matrix?

@archibate archibate changed the title [Lang] [IR] Kernel return support: remove ArgStoreStmt, add KernelReturnStmt [Lang] [IR] Kernel scalar return support (ArgStoreStmt -> KernelReturnStmt) May 7, 2020
@yuanming-hu (Member)

Also, I'd add three further considerations for after this PR is merged: ti.ext_arr()? ti.template()? ti.Matrix?

  • ti.ext_arr() currently abuses the argument buffer to store the pointer and array sizes. This will need refactoring in the future for more clarity, though it's not urgent.
  • ti.template() is in charge of template instantiation and is handled entirely in the frontend. I wouldn't worry too much about this.
  • ti.Matrix clearly needs more consideration. I'm not sure how the return statement should be designed for this struct.

@archibate (Collaborator Author) commented May 7, 2020

Having a hard time writing tests:

import taichi as ti
from taichi import approx

def _test_binary_func_ret(dt1, dt2, dt3):
    ti.init(print_preprocessed=True)

    x = ti.var(ti.i32, ())

    @ti.kernel
    def func(a: dt1, b: dt2) -> dt3:
        # dummy = dt3 # uncomment to pass
        return a * b

    if ti.core.is_integral(dt1):
        xs = list(range(4))
    else:
        xs = [0.2, 0.4, 0.8, 1.0]

    if ti.core.is_integral(dt2):
        ys = list(range(4))
    else:
        ys = [0.2, 0.4, 0.8, 1.0]

    for x, y in zip(xs, ys):
        assert func(x, y) == approx(x * y)


_test_binary_func_ret(ti.i32, ti.f32, ti.f32)
[Taichi] mode=development
[Taichi] preparing sandbox at /tmp/taichi-qcgkx5tj
[Taichi] sandbox prepared
[Taichi] <dev mode>, supported archs: [cpu, cuda, opengl], commit 2762aeef, python 3.8.2
Before preprocessing:
@ti.kernel
def func(a: dt1, b: dt2) ->dt3:
    return a * b

After preprocessing:
def func():
  ti.decl_scalar_ret(dt3)
  a = ti.decl_scalar_arg(dt1)
  b = ti.decl_scalar_arg(dt2)
  ti.core.create_kernel_return(ti.cast(ti.Expr(a * b), dt3).ptr)

Traceback (most recent call last):
  File "tst.my.py", line 27, in <module>
    _test_binary_func_ret(ti.i32, ti.f32, ti.f32)
  File "tst.my.py", line 24, in _test_binary_func_ret
    assert func(x, y) == approx(x * y)
  File "/home/bate/Develop/taichi/python/taichi/lang/kernel.py", line 497, in wrapped
    return primal(*args, **kwargs)
  File "/home/bate/Develop/taichi/python/taichi/lang/kernel.py", line 427, in __call__
    self.materialize(key=key, args=args, arg_features=arg_features)
  File "/home/bate/Develop/taichi/python/taichi/lang/kernel.py", line 307, in materialize
    taichi_kernel = taichi_kernel.define(taichi_ast_generator)
  File "/home/bate/Develop/taichi/python/taichi/lang/kernel.py", line 304, in taichi_ast_generator
    compiled()
  File "tst.my.py", line 18, in func
    if ti.core.is_integral(dt2):
NameError: name 'dt3' is not defined

I checked func.__globals__; why do dt1 and dt2 work but dt3 doesn't?

@archibate (Collaborator Author)

LLVM is OK; OpenGL has many other failures. Will fix tomorrow.

@yuanming-hu (Member)

I checked func.__globals__; why do dt1 and dt2 work but dt3 doesn't?

Maybe try inserting the return type into globals? We did that for the argument types:

for i, arg in enumerate(func_body.args.args):
    anno = arg.annotation
    if isinstance(anno, ast.Name):
        global_vars[anno.id] = self.arguments[i]
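
Something analogous might work for the return annotation; this is just a sketch, and the name self.return_type is an assumption, not actual Taichi code:

# Hypothetical analogue for the return annotation:
anno = func_body.returns
if isinstance(anno, ast.Name):
    global_vars[anno.id] = self.return_type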

@archibate archibate requested a review from yuanming-hu May 8, 2020 03:52
@yuanming-hu (Member) left a comment

Looks great! Thank you! Feel free to merge after CI passes! (pls ignore my previous message)

@archibate (Collaborator Author)

@k-ye I'm merging this now, feel free to add another PR for metal!

@archibate archibate merged commit bbddaa1 into taichi-dev:master May 8, 2020
archibate added a commit to archibate/taichi that referenced this pull request May 8, 2020
archibate added a commit to archibate/taichi that referenced this pull request May 8, 2020
archibate added a commit that referenced this pull request May 11, 2020