[Lang] [IR] Kernel scalar return support (ArgStoreStmt -> KernelReturnStmt) #917
Conversation
@yuanming-hu Things to consider:
Lines 15 to 22 in 7028981
What if sizeof(G) < sizeof(T)? Will the higher part not be zero-initialized? What if it's a signed type? Will its sign bit be extended correctly? (e.g. 1011 -> 11111011)
How do I transfer
Good questions.
We should type-hint
Let's simply disallow DiffTaichi kernels from having return types.
Not sure if I clearly understand your question, but I believe you can figure out a solution on your own.
NEXT: separate ret and arg.
taichi/program/kernel.cpp
Outdated
@@ -145,10 +169,42 @@ void Kernel::set_arg_int(int i, int64 d) {
  }
}

int64 Kernel::get_arg_int(int i) {  // TODO: will consider uint?
Does unsigned/signed extension matter for get? I just copied your set_arg_int.
That should be fine.
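For context, the sign-extension concern raised above can be illustrated in plain Python (a standalone sketch, not Taichi code; the helper name is made up for illustration):

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret a `bits`-wide two's-complement bit pattern as a signed int."""
    sign_bit = 1 << (bits - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)

# The 4-bit pattern 0b1011 is -5 in two's complement; widening it must
# replicate the sign bit (1011 -> ...11111011), not zero-fill the high part.
assert sign_extend(0b1011, 4) == -5

# Zero-filling instead silently turns -5 into 11:
assert 0b1011 == 11

# A non-negative pattern is unaffected either way:
assert sign_extend(0b0011, 4) == 3
```

This is exactly why copying a narrow signed value into a wider argument slot needs an explicit sign-extending cast rather than a raw byte copy.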
@@ -1646,6 +1646,22 @@ class FuncCallStmt : public Stmt {
  DEFINE_ACCEPT
};

class KernelReturnStmt : public Stmt {
 public:
  Stmt *value;
Let's put multi-return in another PR.
emit("_args_{}_[0] = {};", // TD: correct idx, another buf | ||
"i64",//data_type_short_name(stmt->element_type()), | ||
stmt->value->short_name()); |
Too bad, still using arg[0] for return. We want ret[0] instead.
Also, please update context / get_ret_int.
python/taichi/lang/kernel.py
Outdated
@@ -179,6 +179,9 @@ def reset(self):
        self.compiled_functions = self.runtime.compiled_grad_functions

    def extract_arguments(self):
        self.arguments.append(i64)  # TODO: rettype
Suggest: self.arguments -> self.argument_types (broken window).
Dirty hack into arg/ret for now; to test:

import taichi as ti

ti.init(arch=ti.opengl, print_preprocessed=True)

@ti.kernel
def func(a: ti.i64) -> ti.i64:
    return a * 2

res = func(233)
print(res)  # 466
How to inspect function return annotation by
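One standard way to read a function's return annotation is Python's inspect module (a minimal sketch; the func defined here is hypothetical, not part of the PR):

```python
import inspect

def func(a: int) -> int:
    return a * 2

# The return annotation is available via the signature...
sig = inspect.signature(func)
assert sig.return_annotation is int

# ...or directly from the __annotations__ dict:
assert func.__annotations__['return'] is int

# When there is no annotation, signature() reports a sentinel, not None:
def no_hint(a):
    return a

assert inspect.signature(no_hint).return_annotation is inspect.Signature.empty
```

Checking for inspect.Signature.empty is how a decorator like @ti.kernel could decide whether the kernel declares a return type at all.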
NEXT: figure out how to define
Done with LLVM here, now check out:

import taichi as ti

ti.init(arch=ti.x64, print_preprocessed=True, print_ir=True)

@ti.kernel
def func(a: ti.i64, b: ti.f64) -> ti.f64:
    return a * b

res = func(100, 2.333)
print(res)
I think I need to add Metal support for this once it's in?
BTW, am I correct in saying that this PR is using args[0] as the return value? Could this be problematic if the kernel takes in arguments?
Yes, would you like to do it in this PR or not?
Yes, it's args[0] for metal and opengl, and result_buffer[0] for llvm.
Not a problem since KernelReturnStmt will always be the last statement.
I can take care of that in another PR, but thanks!
OK. I see that you have a
WDYT? (I'm not suggesting to do this here, just a goal in the future)
I prefer 2, although LLVM is already on 1.
Finalized my pass. Looks great in general except for a few nits...
taichi/program/kernel.cpp
Outdated
} else if (dt == DataType::i64) {
  return (float64)fetch_result<int64>(i);
} else if (dt == DataType::i8) {
  return (float64)fetch_result<int8>(i);
Sounds good. I trust your decision.
taichi/program/kernel.cpp
Outdated
}

template <typename T>
static T fetch_result(int i)  // TODO: move to Program::fetch_result
It would be great to move fetch_result_uint64 and this to Program in this PR. The modification would not be too big I guess, since you already need a few get_current_program calls in fetch_result_uint64.
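The idea behind a typed fetch_result can be sketched in plain Python: the backend hands back one raw 64-bit result slot, and the caller reinterprets its bytes according to the kernel's declared return type (illustrative only; the helper name and format strings are assumptions, using struct notation):

```python
import struct

def fetch_result(raw: bytes, fmt: str):
    """Reinterpret the leading bytes of a 64-bit result slot.

    fmt follows struct notation: '<q' int64, '<i' int32,
    '<d' float64, '<f' float32 (little-endian).
    """
    size = struct.calcsize(fmt)
    return struct.unpack(fmt, raw[:size])[0]

# An int64 result round-trips exactly:
slot = struct.pack('<q', -42)
assert fetch_result(slot, '<q') == -42

# On little-endian layouts the low 4 bytes alone also read back as -42:
assert fetch_result(slot, '<i') == -42

# A float64 result occupies the same 8-byte slot:
slot = struct.pack('<d', 2.333)
assert fetch_result(slot, '<d') == 2.333
```

This is why a templated fetch_result<T> belongs next to the single buffer owner (Program): every backend writes into the same slot, and only the type interpretation differs per kernel.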
taichi/ir/snode.cpp
Outdated
uint64 SNode::fetch_reader_result() {
  uint64 ret;
  auto arch = get_current_program().config.arch;
  if (arch == Arch::cuda) {
    // TODO: refactor
    // XXX: what about unified memory?
Suggested change:
- // XXX: what about unified memory?
+ // We use a `memcpy_device_to_host` call here even if we have unified memory. This simplifies code. Also note that a unified memory (4KB) page fault is rather expensive for reading 4-8 bytes.
taichi/transforms/type_check.cpp
Outdated
}
auto &rets = current_kernel->rets;
TI_ASSERT(rets.size() >= 1);
auto ret = rets[0];  // TODO: stmt->ret_id?
Suggested change:
- auto ret = rets[0];  // TODO: stmt->ret_id?
+ // TODO: Support cases when stmt->ret_id other than 0
+ TI_ASSERT(stmt->ret_id == 0);
+ auto ret = rets[0];
It's better to let unsupported cases fail loudly.
Sorry, but we don't have ret_id yet. Will add it in another PR.
I think either is fine, and for each backend it's better to choose the design that is most suitable. I understand that on OpenGL the number of buffers is limited, so it makes sense to go with 2. For LLVM, going with 1 will make things easier.
Yeah, I think this is a per-backend design choice. Metal is already mixing args and return values in one buffer, so I think it's easier to do 2. Thanks for the confirmation!
Thanks for your valuable discussions! Nice to know we can have flexibility for each backend.
Having a hard time writing the test:

import taichi as ti
from taichi import approx

def _test_binary_func_ret(dt1, dt2, dt3):
    ti.init(print_preprocessed=True)
    x = ti.var(ti.i32, ())

    @ti.kernel
    def func(a: dt1, b: dt2) -> dt3:
        # dummy = dt3  # uncomment to pass
        return a * b

    if ti.core.is_integral(dt1):
        xs = list(range(4))
    else:
        xs = [0.2, 0.4, 0.8, 1.0]

    if ti.core.is_integral(dt2):
        ys = list(range(4))
    else:
        ys = [0.2, 0.4, 0.8, 1.0]

    for x, y in zip(xs, ys):
        assert func(x, y) == approx(x * y)

_test_binary_func_ret(ti.i32, ti.f32, ti.f32)
I checked
LLVM is OK; OpenGL has many other failures, will fix tomorrow.
Maybe try inserting the return type into python/taichi/lang/kernel.py, lines 247 to 250 in 66c0b43.
Looks great! Thank you! Feel free to merge after CI passes! (pls ignore my previous message)
@k-ye I'm merging this now, feel free to add another PR for metal!
Related issue = #909
Currently I'm still using arg[0] to store the return value; clearly you want an extra buffer in context to store the return value (which helps multi-return). That's good for other backends, but I refuse this for OpenGL. Reason: GL limits the number of buffers to 10, but we already have up to 13~17 if all buffers are turned on.
So if we have an external array, arguments, global temporaries, and 64-bit integers all used in a single kernel, GL fails. We depend on kernels being simple, and fortunately no test has covered that complicated combination yet.
I will try to combine them together so the arg buffer is still r/w: the first part write-only (ret) and the second part read-only (arg).
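The proposed layout — one contiguous buffer with return slots first and argument slots after them — can be sketched in plain Python (slot counts and helper names are assumptions for illustration, not the actual OpenGL codegen):

```python
import struct

RET_SLOTS = 1   # assumed: one 64-bit return slot
ARG_SLOTS = 8   # assumed argument capacity

def make_buffer(args):
    """Pack [ret | args] into one buffer of 64-bit slots.

    Slot 0 is reserved for the return value (written by the kernel);
    arguments occupy slots 1..len(args) (read by the kernel).
    """
    buf = bytearray(8 * (RET_SLOTS + ARG_SLOTS))
    for i, a in enumerate(args):
        struct.pack_into('<q', buf, 8 * (RET_SLOTS + i), a)
    return buf

def read_ret(buf):
    return struct.unpack_from('<q', buf, 0)[0]

buf = make_buffer([100, 7])
# Simulate the kernel writing its return value into slot 0:
struct.pack_into('<q', buf, 0, 700)
assert read_ret(buf) == 700
# Arguments remain intact in the slots after the return area:
assert struct.unpack_from('<q', buf, 8)[0] == 100
assert struct.unpack_from('<q', buf, 16)[0] == 7
```

With this scheme only one GL buffer binding is consumed for both directions, at the cost of making the buffer read/write instead of read-only.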
To test: