Using `ti.loop_config(serialize=True)` before `ti.grouped()` leads to a wrong result #4807

YuCrazing · 2022-04-16T14:16:35Z

Describe the bug
When I call `ti.loop_config(serialize=True)` before a loop over `ti.grouped()`, I get a wrong execution result. If this usage is not supported, maybe we should point that out in the docs or raise an error at compile time.

To Reproduce

A prefix-sum example is shown below:

import taichi as ti
ti.init(arch=ti.cuda)

@ti.kernel
def scan_bad(arr: ti.template(), sum: ti.template()):
    sum[0] = arr[0]
    ti.loop_config(serialize=True)
    for I in ti.grouped(arr):
        if I.x > 0:
            sum[I] = arr[I] + sum[I-1]

@ti.kernel
def scan_good(arr: ti.template(), sum: ti.template()):
    sum[0] = arr[0]
    ti.loop_config(serialize=True)
    for i in range(3):
        if i > 0:
            sum[i] = arr[i] + sum[i-1]

arr = ti.field(ti.i32, 3)
sum = ti.field(ti.i32, 3)
arr[0] = 1
arr[1] = 1
arr[2] = 1

scan_bad(arr, sum)      # sum: [1 2 1]
# scan_good(arr, sum)   # sum: [1 2 3]
print(sum)


turbo0628 · 2022-04-21T13:47:25Z

This is weird: the two versions have identical optimized IR.

kernel {
  $0 = offloaded range_for(0, 4) grid_dim=2176 block_dim=4
  body {
    <i32> $1 = const [0]
    <i32> $2 = const [3]
    <i32> $3 = loop $0 index 0
    <i32> $4 = cmp_lt $3 $2
    $5 : if $4 {
      <*gen> $6 = get root [S0root][root]
      <*gen> $7 = [S0root][root]::lookup($6, $1) activate = false
      <*gen> $8 = get child [S0root->S3dense] $7
      <i32> $9 = bit_and $3 $2
      <*gen> $10 = [S3dense][dense]::lookup($8, $9) activate = false
      <*i32> $11 = get child [S3dense->S4place<i32>] $10
      <i32> $12 = global load $11
      <*i32> $13 = arg[0]
      <*i32> $14 = external_ptr <$13>, [$3]
      $15 : global store [$14 <- $12]
    }
  }
}

But the final results are different. Is there anything missing in the IR printer?

FantasyVR · 2022-04-21T14:55:34Z

This seems to be an arch-specific problem. If you set the arch to `ti.cpu`, the result is correct. If the arch is `ti.cuda` or `ti.metal`, sequential execution is not guaranteed.

import taichi as ti
ti.init(arch=ti.metal)
n = 30
A = ti.field(ti.i32, n)

@ti.kernel
def init():
    sum = 0
    ti.loop_config(serialize=True)
    for I in ti.grouped(A):
        print(I)
init()

As @turbo0628 said, the IR is also the same on both archs:

kernel {
  $0 = offloaded range_for(0, 32) grid_dim=0 block_dim=32
  body {
    <i32> $1 = loop $0 index 0
    <i32> $2 = const [31]
    <i32> $3 = bit_and $1 $2
    <i32> $4 = const [30]
    <i32> $5 = cmp_lt $3 $4
    $6 : if $5 {
      print "[", $3, "]\n"
    }
  }
}

turbo0628 · 2022-04-22T01:49:19Z

The same backend should not generate different results from identical IR. I think something is missing from the IR printer, and that missing piece is what causes the difference.

lin-hitonami · 2022-04-22T07:16:11Z

`serialize=True` does not make a struct-for loop run serially, and a struct-for does not guarantee any execution order, so the result may be wrong.
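To illustrate why execution order matters here, the following is a minimal pure-Python sketch, not Taichi code. The helper `scan_in_order` and the visiting order `[0, 2, 1]` are hypothetical, chosen only for illustration; the real GPU schedule is not specified anywhere in this thread. It runs the same loop body as the kernels above in a given order, and visiting index 2 before index 1 happens to produce the same `[1, 2, 1]` reported for `scan_bad`.

```python
def scan_in_order(arr, order):
    """Run the prefix-sum loop body once per index, visiting indices in `order`."""
    out = [0] * len(arr)
    out[0] = arr[0]  # mirrors `sum[0] = arr[0]` before the loop
    for i in order:
        if i > 0:
            # Reads out[i - 1], which is only valid if index i - 1 was already visited.
            out[i] = arr[i] + out[i - 1]
    return out


arr = [1, 1, 1]
serial = scan_in_order(arr, [0, 1, 2])    # ascending order, as in a serial loop
shuffled = scan_in_order(arr, [0, 2, 1])  # one hypothetical out-of-order schedule
print(serial)    # [1, 2, 3]
print(shuffled)  # [1, 2, 1]
```

Each iteration depends on the previous one, so any loop that does not guarantee ascending order can read `out[i - 1]` before it has been written.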

lin-hitonami · 2022-04-22T07:40:05Z

This is already in the docs. Maybe we can add a warning when `serialize=True` is applied to a for loop other than a range/ndrange for.

YuCrazing added the `potential bug` label (something that looks like a bug but is not yet confirmed) Apr 16, 2022

FantasyVR added this to the Taichi v1.0.1 milestone Apr 21, 2022

FantasyVR added this to Taichi Lang Apr 21, 2022

FantasyVR moved this to Untriaged in Taichi Lang Apr 21, 2022

FantasyVR moved this from Untriaged to Todo in Taichi Lang Apr 21, 2022

ailzhang moved this from Todo to Untriaged in Taichi Lang Apr 22, 2022

FantasyVR removed this from the Taichi v1.0.1 milestone Apr 22, 2022

FantasyVR moved this from Untriaged to Todo in Taichi Lang Apr 22, 2022

lin-hitonami mentioned this issue Apr 22, 2022

[Error] Show warning when serialize=True is set on a struct for #4844

Merged

k-ye closed this as completed in #4844 Apr 22, 2022

Repository owner moved this from Todo to Done in Taichi Lang Apr 22, 2022
