Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using ti.loop_config(serialize=True) before ti.grouped() leads to a wrong result #4807

Closed
YuCrazing opened this issue Apr 16, 2022 · 5 comments · Fixed by #4844
Closed

Using ti.loop_config(serialize=True) before ti.grouped() leads to a wrong result #4807

YuCrazing opened this issue Apr 16, 2022 · 5 comments · Fixed by #4844
Labels
potential bug Something that looks like a bug but not yet confirmed

Comments

@YuCrazing
Copy link
Contributor

Describe the bug
When using ti.loop_config(serialize=True) before ti.grouped(), I will get a wrong execution result. If we do not support such usage, maybe we should point out this in the doc, or give an error message at the compilation stage.

To Reproduce

A prefix sum example is shown as below:

import taichi as ti
ti.init(arch=ti.cuda)

@ti.kernel
def scan_bad(arr: ti.template(), sum: ti.template()):
    sum[0] = arr[0]
    ti.loop_config(serialize=True)
    for I in ti.grouped(arr):
        if I.x > 0:
            sum[I] = arr[I] + sum[I-1]

@ti.kernel
def scan_good(arr: ti.template(), sum: ti.template()):
    sum[0] = arr[0]
    ti.loop_config(serialize=True)
    for i in range(3):
        if i > 0:
            sum[i] = arr[i] + sum[i-1]

arr = ti.field(ti.i32, 3)
sum = ti.field(ti.i32, 3)
arr[0] = 1
arr[1] = 1
arr[2] = 1

scan_bad(arr, sum)      # sum: [1 2 1]
# scan_good(arr, sum)   # sum: [1 2 3]
print(sum)
@YuCrazing YuCrazing added the potential bug Something that looks like a bug but not yet confirmed label Apr 16, 2022
@turbo0628
Copy link
Member

turbo0628 commented Apr 21, 2022

This is wired, the two versions have identical optimized IR.

kernel {
  $0 = offloaded range_for(0, 4) grid_dim=2176 block_dim=4
  body {
    <i32> $1 = const [0]
    <i32> $2 = const [3]
    <i32> $3 = loop $0 index 0
    <i32> $4 = cmp_lt $3 $2
    $5 : if $4 {
      <*gen> $6 = get root [S0root][root]
      <*gen> $7 = [S0root][root]::lookup($6, $1) activate = false
      <*gen> $8 = get child [S0root->S3dense] $7
      <i32> $9 = bit_and $3 $2
      <*gen> $10 = [S3dense][dense]::lookup($8, $9) activate = false
      <*i32> $11 = get child [S3dense->S4place<i32>] $10
      <i32> $12 = global load $11
      <*i32> $13 = arg[0]
      <*i32> $14 = external_ptr <$13>, [$3]
      $15 : global store [$14 <- $12]
    }
  }
}

But the final results are different. Is there anything missing in the IR printer?

@FantasyVR
Copy link
Collaborator

FantasyVR commented Apr 21, 2022

This seems to be the arch problem. If you set your arch as ti.cpu, the result is right. If the arch is ti.cuda or ti.metal, sequential is not guaranteed.

import taichi as ti
ti.init(arch=ti.metal)
n = 30
A = ti.field(ti.i32, n)

@ti.kernel
def init():
    sum = 0
    ti.loop_config(serialize=True)
    for I in ti.grouped(A):
        print(I)
init()

As @turbo0628 said, the IR on both arch are also the same:

kernel {
  $0 = offloaded range_for(0, 32) grid_dim=0 block_dim=32
  body {
    <i32> $1 = loop $0 index 0
    <i32> $2 = const [31]
    <i32> $3 = bit_and $1 $2
    <i32> $4 = const [30]
    <i32> $5 = cmp_lt $3 $4
    $6 : if $5 {
      print "[", $3, "]\n"
    }
  }
}

@FantasyVR FantasyVR added this to the Taichi v1.0.1 milestone Apr 21, 2022
@FantasyVR FantasyVR moved this to Untriaged in Taichi Lang Apr 21, 2022
@FantasyVR FantasyVR moved this from Untriaged to Todo in Taichi Lang Apr 21, 2022
@turbo0628
Copy link
Member

The same backend should not generate different results given identical IRs. I think there's something missing in the IR printer, which is the key to cause the differences.

@ailzhang ailzhang moved this from Todo to Untriaged in Taichi Lang Apr 22, 2022
@FantasyVR FantasyVR removed this from the Taichi v1.0.1 milestone Apr 22, 2022
@FantasyVR FantasyVR moved this from Untriaged to Todo in Taichi Lang Apr 22, 2022
@lin-hitonami
Copy link
Contributor

lin-hitonami commented Apr 22, 2022

serialize=1 does not make a struct for loop run serially, and the struct for does not guarantee the executing order. So, the result may not be right.

@lin-hitonami
Copy link
Contributor

lin-hitonami commented Apr 22, 2022

It is already in the doc, maybe we can add a warning when serialize=1 is applied on fors other than range/ndrange fors.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
potential bug Something that looks like a bug but not yet confirmed
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

4 participants