Assertion failure: pipeline_state_ != nullptr on metal #6221

Open
PGZXB opened this issue Oct 2, 2022 · 10 comments
Assignees
Labels
bug We've confirmed that this is an BUG metal Metal backend

Comments

@PGZXB
Contributor

PGZXB commented Oct 2, 2022

Describe the bug
Running the following code fails:

import taichi as ti

ti.init(arch=ti.metal)

@ti.kernel
def kernel4(n: ti.i32, m: ti.i32) -> ti.i32:
    res = 0
    for i in range(n):
        for j in range(m):
            res += j
    return res

kernel4(2, 2)

Output:

[Taichi] version 1.1.4, llvm 10.0.0, commit 9d65cbe7, osx, python 3.10.6
[I 10/03/22 09:41:07.419 328128] [misc.py:init@439] Following TI_ARCH setting up for arch=metal
[Taichi] Starting on arch=metal
2022-10-03 09:41:07.579 Python[27767:328128] Error Domain=AGXMetal13_3 Code=3 "Compiler encountered an internal error" UserInfo={NSLocalizedDescription=Compiler encountered an internal error}
[E 10/03/22 09:41:07.579 328128] [kernel_manager.cpp:CompiledMtlKernelBase@80] Assertion failure: pipeline_state_ != nullptr


Traceback (most recent call last):
  File "/Users/pgzxb/taichi_test/t.py", line 13, in <module>
    kernel4(2, 2)
  File "/Users/pgzxb/taichi/python/taichi/lang/kernel_impl.py", line 915, in wrapped
    return primal(*args, **kwargs)
  File "/Users/pgzxb/taichi/python/taichi/lang/kernel_impl.py", line 842, in __call__
    return self.runtime.compiled_functions[key](*args)
  File "/Users/pgzxb/taichi/python/taichi/lang/kernel_impl.py", line 769, in func__
    raise e from None
  File "/Users/pgzxb/taichi/python/taichi/lang/kernel_impl.py", line 766, in func__
    t_kernel(launch_ctx)
RuntimeError: [kernel_manager.cpp:CompiledMtlKernelBase@80] Assertion failure: pipeline_state_ != nullptr
@PGZXB PGZXB added the potential bug Something that looks like a bug but not yet confirmed label Oct 2, 2022
@taichi-gardener taichi-gardener moved this to Untriaged in Taichi Lang Oct 2, 2022
@PGZXB PGZXB changed the title Assert failed on metal Assertion failure: pipeline_state_ != nullptr on metal Oct 2, 2022
@PGZXB PGZXB closed this as completed Oct 6, 2022
Repository owner moved this from Untriaged to Done in Taichi Lang Oct 6, 2022
@ailzhang
Contributor

ailzhang commented Oct 18, 2022

Reopening since I'm seeing exactly the same issue on my M1 MacBook. What makes it more confusing is that changing += to *= makes it work perfectly fine. I suspect this is related to the flaky tests on newer macOS systems.

@ailzhang ailzhang reopened this Oct 18, 2022
@ailzhang ailzhang added this to the v1.2.0 milestone Oct 21, 2022
@ailzhang
Contributor

Adding this issue to the v1.2.0 milestone since it affects many users on macOS 12. It is indeed a bit hard to fix, though, so I cannot guarantee a proper fix before then.

@ailzhang
Contributor

FYI, @jim19930609 and @k-ye helped investigate this issue. Here are a few observations with another (maybe simpler) repro:

import taichi as ti


@ti.kernel
def kernel4(l: ti.i32) -> ti.i32:
    res = 1
    for i in range(2):
        res1 = 0
        for j in range(l):
            res1 += j
        res += res1
    return res


ti.init(arch=ti.metal,
        make_thread_local=False,
        print_kernel_llvm_ir=True,
        log_level=ti.TRACE)
res = kernel4(3)
print(res)
  • The error occurs on all M1 MacBooks running macOS 12, but the same code works fine on macOS 11, with exactly the same generated Metal source code.
  • It errors even when make_thread_local is off.
  • If you change res += j to res += 1, or make the second for loop a constant-range loop, it passes.
  • Manually compiling the generated Metal source on my Mac succeeds.
  • A more detailed crash log can be found in the Console app on the Mac.

A few things worth trying:

@ailzhang ailzhang removed this from the v1.2.0 milestone Oct 27, 2022
@ailzhang ailzhang added bug We've confirmed that this is an BUG metal Metal backend and removed potential bug Something that looks like a bug but not yet confirmed labels Oct 27, 2022
@turbo0628 turbo0628 self-assigned this Nov 1, 2022
@ailzhang
Contributor

@bobcao3 do you happen to have any insights on this issue? It's blocking us from upgrading our M1 Mac machines to macOS 12, and it's becoming a bottleneck in CI. Thanks a lot!

@turbo0628
Member

turbo0628 commented Nov 21, 2022

Some follow-up investigation:

This Metal code snippet does not work on macOS 12.x:

  int32_t tmp7(0);
  int32_t tmp10 = *((device int32_t*) (ctx_addr + 4));
  for (int32_t tmp12_ = 0; tmp12_ < tmp10; tmp12_ = tmp12_ + 1) {
    tmp7 = tmp7 + tmp12_;
  }

The crash point seems to be inside an LLVM pass. My guess is that it crashes when attempting to automatically apply an atomic add operation on top of the loop index tmp12_.

@k-ye
Member

k-ye commented Nov 21, 2022

I guess it crashes when attempting to automatically apply atomic add operation on top of the loop index tmp12_.

Out of curiosity, where is the "atomic add" in the above code snippet?

@turbo0628
Member

turbo0628 commented Nov 21, 2022

It's only my guess that the Metal compiler is trying to apply an automatic atomic add optimization. It might crash in other passes, though.

@turbo0628
Member

turbo0628 commented Nov 21, 2022

I've worked out a C++ repro (see the code repository). This is likely a macOS compiler bug.

The following conditions are required to reproduce the bug:

  1. The data type is i32; the bug does not reproduce with f32.
  2. Some value is incremented with respect to the inner loop index, where the inner loop has a dynamic range loaded from a buffer. I haven't tested whether the bug also occurs with a passed-in constant loop range.
  3. res = res + j * 3 triggers the bug, but res = res * 3 + j doesn't.
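
To make condition 3 concrete, the two update patterns can be written out in plain Python. This is only a sketch with hypothetical function names, not the C++ repro itself. It shows that the first pattern is a plain arithmetic series with a closed form, while the second is not a simple sum of the loop index. Whether that difference is what the Metal compiler trips over is an assumption, not something established here.

```python
def accumulate_scaled_index(n: int, res: int = 0) -> int:
    """Pattern `res = res + j * 3` (reported to trigger the bug)."""
    for j in range(n):
        res = res + j * 3
    return res


def scale_then_add_index(n: int, res: int = 0) -> int:
    """Pattern `res = res * 3 + j` (reported not to trigger the bug)."""
    for j in range(n):
        res = res * 3 + j
    return res


def scaled_series_closed_form(n: int, res: int = 0) -> int:
    """Closed form of the first pattern for n >= 0: res + 3 * (0 + 1 + ... + (n - 1))."""
    return res + 3 * n * (n - 1) // 2


# The first pattern matches its closed form for small non-negative n.
for n in range(6):
    assert accumulate_scaled_index(n) == scaled_series_closed_form(n)

print(accumulate_scaled_index(5), scale_then_add_index(5))  # 30 58
```

Python integers do not overflow, so this does not model i32 wraparound on the GPU.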

@feisuzhu
Contributor

feisuzhu commented Dec 30, 2022

After some tinkering, I found that annotating tmp7 as volatile avoids triggering the bug on Ventura (13.1).

  volatile int32_t tmp7(0);
  int32_t tmp10 = *((device int32_t*) (ctx_addr + 4));
  for (int32_t tmp12_ = 0; tmp12_ < tmp10; tmp12_ = tmp12_ + 1) {
    tmp7 = tmp7 + tmp12_;
  }

Didn't test its performance though...

@PENGUINLIONG
Member

For this second repro I found this numerically equivalent implementation working:

import taichi as ti

@ti.kernel
def kernel4(l: ti.i32) -> ti.i32:
    res = 1
    for i in range(2):
        res += (l - 1) * l // 2
    return res

ti.init(arch=ti.metal,
        print_ir=True,
        log_level=ti.TRACE)
res = kernel4(3)
print(res)

I guess the LLVM pass that transforms summing loops into the Gauss sum formula is broken in Apple's MSL compilation pipeline. We can implement the transformation ourselves to work around this issue.
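
The claimed equivalence can be checked without a GPU by comparing the loop form and the closed form in plain Python. This is only a sketch: `kernel4_loop` and `kernel4_closed_form` are hypothetical names that transcribe the two kernels above without Taichi, and the check says nothing about what Apple's compiler actually does.

```python
def kernel4_loop(l: int) -> int:
    """Plain-Python transcription of the second repro (nested summing loop)."""
    res = 1
    for _ in range(2):
        res1 = 0
        for j in range(l):
            res1 += j
        res += res1
    return res


def kernel4_closed_form(l: int) -> int:
    """Plain-Python transcription of the rewritten kernel (Gauss sum formula)."""
    res = 1
    for _ in range(2):
        res += (l - 1) * l // 2
    return res


# Compare both forms for non-negative loop bounds.
mismatches = [l for l in range(1000) if kernel4_loop(l) != kernel4_closed_form(l)]
print(mismatches)  # []
```

The two forms agree only for `l >= 0`. For a negative `l` the loop body never runs, while the closed form still adds a positive term, so a hand-written replacement would need a guard for that case. Python integers also do not model i32 overflow.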

ailzhang added a commit that referenced this issue Jan 18, 2023
Issue: #6221 

### Brief Summary

`kernel4` in `test_offline_cache.py` triggers a weird LLVM internal
error on metal backend for macos12+. After moving metal runtime to
gfxruntime by @PENGUINLIONG and #7201, this is now the only blocker for
us to run full CI on macos ventura. As discussed with @PENGUINLIONG and
@feisuzhu offline, let's disable this test to unblock CI and add it back
once the issue is fixed.
quadpixels pushed a commit to quadpixels/taichi that referenced this issue May 13, 2023
…dev#7154)
