
Access out-of-bound checking on CPU backends (Stage 2) #582

Merged · 2 commits · Mar 12, 2020

Conversation

xumingkuan
Contributor

Related issue id = #561

A continuation of #572

@xumingkuan
Contributor Author

It still gives me more than one assertion message:

[E 03/11/20 01:47:54.008] [program.cpp:taichi::lang::assert_failed_host@24] Assertion failure: Accessing Tensor of Size [1200, 100] with indices (28, 100)


[E 03/11/20 01:47:54.008] [program.cpp:taichi::lang::assert_failed_host@24] Assertion failure: Accessing Tensor of Size [1200, 100] with indices (4, 100)

if I revert 3498767 to test it.

I think it is not essential to lock host_vsnprintf. What we need is to lock the buffer from the beginning of the write (host_vsnprintf) to the end of the read (call("taichi_assert", get_context(), stmt->cond->value, text);).

I tried using std::string to obtain a copy of the string in the buffer (moving the code out of extern "C"), but that hasn't worked so far.

Or we could allocate a buffer for each call of get_assert_msg. I don't know if it would be more efficient than the lock.
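
A minimal sketch of the write-to-read locking idea described above (the function and variable names here are illustrative, not the PR's actual code):

#include <cstdarg>
#include <cstdio>
#include <mutex>

static char assert_msg_buffer[2048];  // the shared buffer under discussion
static std::mutex assert_msg_mutex;   // guards the buffer from write to read

void consume_assert_msg(const char *msg);  // stands in for the taichi_assert call

// Called once an assertion has failed: format the message, then report it.
void format_and_assert(const char *fmt, ...) {
  // Hold the lock across both the write (vsnprintf) and the read
  // (handing the buffer to the assert call), not just the write.
  std::lock_guard<std::mutex> guard(assert_msg_mutex);
  va_list args;
  va_start(args, fmt);
  std::vsnprintf(assert_msg_buffer, sizeof(assert_msg_buffer), fmt, args);
  va_end(args);
  consume_assert_msg(assert_msg_buffer);
}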

@yuanming-hu
Member

Since the kernels are multithreaded, it's normal to get a number of messages <= the number of cores. As long as there are no false positives, we are good.

@archibate
Collaborator

archibate commented Mar 11, 2020

How about allocating a buffer for each core? E.g.:

struct PerCPU {
  char buffer[233];
  ... // some other resources
};

This is a very common approach in multi-processor OS kernels.

@yuanming-hu
Member

> I think it is not essential to lock host_vsnprintf. What we need is to lock the buffer from the beginning of the write (host_vsnprintf) to the end of the read (call("taichi_assert", get_context(), stmt->cond->value, text);).

You can simply new a buffer in get_assert_msg and then delete the result after calling taichi_assert.
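
A minimal sketch of that per-call buffer (get_assert_msg's real signature may differ; this only shows the ownership pattern):

#include <cstdarg>
#include <cstdio>

// Each call formats into its own heap buffer, so concurrent threads
// never share state; the caller frees it after the assert call.
char *get_assert_msg(const char *fmt, ...) {
  constexpr int buf_size = 2048;  // illustrative size
  char *buf = new char[buf_size];
  va_list args;
  va_start(args, fmt);
  std::vsnprintf(buf, buf_size, fmt, args);
  va_end(args);
  return buf;  // ownership transfers to the caller
}

// Caller side (sketch):
//   char *msg = get_assert_msg("Accessing Tensor of Size [%d, %d] ...", m, n);
//   ... pass msg to taichi_assert ...
//   delete[] msg;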

@yuanming-hu
Member

> How about allocating a buffer for each core? E.g.:
>
> struct PerCPU {
>   ...
>   char buffer[233];
> };
>
> This is a very common approach in multi-processor OS kernels.

Thread-local storage is a good idea, but we need more infrastructure before we actually implement this.
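
For reference, C++11's thread_local keyword gives one buffer per thread with no locking, though wiring it into the generated kernels is the missing infrastructure (a sketch, not the PR's code):

// Each thread gets its own copy; no synchronization is needed because
// a thread only ever reads the buffer it just wrote.
thread_local char assert_msg_buffer[2048];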

@xumingkuan
Contributor Author

> > I think it is not essential to lock host_vsnprintf. What we need is to lock the buffer from the beginning of the write (host_vsnprintf) to the end of the read (call("taichi_assert", get_context(), stmt->cond->value, text);).
>
> You can simply new a buffer in get_assert_msg and then delete the result after calling taichi_assert.

I think I'll choose this solution. As long as get_assert_msg is not used in other places, there should be no memory-leak issues even if we use raw new's.

@xumingkuan
Contributor Author

Oh, it may be better to just implement another taichi_assert that takes the format arguments instead...

Then we can safely either use a lock on a global buffer or delete the buffer inside the function.

@yuanming-hu
Member

> Oh, it may be better to just implement another taichi_assert that takes the format arguments instead...
>
> Then we can safely either use a lock on a global buffer or delete the buffer inside the function.

Let's just take the simplest solution (new/delete) for now. After this bound-checking feature is actually in use, we'll learn more about its behavior and practical needs. Then we'll have enough information to engineer a more complex solution.

@yuanming-hu
Member

yuanming-hu commented Mar 11, 2020

Actually, as you mentioned, even if we have multiple threads, we only need one error message. This can be done by having something like

{
  char error_message_buffer[2048];  // written by the first failing thread
  i32 error_code;                   // non-zero once an assertion has failed
  i32 lock;                         // protects the buffer and error_code
}

The first thread that hits an assertion failure sets error_code to non-zero and then exits. Any later thread that reads a non-zero error code just exits silently.

Then the host will check the error code, and if non-zero, halt the program with the error message.

This will allow the host, rather than the worker threads in the thread queue, to throw exceptions, leading to a more testable design.
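
A sketch of that first-thread-wins protocol, using std::atomic in place of the manual lock field (names and types are illustrative):

#include <atomic>
#include <cstring>

struct RuntimeErrorSlot {
  char error_message_buffer[2048];
  std::atomic<int> error_code{0};  // non-zero once an assertion has failed
};

static RuntimeErrorSlot g_error;

// Worker thread: only the first failing thread wins the exchange and
// writes the message; later threads see a non-zero code and return.
void report_error(int code, const char *msg) {
  int expected = 0;
  if (g_error.error_code.compare_exchange_strong(expected, code)) {
    std::strncpy(g_error.error_message_buffer, msg,
                 sizeof(g_error.error_message_buffer) - 1);
  }
}

// Host side, after the kernel launch completes (so the write is visible):
bool check_runtime_error(const char **msg_out) {
  if (g_error.error_code.load() == 0)
    return false;
  *msg_out = g_error.error_message_buffer;  // host halts / throws here
  return true;
}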

@yuanming-hu yuanming-hu reopened this Mar 11, 2020
@xumingkuan
Contributor Author

By combining get_assert_msg and taichi_assert into taichi_assert_format, I now get only one assertion message. Maybe it's not great for modular programming, though...

@yuanming-hu
Member

Looks good! There are many design choices here. Let's have a brief Skype chat if you are available.

@xumingkuan xumingkuan marked this pull request as ready for review March 11, 2020 20:15
@yuanming-hu
Member

yuanming-hu commented Mar 11, 2020

After this line:

program.sync = (program.sync && arch_is_cpu(arch));

call Program::check_runtime_error if in debug mode and on CPU.

In Program::check_runtime_error, fetch the error code and the error message pointer, just like:

llvm_runtime = runtime->fetch_result<void *>();
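
A hedged sketch of what Program::check_runtime_error could look like under that plan (fetch_error_code and fetch_error_message are assumed helpers built on the same set_result/fetch_result mechanism, not existing API):

#include <stdexcept>

void Program::check_runtime_error() {
  // Both fetchers below are hypothetical wrappers over fetch_result.
  int error_code = fetch_error_code();
  if (error_code != 0) {
    const char *msg = fetch_error_message();
    throw std::runtime_error(msg);  // the host throws, not a worker thread
  }
}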

@yuanming-hu
Member

yuanming-hu commented Mar 11, 2020

For fetch_result, the matching set_result call on the runtime side looks like:

runtime->set_result(runtime);

@yuanming-hu
Member

Hi @xumingkuan, do you want me to merge this in for now, and you open a new PR for the enhanced assertion failure treatment? (Just trying to avoid large changesets...)

@xumingkuan
Contributor Author

Let's merge this in for now. I found that shopping and cooking take some time, so finishing the enhanced assertion failure treatment may take longer than I thought.

@yuanming-hu
Member

Sounds good.

@yuanming-hu yuanming-hu merged commit db0cd44 into taichi-dev:master Mar 12, 2020
@xumingkuan xumingkuan deleted the lock2 branch March 18, 2020 22:11