
Access out-of-bound checking on CPU backends (Stage 2) #582

Merged · 2 commits · Mar 12, 2020

Conversation

xumingkuan
Contributor

Related issue id = #561

A continuation of #572

@xumingkuan
Contributor Author

It still gives me more than one assertion message:

[E 03/11/20 01:47:54.008] [program.cpp:taichi::lang::assert_failed_host@24] Assertion failure: Accessing Tensor of Size [1200, 100] with indices (28, 100)


[E 03/11/20 01:47:54.008] [program.cpp:taichi::lang::assert_failed_host@24] Assertion failure: Accessing Tensor of Size [1200, 100] with indices (4, 100)

if I revert 3498767 to test it.

I think it is not essential to lock host_vsnprintf. What we need is to lock the buffer from the beginning of the write (host_vsnprintf) to the end of the read (call("taichi_assert", get_context(), stmt->cond->value, text);).

I tried using std::string to obtain a copy of the string in the buffer (moving the code out of extern "C"), but that hasn't worked so far.

Or we could allocate a buffer for each call of get_assert_msg. I don't know if it would be more efficient than the lock.
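
A minimal sketch of the write-to-read locking idea described above (the function and variable names here are illustrative, not the PR's actual code):

#include <cstdarg>
#include <cstdio>
#include <mutex>

static char assert_msg_buffer[2048];  // the shared buffer under discussion
static std::mutex assert_msg_mutex;   // guards the buffer from write to read

void consume_assert_msg(const char *msg);  // stands in for the taichi_assert call

// Called once an assertion has failed: format the message, then report it.
void format_and_assert(const char *fmt, ...) {
  // Hold the lock across both the write (vsnprintf) and the read
  // (handing the buffer to the assert call), not just the write.
  std::lock_guard<std::mutex> guard(assert_msg_mutex);
  va_list args;
  va_start(args, fmt);
  std::vsnprintf(assert_msg_buffer, sizeof(assert_msg_buffer), fmt, args);
  va_end(args);
  consume_assert_msg(assert_msg_buffer);
}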

@yuanming-hu
Member

Since the kernels are multithreaded, it's normal to get a number of messages <= the number of cores. As long as there are no false positives, we are good.

@archibate
Collaborator

archibate commented Mar 11, 2020

How about allocating a buffer for each core? E.g.:

struct PerCPU {
  char buffer[233];
  ... // some other resources
};

This is a very common approach in multi-processor OS kernels.

@yuanming-hu
Member

> I think it is not essential to lock host_vsnprintf. What we need is to lock the buffer from the beginning of the write (host_vsnprintf) to the end of the read (call("taichi_assert", get_context(), stmt->cond->value, text);).

You can simply new a buffer in get_assert_msg and then delete the result after calling taichi_assert.
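
A minimal sketch of that per-call buffer (get_assert_msg's real signature may differ; this only shows the ownership pattern):

#include <cstdarg>
#include <cstdio>

// Each call formats into its own heap buffer, so concurrent threads
// never share state; the caller frees it after the assert call.
char *get_assert_msg(const char *fmt, ...) {
  constexpr int buf_size = 2048;  // illustrative size
  char *buf = new char[buf_size];
  va_list args;
  va_start(args, fmt);
  std::vsnprintf(buf, buf_size, fmt, args);
  va_end(args);
  return buf;  // ownership transfers to the caller
}

// Caller side (sketch):
//   char *msg = get_assert_msg("Accessing Tensor of Size [%d, %d] ...", m, n);
//   ... pass msg to taichi_assert ...
//   delete[] msg;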

@yuanming-hu
Member

> How about allocating a buffer for each core? E.g.:
>
> struct PerCPU {
>   ...
>   char buffer[233];
> };
>
> This is a very common approach in multi-processor OS kernels.

Thread-local storage is a good idea, but we need more infrastructure before we actually implement this.
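
For reference, C++11's thread_local keyword gives one buffer per thread with no locking, though wiring it into the generated kernels is the missing infrastructure (a sketch, not the PR's code):

// Each thread gets its own copy; no synchronization is needed because
// a thread only ever reads the buffer it just wrote.
thread_local char assert_msg_buffer[2048];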

@xumingkuan
Contributor Author

> > I think it is not essential to lock host_vsnprintf. What we need is to lock the buffer from the beginning of the write (host_vsnprintf) to the end of the read (call("taichi_assert", get_context(), stmt->cond->value, text);).
>
> You can simply new a buffer in get_assert_msg and then delete the result after calling taichi_assert.

I think I'll choose this solution. As long as get_assert_msg is not used in other places, there should be no memory-leak issues even if we use raw new's.

@xumingkuan
Contributor Author

Oh, it may be better to just implement another taichi_assert that takes the format arguments instead...

Then we can safely either use a lock on a global buffer or delete the buffer inside the function.

@yuanming-hu
Member

> Oh, it may be better to just implement another taichi_assert that takes the format arguments instead...
>
> Then we can safely either use a lock on a global buffer or delete the buffer inside the function.

Let's just take the simplest solution (new/delete) for now. After this bound-checking feature is actually in use, we'll learn more about its behavior and practical needs. Then we'll have enough information to engineer a more complex solution.

@yuanming-hu
Member

yuanming-hu commented Mar 11, 2020

Actually, as you mentioned, even if we have multiple threads, we only need one error message. This can be done by having something like

{
  char error_message_buffer[2048];  // written by the first failing thread
  i32 error_code;                   // non-zero once an assertion has failed
  i32 lock;                         // protects the buffer and error_code
}

The first thread that hits an assertion failure sets error_code to non-zero and then exits. Any later thread that reads a non-zero error code just exits silently.

Then the host will check the error code, and if non-zero, halt the program with the error message.

This will allow the host, rather than the worker threads in the thread queue, to throw exceptions, leading to a more testable design.
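
A sketch of that first-thread-wins protocol, using std::atomic in place of the manual lock field (names and types are illustrative):

#include <atomic>
#include <cstring>

struct RuntimeErrorSlot {
  char error_message_buffer[2048];
  std::atomic<int> error_code{0};  // non-zero once an assertion has failed
};

static RuntimeErrorSlot g_error;

// Worker thread: only the first failing thread wins the exchange and
// writes the message; later threads see a non-zero code and return.
void report_error(int code, const char *msg) {
  int expected = 0;
  if (g_error.error_code.compare_exchange_strong(expected, code)) {
    std::strncpy(g_error.error_message_buffer, msg,
                 sizeof(g_error.error_message_buffer) - 1);
  }
}

// Host side, after the kernel launch completes (so the write is visible):
bool check_runtime_error(const char **msg_out) {
  if (g_error.error_code.load() == 0)
    return false;
  *msg_out = g_error.error_message_buffer;  // host halts / throws here
  return true;
}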

@yuanming-hu yuanming-hu reopened this Mar 11, 2020
@xumingkuan
Contributor Author

By combining get_assert_msg and taichi_assert into taichi_assert_format, I now get only one assertion message. Maybe it's not great for modular programming, though...

@yuanming-hu
Member

Looks good! There are many design choices here. Let's have a brief Skype chat if you are available.

@xumingkuan xumingkuan marked this pull request as ready for review March 11, 2020 20:15
@yuanming-hu
Member

yuanming-hu commented Mar 11, 2020

After this line:

program.sync = (program.sync && arch_is_cpu(arch));

call Program::check_runtime_error if in debug mode and on CPU.

In Program::check_runtime_error, fetch the error code and the error message pointer, just like:

llvm_runtime = runtime->fetch_result<void *>();
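
A hedged sketch of what Program::check_runtime_error could look like under that plan (fetch_error_code and fetch_error_message are assumed helpers built on the same set_result/fetch_result mechanism, not existing API):

#include <stdexcept>

void Program::check_runtime_error() {
  // Both fetchers below are hypothetical wrappers over fetch_result.
  int error_code = fetch_error_code();
  if (error_code != 0) {
    const char *msg = fetch_error_message();
    throw std::runtime_error(msg);  // the host throws, not a worker thread
  }
}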

@yuanming-hu
Member

yuanming-hu commented Mar 11, 2020

For fetch_result, the matching set_result call on the runtime side looks like:

runtime->set_result(runtime);

@yuanming-hu
Member

Hi @xumingkuan, do you want me to merge this in for now, and you open a new PR for the enhanced assertion failure treatment? (Just trying to avoid large changesets...)

@xumingkuan
Contributor Author

Let's merge this in for now. I found that shopping and cooking take some time, so finishing the enhanced assertion failure treatment may take longer than I thought.

@yuanming-hu
Member

Sounds good.

@yuanming-hu yuanming-hu merged commit db0cd44 into taichi-dev:master Mar 12, 2020
@xumingkuan xumingkuan deleted the lock2 branch March 18, 2020 22:11