asan: problem calling NVIDIA CUDA libraries #629

Open
maleadt opened this issue Dec 2, 2015 · 17 comments

maleadt commented Dec 2, 2015

I'm having an issue using ASAN with the NVIDIA CUDA libraries on an x86_64 system: cuInit returns a bogus error code. Given this small test case (compiled with clang -lcuda):

#include <cuda.h>
#include <stdio.h>

int main() {
    printf("cuInit(0) = %d\n", cuInit(0));
    return 0;
}

When using -fsanitize=address, this call returns 2 (an impossible return value for cuInit) instead of the expected 0. I tried tweaking almost every ASAN option, to no avail.

A verbose log:

$ ASAN_OPTIONS=verbosity=4,debug=1 ./cuinit_asan
==9400==Parsed ASAN_OPTIONS: verbosity=4,debug=1
==9400==AddressSanitizer: failed to intercept '__isoc99_printf'
==9400==AddressSanitizer: failed to intercept '__isoc99_sprintf'
==9400==AddressSanitizer: failed to intercept '__isoc99_snprintf'
==9400==AddressSanitizer: failed to intercept '__isoc99_fprintf'
==9400==AddressSanitizer: failed to intercept '__isoc99_vprintf'
==9400==AddressSanitizer: failed to intercept '__isoc99_vsprintf'
==9400==AddressSanitizer: failed to intercept '__isoc99_vsnprintf'
==9400==AddressSanitizer: failed to intercept '__isoc99_vfprintf'
==9400==AddressSanitizer: libc interceptors initialized
|| `[0x10007fff8000, 0x7fffffffffff]` || HighMem    ||
|| `[0x02008fff7000, 0x10007fff7fff]` || HighShadow ||
|| `[0x00008fff7000, 0x02008fff6fff]` || ShadowGap  ||
|| `[0x00007fff8000, 0x00008fff6fff]` || LowShadow  ||
|| `[0x000000000000, 0x00007fff7fff]` || LowMem     ||
MemToShadow(shadow): 0x00008fff7000 0x000091ff6dff 0x004091ff6e00 0x02008fff6fff
redzone=16
max_redzone=2048
quarantine_size=256M
malloc_context_size=30
SHADOW_SCALE: 3
SHADOW_GRANULARITY: 8
SHADOW_OFFSET: 7fff8000
==9400==Installed the sigaction for signal 11
==9400==SetCurrentThread: 0x7fe10a946000 for thread 0x7fe10a940780
==9400==T0: stack [0x7fff9d022000,0x7fff9d822000) size 0x800000; local=0x7fff9d8219c0
==9400==AddressSanitizer Init done
cuInit(0) = 2
__tls_get_addr: DTLS_Resize 0x7fe10a940760 0
__tls_get_addr: 0x7fe108c81d90 {0x000000000002,0x000000000000} => 0x7fe10a9406a8; tls_beg: 0x7fe10a9406a8; sp: 0x7fff9d821980 num_live_dtls 1
__tls_get_addr: Can't guess glibc version
==9401==Attached to thread 9400.
==9401==Detached from thread 9400.

System specs: clang 3.5.0 from the Debian repositories, on an x86_64 system running Linux 3.16.0 with NVIDIA drivers 340.65 (corresponding to CUDA 6.0). On an otherwise identical 32-bit system, the error does not occur.
I also tried llvm/clang/compiler-rt from SVN trunk; same issue.


kcc commented Dec 2, 2015

I've recently added a flag protect_shadow_gap exactly for this purpose.
cuInit() is trying to mmap a large chunk of address space which intersects with the mapping
that asan makes to protect its "shadow gap".
With ASAN_OPTIONS=protect_shadow_gap=0 cuInit() passes for me, as well as a tiny test using cuda (C++ is built with clang+asan, cuda code is built with clang, see http://llvm.org/docs/CompileCudaWithLLVM.html).
I have not tried anything more complex.

Note that the cuda driver is closed source and no one knows what it does, why it maps a large chunk at a fixed address, or what will happen with that mapping at run-time. So, you are in a warranty-void zone :(
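
If you want to see the collision yourself, a small probe like this (an illustrative sketch; the build line and file name are assumptions) dumps the inaccessible mappings before and after cuInit():

// Sketch: observe the large fixed-address reservation that cuInit() creates.
// Build (assumption): clang -fsanitize=address probe_cuinit.c -lcuda
#include <cuda.h>
#include <stdio.h>
#include <stdlib.h>

// Print the inaccessible ("---p") mappings, which include both asan's
// shadow gap and the driver's PROT_NONE reservation.
static void dump_prot_none(const char *when) {
    printf("--- inaccessible mappings %s cuInit ---\n", when);
    system("grep -- '---p' /proc/self/maps");
}

int main(void) {
    dump_prot_none("before");
    printf("cuInit(0) = %d\n", cuInit(0));
    dump_prot_none("after");
    return 0;
}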


maleadt commented Dec 2, 2015

What an (awesome) coincidence; thanks for the swift reply! My test case now works; I'll be testing it out on my more complex CUDA back-end soon. Let's see whether all hell really breaks loose 😄

maleadt closed this as completed Dec 2, 2015

kcc commented Dec 2, 2015

FTR, the allocations done in cuInit() are probably related to CUDA's unified-virtual-address-space.
As soon as you call cudaMallocHost() and use that pointer in asan-instrumented code -- you are in trouble. Since I am not a CUDA expert myself I can't (easily) make a test case for that, so any help here is welcome.
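
Something along these lines might be a starting point (an untested sketch; whether it actually reproduces a problem is exactly what needs checking, and the build line mirrors the one used later in this thread):

// Sketch: pinned host memory from cudaMallocHost() touched by asan-instrumented host code.
// Build (assumption): clang++ -fsanitize=address pinned.cc -lcudart_static -lcuda -ldl -lrt -pthread
#include <cuda_runtime.h>
#include <cstdio>
#include <cstring>

int main() {
  void *p = nullptr;
  cudaError_t err = cudaMallocHost(&p, 1024 * sizeof(float));
  printf("cudaMallocHost = %d, ptr = %p\n", (int)err, p);
  if (err != cudaSuccess) return 1;
  float *pinned = static_cast<float *>(p);
  // The interesting part: instrumented host code writes and reads the pinned buffer.
  memset(pinned, 0, 1024 * sizeof(float));
  pinned[0] = 42.0f;
  printf("pinned[0] = %f\n", pinned[0]);
  cudaFreeHost(p);
  return 0;
}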


maleadt commented Dec 3, 2015

As you suspected, disabling protect_shadow_gap isn't enough; even though it makes cuInit succeed, other API calls fail. For example, cuCtxCreate_v2 crashes in the memset interceptor, complaining that Address 0x000200000000 is located in the shadow gap area. I'll see about making a small test case.

However, it seems that the CUDA allocations are very much fixed: always 0x200000000-0xd00000000. I tried altering the shadow offset (kDefaultShort64bitShadowOffset) to 0xd00000000, or back to 1ULL << 44 (which seemed to be the previous behaviour), but that only led to segfaults when compiling more complex code. Maybe I'm doing something wrong; changing the offset should work, right?

Seeing how there's prelink 'detection', I also tried pre-occupying the CUDA range (from the constructor of an LD_PRELOAD library), which forced ASAN to split the shadow gap (/proc/self/maps below; a sketch of such a constructor follows the dump):

7fff7000-8fff7000 rw-p 00000000 00:11 6387124                            /dev/shm/13891 [low shadow] (deleted)
8fff7000-200000000 ---p 00000000 00:11 6387126                           /dev/shm/13891 [shadow gap] (deleted)
200000000-d00000000 ---p 00000000 00:00 0                                CUDA blob
d00000000-2008fff7000 ---p c70009000 00:11 6387126                       /dev/shm/13891 [shadow gap] (deleted)
2008fff7000-10007fef8000 rw-p 00000000 00:11 6387125                     /dev/shm/13891 [high shadow] (deleted)
10007fef8000-10007fff7000 rw-p 00000000 00:00 0 
10007fff7000-10007fff8000 rw-p dfff0000000 00:11 6387125                 /dev/shm/13891 [high shadow] (deleted)
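
A sketch of the kind of constructor I mean (illustrative, not the exact code; file and function names are made up, and the range is the one from the dump above):

// Sketch of an LD_PRELOAD constructor reserving the CUDA range up front.
// Build (assumption): clang -shared -fPIC -o preoccupy.so preoccupy.c
// Run (assumption): LD_PRELOAD=./preoccupy.so ./a.out
// Note: whether this runs before ASAN sets up its mappings depends on how the runtime initializes.
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>

__attribute__((constructor))
static void preoccupy_cuda_range(void) {
  void *base = (void *)0x200000000ULL;
  size_t size = (size_t)(0xd00000000ULL - 0x200000000ULL);
  void *p = mmap(base, size, PROT_NONE,
                 MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED | MAP_NORESERVE, -1, 0);
  if (p == MAP_FAILED)
    perror("mmap of CUDA range");
}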

Then I hooked cuInit to unmap the memory before performing the actual call. However, the memset interceptor still complains about 0x000200000000 being part of the shadow gap. Is this expected? Although the mapping is clearly split, what ASAN prints at verbosity=4 still seems like the default mapping:

|| `[0x10007fff8000, 0x7fffffffffff]` || HighMem    ||
|| `[0x02008fff7000, 0x10007fff7fff]` || HighShadow ||
|| `[0x00008fff7000, 0x02008fff6fff]` || ShadowGap  ||
|| `[0x00007fff8000, 0x00008fff6fff]` || LowShadow  ||
|| `[0x000000000000, 0x00007fff7fff]` || LowMem     ||

Any pointers? Thanks!


kcc commented Dec 3, 2015

Try ASAN_OPTIONS=protect_shadow_gap=0:replace_intrin=0:detect_leaks=0.
I'll update this bug with a test a bit later.


kcc commented Dec 3, 2015

Here is my test (derived from http://llvm.org/docs/CompileCudaWithLLVM.html):

axpy.cu:

#include <helper_cuda.h> // for checkCudaErrors
#include <iostream>
__global__ void axpy(float a, float* x, float* y) {
  y[threadIdx.x] = a * x[threadIdx.x];
}
void test_axpy() {
  const int kDataLen = 4;
  float a = 2.0f;
  float host_x[kDataLen] = {1.0f, 2.0f, 3.0f, 4.0f};
  float host_y[kDataLen];
  float* device_x;
  float* device_y;
  checkCudaErrors(cudaMalloc(&device_x, kDataLen * sizeof(float)));
  checkCudaErrors(cudaMalloc(&device_y, kDataLen * sizeof(float)));
  checkCudaErrors(cudaMemcpy(device_x, host_x, kDataLen * sizeof(float),
                             cudaMemcpyHostToDevice));
  axpy<<<1, kDataLen>>>(a, device_x, device_y);
  checkCudaErrors(cudaDeviceSynchronize());
  checkCudaErrors(cudaMemcpy(host_y, device_y, kDataLen * sizeof(float),
                             cudaMemcpyDeviceToHost));
  for (int i = 0; i < kDataLen; ++i) {
    std::cout << "y[" << i << "] = " << host_y[i] << "\n";
  }
  checkCudaErrors(cudaDeviceReset());
}

axpy_main.cc:

void test_axpy();
int main() { test_axpy(); }

Build:

clang --cuda-path=cuda -I cuda/samples/common/inc/ axpy.cu -c
clang++ axpy_main.cc axpy.o -lcudart_static -lcuda -ldl -lrt -pthread  -L /usr/lib/x86_64-linux-gnu/ -L cuda/lib64 -fsanitize=address

Run:

ASAN_OPTIONS=protect_shadow_gap=0:replace_intrin=0:detect_leaks=0 ./a.out 

W/o protect_shadow_gap=0 cuInit() returns 2

W/o replace_intrin=0 (which disables checks inside the memset interceptor) I get this:

==17300==ERROR: AddressSanitizer: unknown-crash on address 0x000200000000 at pc 0x00000049b6c6 bp 0x7fffa85cb210 sp 0x7fffa85ca9c8
WRITE of size 65536 at 0x000200000000 thread T0
    #0 0x49b6c5 in __interceptor_memset
    #1 0x7fb0c82e6813  (/usr/lib/x86_64-linux-gnu/libcuda.so.1+0x225813)
    #2 0x7fb0c82e6ee7  (/usr/lib/x86_64-linux-gnu/libcuda.so.1+0x225ee7)
    #3 0x7fb0c82e761d  (/usr/lib/x86_64-linux-gnu/libcuda.so.1+0x22661d)
    #4 0x7fb0c8241bc4  (/usr/lib/x86_64-linux-gnu/libcuda.so.1+0x180bc4)
    #5 0x7fb0c83d7a4e  (/usr/lib/x86_64-linux-gnu/libcuda.so.1+0x316a4e)
    #6 0x7fb0c822761b  (/usr/lib/x86_64-linux-gnu/libcuda.so.1+0x16661b)
    #7 0x7fb0c81f325a in cuDevicePrimaryCtxRetain (/usr/lib/x86_64-linux-gnu/libcuda.so.1+0x13225a)
...
Address 0x000200000000 is located in the shadow gap area.

This failure should be easy to fix, but for now let's use replace_intrin=0.

W/o detect_leaks=0 I get this at the end:

==17352==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 4904 byte(s) in 1 object(s) allocated from:
    #0 0x4b1154 in calloc 
    #1 0x7fe8667fd4a2  (/usr/lib/x86_64-linux-gnu/libcuda.so.1+0x2264a2)
    #2 0x7fe866757bc4  (/usr/lib/x86_64-linux-gnu/libcuda.so.1+0x180bc4)
    #3 0x7fe8668eda4e  (/usr/lib/x86_64-linux-gnu/libcuda.so.1+0x316a4e)
    #4 0x7fe86673d61b  (/usr/lib/x86_64-linux-gnu/libcuda.so.1+0x16661b)
    #5 0x7fe86670925a in cuDevicePrimaryCtxRetain

Now I'd like to have a test that does something more interesting and fails somewhere else.

kcc reopened this Dec 3, 2015

maleadt commented Dec 4, 2015

Concerning your test case, I'm seeing mostly identical behaviour, except for the case without detect_leaks, which works properly at my end. Probably some behavioural change in libcuda; I'm using v340.65.
Even my tests now work properly as soon as I disable replace_intrin. However, like your test case, my host code (i.e. the code invoking libcuda) isn't ASAN-instrumented yet, so new errors could pop up once it is. I'll keep my eyes open for new types of errors and post them here.

That said, shouldn't my two attempted workarounds have succeeded (changing the shadow offset, and forcing a split shadow allocation by mmapping the memory)? I'm willing to have a closer look at both, if it's unexpected behaviour of course.

FTR, it seems like device pointers returned by cudaMalloc and the like overlap with the large blob, which is allocated with PROT_NONE, so maybe NVIDIA is just doing this to catch uses of device pointers in host code.
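
A quick way to check that (an untested sketch) would be to print a few device pointers and compare them against the blob's range:

// Sketch: see whether cudaMalloc() results fall inside the PROT_NONE blob
// observed at 0x200000000-0xd00000000 above.
#include <cuda_runtime.h>
#include <cstdint>
#include <cstdio>

int main() {
  for (int i = 0; i < 4; ++i) {
    void *dev = nullptr;
    if (cudaMalloc(&dev, 1 << 20) != cudaSuccess) return 1;
    uintptr_t a = (uintptr_t)dev;
    printf("device ptr %p is %s the blob\n", dev,
           (a >= 0x200000000ULL && a < 0xd00000000ULL) ? "inside" : "outside");
  }
  return 0;
}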


kcc commented Dec 4, 2015

My libcuda is 346.96. Looks like they've introduced a leak :)

Instrumenting the device (GPU) code with asan is most likely not going to work today; at least for me, adding -fsanitize=address to the compile command of the .cu file crashes the compiler. If you want to track that part, file a bug at llvm.org/bugs, but I am pretty sure nobody will look into it for at least several months -- the entire cuda support is too new.

Your attempted workarounds might have worked; I need to look closer at why they didn't. But it's unlikely we'll be able to accept any such changes.

Apparently, the large blob at 0x200000000 is used by cuda's allocator as an arena.
The main question: is this memory going to be used by the CPU code, or only by the GPU code?
If this memory is commonly used by CPU code (which will break asan), I'd like to have a realistic example.
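
Something like this would qualify, I think (an untested sketch; whether managed allocations actually land in that arena is an assumption):

// Sketch: cudaMallocManaged() returns a pointer that plain,
// asan-instrumented CPU code reads and writes directly.
#include <cuda_runtime.h>
#include <cstdio>

int main() {
  void *p = nullptr;
  if (cudaMallocManaged(&p, 1024 * sizeof(float)) != cudaSuccess) return 1;
  float *data = static_cast<float *>(p);
  for (int i = 0; i < 1024; ++i)  // host-side writes, fully instrumented
    data[i] = (float)i;
  printf("data = %p, data[0] = %f\n", p, data[0]);
  cudaFree(p);
  return 0;
}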


bchretien commented Oct 7, 2016

This may be fixed in the next clang release, or at least there are LLVM contributors interested in this issue: https://reviews.llvm.org/D24640 (sanitizer skipped on NVPTX)


kcc commented Oct 7, 2016

I've also made ASAN_OPTIONS=protect_shadow_gap=0 more friendly to cuda, and the tests work in my environment. But YMMV.

@jeremysalwen

I am not getting any errors with kcc's test program/options, but the following short test program is giving a SEGV with the same options:

#include <cuda_runtime.h>
#include <iostream>

int main() {
  void *  host_data;
  cudaError_t status = cudaMalloc(&host_data, 3000);
  std::cout << "status " << status <<"\n";
  std::cout << "value "<<((int*)host_data)[0]<<"\n";
}

Compilation flags:

clang++-3.8 -fsanitize=address axpy_main.cc -lcudart_static -lcuda -ldl -lrt -pthread  -L /usr/lib/x86_64-linux-gnu/ -I /usr/local/cuda-8.0/include/ -L /usr/local/cuda-8.0/lib64/

Error:

ASAN:DEADLYSIGNAL
=================================================================
==9844==ERROR: AddressSanitizer: SEGV on unknown address 0x0004e1470000 (pc 0x0000004f61c5 bp 0x7ffdbfe3c780 sp 0x7ffdbfe3c6a0 T0)
    #0 0x4f61c4  (/tmp/a.out+0x4f61c4)
    #1 0x7f1e8c1b0f44  (/lib/x86_64-linux-gnu/libc.so.6+0x21f44)
    #2 0x41ae47  (/tmp/a.out+0x41ae47)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV (/tmp/a.out+0x4f61c4)
==9844==ABORTING
status 0

@morehouse

Has there been any progress on this front?


yegord commented Aug 7, 2018

> I am not getting any errors with kcc's test program/options, but the following short test program is giving a SEGV with the same options:

I think you actually wanted to call cudaMallocHost, not cudaMalloc. After I replaced the latter with the former, your example ran fine with both clang-3.9 and clang 6.0.

> Has there been any progress on this front?

I personally managed to successfully run a binary computing a heavy TensorFlow graph on a GPU under asan, using clang-6.0 and ASAN_OPTIONS=protect_shadow_gap=0:replace_intrin=0:detect_leaks=0. So I would say that things work more often than they don't.

@mustafaahmedhussien

Hello,

Why doesn't asan provide an environment variable or a runtime flag, rather than a compile-time option, to let the end user decide where the shadow memory should be?


kcc commented Nov 14, 2018

There is now a compile-time flag to enable "dynamic shadow": -mllvm -asan-force-dynamic-shadow=true
It will cause the compiler to emit code that doesn't depend on the fixed shadow location,
and then the run-time will allocate the shadow whenever it can. There is no env var that tells the run-time where to allocate -- no one has asked before.

The default mode (on Linux) still uses the fixed shadow offset 0x7fff8000 so that the instrumentation doesn't use an extra register (as a performance and code-size optimization).
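
For example, rebuilding the earlier axpy test with the flag would look roughly like this (a sketch; every asan-instrumented translation unit needs the flag):

clang++ -fsanitize=address -mllvm -asan-force-dynamic-shadow=true axpy_main.cc axpy.o -lcudart_static -lcuda -ldl -lrt -pthread -L /usr/lib/x86_64-linux-gnu/ -L cuda/lib64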

@mustafaahmedhussien

The driver I am using is doing an mmap of 500 MB at 0x3fffff000. This hits the default shadow memory range. I tried the force-dynamic-shadow flag with clang 6, but it doesn't work: the shadow memory is still at 0x7fff8000.

My idea is that if I can move the shadow memory outside my driver's mmap range, that would be safer than disabling the shadow-gap protection.

@intractabilis

> Note that the cuda driver is closed source and no one knows what it does, why it maps a large chunk at a fixed address and what will happen with that mapping at run-time.

CUDA has its own memory allocation routines, so I would assume it's for its own heap.

iiSeymour pushed a commit to nanoporetech/dorado that referenced this issue Oct 9, 2023
We already do this in ont_core_cpp since it was more of an issue when
talking to the CUDA API, but since torch talks to CUDA for us it gets
to decide how to handle the issue and it looks like it just prints a
warning and continues on assuming no CUDA support. This is fine for end
users, but for automated testing it means we were skipping tests
without realising.

See google/sanitizers#629 for more details.