handle Clang-CUDA #2075

fsb4000 · 2021-07-24T09:02:55Z

Fixes #1949
Fixes DevCom-1572308 / VSO-1441025 / AB#1441025

clang defines only __CUDACC__ but it doesn't define __CUDACC_VER_MAJOR__ ,__CUDACC_VER_MINOR__ ,__CUDACC_VER_BUILD__ at all.

Actually it can't pass our cuda test: https://github.com/microsoft/STL/blob/main/tests/std/tests/GH_000639_nvcc_include_all/test.compile.pass.cpp

Because clang adds the incudes: C:\Program Files\Microsoft Visual Studio\2022\Preview\VC\Tools\Llvm\x64\lib\clang\12.0.0\include\cuda_wrappers

And they don't protect themselves from macroized "new":

STL/stl/inc/__msvc_all_public_headers.hpp

Lines 15 to 18 in bd7adb4

    
           // All STL headers should protect themselves from macroized new. 
        
           #pragma push_macro("new") 
        
           #undef new 
        
           #define new WILL NOT COMPILE

But it seems to be working:

#include <iostream>

__global__ void axpy(float a, float* x, float* y) {
  y[threadIdx.x] = a * x[threadIdx.x];
}

int main() {
  const int kDataLen = 4;

  float a = 2.0f;
  float host_x[kDataLen] = {1.0f, 2.0f, 3.0f, 4.0f};
  float host_y[kDataLen];

  // Copy input data to device.
  float* device_x;
  float* device_y;
  cudaMalloc(&device_x, kDataLen * sizeof(float));
  cudaMalloc(&device_y, kDataLen * sizeof(float));
  cudaMemcpy(device_x, host_x, kDataLen * sizeof(float),
             cudaMemcpyHostToDevice);

  // Launch the kernel.
  axpy<<<1, kDataLen>>>(a, device_x, device_y);

  // Copy output data to host.
  cudaDeviceSynchronize();
  cudaMemcpy(host_y, device_y, kDataLen * sizeof(float),
             cudaMemcpyDeviceToHost);

  // Print the results.
  for (int i = 0; i < kDataLen; ++i) {
    std::cout << "y[" << i << "] = " << host_y[i] << "\n";
  }

  std::cout << "_MSVC_STL_VERSION " << _MSVC_STL_VERSION  << std::endl;

  cudaDeviceReset();
}

clang++ test.cu -o test.exe --cuda-gpu-arch=sm_35 --cuda-path="C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3" -L"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3\lib\x64"  -lcudart -Wall -Wextra

fodinabor · 2021-09-15T15:00:43Z

The fix works-for-me ™️
Should we add the new macro guards to the internal Clang headers to enable the STL test suite to move this forward?

fsb4000 · 2021-09-15T15:29:32Z

We can't add anything to clang headers. But I could add to __msvc_all_public_headers.hpp something like this :

#if !(defined( __CUDACC__ ) && defined(__clang__))
// All STL headers should protect themselves from macroized new.
#pragma push_macro("new")
#undef new
#define new WILL NOT COMPILE
#endif // !(defined( __CUDACC__ ) && defined(__clang__))

And enable the cuda test.

I will do after the commit will be merged: #2204

fodinabor · 2021-09-15T15:32:18Z

Looks like a fine workaround, but I actually wondered, whether it would make sense that I create a PR for Clang to push/pop the new macro in Clang's new header :)
(won't be in a released Clang version before early 2022 then, though)

mojca · 2021-11-25T16:39:39Z

Any chance to move this forward?
(I just accidentally reported a duplicate bug.)
The cited PR 2204 has been merged.

I didn't run the tests, so I didn't have any issues with __msvc_all_public_headers.hpp, but I need to keep adding -D_ALLOW_COMPILER_AND_STL_VERSION_MISMATCH to compiler flags to work around the problem.

Artem-B · 2021-11-30T03:15:41Z

We can't add anything to clang headers.

We certainly can. It may not help released versions of clang, but if reasonable change must be done to clang's headers in order to make it work better for CUDA compilation on windows going forward, I'm open to the idea.

fsb4000 · 2021-12-07T05:47:05Z

We can't add anything to clang headers.

We certainly can. It may not help released versions of clang, but if reasonable change must be done to clang's headers in order to make it work better for CUDA compilation on windows going forward, I'm open to the idea.

Stephan added the PR to "Nice to have": https://github.com/microsoft/STL/projects/6#card-74026742

And about the macro "new". As far as I understand: MFC changes "new" to a macro. So for MFC and stl to work together, stl must remove this macro and then restore.

But Clang doesn't work with MFC at the moment. https://developercommunity.visualstudio.com/t/ClangCL-and-MFC-Undefined-Behaviour/1541392

And maybe zero people use CUDA + MFC...

So I'm not sure if changes at libc++ headers are necessary to distract developers from their current problems.

Artem-B · 2021-12-07T06:36:22Z

And about the macro "new". As far as I understand: MFC changes "new" to a macro. So for MFC and stl to work together, stl must remove this macro and then restore.

Adding the guard to cuda_wrappers when we're compiling on windows should be doable.

But Clang doesn't work with MFC at the moment. https://developercommunity.visualstudio.com/t/ClangCL-and-MFC-Undefined-Behaviour/1541392
And maybe zero people use CUDA + MFC

My bet is that the vast majority of CUDA code to be compiled with clang on windows initially will be TensorFlow, possibly followed by other ML toolkits. Those do care about a subset of the standard C++ library working, but not so much about MFC (unless microsoft's STL depends on it). As your example demonstrated, things do work in general. Support for the standard C++ library on the GPU is rather limited anyways cmath, complex, new, and whatever happens to be constexpr is pretty much the extent of it on the linux side. As long as roughly the same set works with MSFT's STL, I'd consider it to be good enough.

Here are the tests I'm running on LLVM's CUDA test bots: https://github.com/llvm/llvm-test-suite/tree/main/External/CUDA
Getting those to compile and work would get you most of the way to 'quite usable in practice' state.

stl/inc/yvals_core.h

StephanTLavavej · 2021-12-18T05:02:33Z

My apologies for taking so long to review this. I've pushed a comment whitespace change (plus a conflict-free merge with main since a test run is going to happen anyways). I performed an initial audit of the codebase for other Clang-CUDA interactions and filed #2430 to track that, but that doesn't need to block this focused PR from being merged.

which admits both clang-CUDA and `cl /showIncludes /D__CUDACC__` into the STL.

CaseyCarter · 2022-01-06T01:49:38Z

I'm going to add this to the next batch of changes to merge - please notify me if any further commits are pushed.

@stl: I made a change after you approved. Normally, I would wait for reapproval to merge, but since this addresses a P0 bug I'm going to merge now and we can iron out any issues after you return from vacation.

StephanTLavavej · 2022-01-06T05:30:49Z

@CaseyCarter LGTM. (BTW, I am @StephanTLavavej here.)

CaseyCarter · 2022-01-06T05:51:57Z

Thanks for ensuring that Clang could'a worked with CUDA!

When compiling the tests with the CUDA backend, we're getting errors that have been fixed in for Visual Studio 2022 here: microsoft/STL#2075

* Rewrite github action file for windows - Use windows-2019 image - Use prebuilt LLVM 15 - Build boost in CI * Add full path to clang In the Github runner, another LLVM is installed in C:\Program Files, we want to use the prebuilt one * Remove continue-on-error * Prepend path to prebuilt LLVM to PATH * Comment out code that does not compile with LLVM13+ TODO: Check if this is still necessary and how to make it work with LLVM13+ * Link against LLVMSupport on Windows Required for symbols from LLVM CommandLine library * Build in Release mode to match build type of LLVM * Add step to build CPU tests * Add missing cd command * Use correct path to hipsycl-config.cmake * Add missing quotes * Run CPU tests * Fix path * Fix typo * Use cmd instead of powershell * Add path to install_dir/bin to PATH to make dlls findable * Add missing backslash * Add steps to install CUDA * Fix typo in CUDA version * Add step to build tests with CUDA backend * Build CUDA tests in own directory * Use different gencode for CUDA tests * Fix typo in CUDA gencode * Remove 10.2 from tested CUDA versions * Only test CUDA 11.0 for now * Use correct version when caching CUDA * Switch to Windows Server 2022 as OS When compiling the tests with the CUDA backend, we're getting errors that have been fixed in for Visual Studio 2022 here: microsoft/STL#2075 * Use different CUDA gencode again * Switch from hipSYCL_DIR and HIPSYCL_TARGETS to OpenSYCL_DIR and OPENSYCL_TARGETS * Fix path to OpenSYCL.config * Add missing quotes * Remove step to build CUDA tests The prebuilt LLVM was not build with CUDA enabled * Add steps to build with prebuilt clang 11 * Enable calling `setDeviceMangleContext` for LLVM 11 and 12 In fact, this is not available in LLVM 11 (only when patched with https://reviews.llvm.org/D69322) but the prebuilt LLVM in CI is exactly a patched LLVM 11. * Remove clang11 from tested clang versions

handle Clang-CUDA

dec4105

fsb4000 requested a review from a team as a code owner July 24, 2021 09:02

StephanTLavavej added the bug Something isn't working label Jul 27, 2021

StephanTLavavej self-assigned this Aug 4, 2021

Merge branch 'main' into fix1949

38a7e39

fsb4000 added 3 commits October 8, 2021 23:22

Merge branch 'main' into fix1949

23b046d

do not require protection from macroized new for Clang in Cuda mode

e4baf10

Merge branch 'microsoft:main' into fix1949

7e9d2b2

mojca mentioned this pull request Nov 29, 2021

<yvals_core.h>: CUDA version check doesn't handle Clang-CUDA #1949

Closed

StephanTLavavej mentioned this pull request Dec 18, 2021

Consider extending Clang-CUDA support/testing #2430

Open

StephanTLavavej added 2 commits December 17, 2021 20:51

Merge branch 'main' into fix1949

bc0b9b1

Fix comment spacing.

79347e0

StephanTLavavej approved these changes Dec 18, 2021

View reviewed changes

stl/inc/yvals_core.h Outdated Show resolved Hide resolved

StephanTLavavej removed their assignment Dec 18, 2021

CaseyCarter approved these changes Dec 18, 2021

View reviewed changes

CaseyCarter self-assigned this Jan 6, 2022

Don't check NVCC version if __CUDACC_VER_MAJOR__ is not defined

fbb95f9

which admits both clang-CUDA and `cl /showIncludes /D__CUDACC__` into the STL.

StephanTLavavej approved these changes Jan 6, 2022

View reviewed changes

CaseyCarter merged commit 16251e3 into microsoft:main Jan 6, 2022

CaseyCarter removed their assignment Jan 6, 2022

fsb4000 deleted the fix1949 branch January 6, 2022 07:00

Artem-B mentioned this pull request Mar 30, 2022

Various cuda compiling problems on Windows llvm/llvm-project#54609

Closed

nilsfriess added a commit to nilsfriess/OpenSYCL that referenced this pull request Mar 17, 2023

Switch to Windows Server 2022 as OS

ac3a005

When compiling the tests with the CUDA backend, we're getting errors that have been fixed in for Visual Studio 2022 here: microsoft/STL#2075

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

handle Clang-CUDA #2075

handle Clang-CUDA #2075

fsb4000 commented Jul 24, 2021 •

edited by CaseyCarter

Loading

fodinabor commented Sep 15, 2021

fsb4000 commented Sep 15, 2021

fodinabor commented Sep 15, 2021 •

edited

Loading

mojca commented Nov 25, 2021

Artem-B commented Nov 30, 2021 •

edited

Loading

fsb4000 commented Dec 7, 2021 •

edited

Loading

Artem-B commented Dec 7, 2021

StephanTLavavej commented Dec 18, 2021

CaseyCarter commented Jan 6, 2022

StephanTLavavej commented Jan 6, 2022

CaseyCarter commented Jan 6, 2022

	// All STL headers should protect themselves from macroized new.
	#pragma push_macro("new")
	#undef new
	#define new WILL NOT COMPILE

handle Clang-CUDA #2075

handle Clang-CUDA #2075

Conversation

fsb4000 commented Jul 24, 2021 • edited by CaseyCarter Loading

fodinabor commented Sep 15, 2021

fsb4000 commented Sep 15, 2021

fodinabor commented Sep 15, 2021 • edited Loading

mojca commented Nov 25, 2021

Artem-B commented Nov 30, 2021 • edited Loading

fsb4000 commented Dec 7, 2021 • edited Loading

Artem-B commented Dec 7, 2021

StephanTLavavej commented Dec 18, 2021

CaseyCarter commented Jan 6, 2022

StephanTLavavej commented Jan 6, 2022

CaseyCarter commented Jan 6, 2022

fsb4000 commented Jul 24, 2021 •

edited by CaseyCarter

Loading

fodinabor commented Sep 15, 2021 •

edited

Loading

Artem-B commented Nov 30, 2021 •

edited

Loading

fsb4000 commented Dec 7, 2021 •

edited

Loading