
[TMVA/CMake] Fix CMake setup for DNN CUDA implementation #283

Closed
wants to merge 1 commit into from

Conversation

stwunsch
Contributor

  • Set the minimum CUDA version to 8.0 (it doesn't build otherwise)
  • Fix the include path of the CUDA-dependent library dnn_cuda (the previously used path ROOT_INCLUDE_DIRS is not in local scope); see the sketch below
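
A minimal sketch of what the two changes amount to (the find_package call and the exact include path are assumptions for illustration; only CUDA_FOUND, dnn_cuda, and DNN_CUDA_FILES appear verbatim in the snippets below):

find_package(CUDA 8.0)  # assumed location of the version bump; older toolkits fail to build

if(CUDA_FOUND)
  # ROOT_INCLUDE_DIRS is a global property, not a variable in this scope,
  # so pass a path that is valid locally instead.
  CUDA_INCLUDE_DIRECTORIES(${CMAKE_CURRENT_SOURCE_DIR}/inc)
  CUDA_ADD_LIBRARY(dnn_cuda ${DNN_CUDA_FILES})
endif()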

@stwunsch
Contributor Author

stwunsch commented Oct 21, 2016

It builds fine on my machine with this CMake setup (note that CUDA 8.0 supports GCC only up to 5.3):

cmake .. \
-Dalien=OFF -Dcuda=ON \
-DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-8.0 \
-DCMAKE_CXX_COMPILER=/usr/bin/g++-4.9 \
-DCMAKE_C_COMPILER=/usr/bin/gcc-4.9

And here is a test Python file:

#!/usr/bin/env python

from ROOT import TMVA, TFile, TTree, TCut
from subprocess import call
from os.path import isfile

# Setup TMVA
TMVA.Tools.Instance()

output = TFile.Open('TMVA.root', 'RECREATE')
factory = TMVA.Factory('TMVAClassification', output,
        '!V:!Silent:Color:DrawProgressBar:Transformations=D,G:AnalysisType=Classification')

# Load data
if not isfile('tmva_class_example.root'):
    call(['curl', '-O', 'http://root.cern.ch/files/tmva_class_example.root'])

data = TFile.Open('tmva_class_example.root')
signal = data.Get('TreeS')
background = data.Get('TreeB')

dataloader = TMVA.DataLoader('dataset')
for branch in signal.GetListOfBranches():
    dataloader.AddVariable(branch.GetName())

dataloader.AddSignalTree(signal, 1.0)
dataloader.AddBackgroundTree(background, 1.0)
dataloader.PrepareTrainingAndTestTree(TCut(''),
        'TrainTestSplit_Signal=0.8:TrainTestSplit_Background=0.8:SplitMode=Random:NormMode=NumEvents:!V')

# Book methods
factory.BookMethod(dataloader, TMVA.Types.kDNN, 'DNN',
        'H:!V:VarTransform=D,G:ErrorStrategy=CROSSENTROPY:WeightInitialization=XAVIERUNIFORM:Architecture=GPU:Layout=TANH|64,TANH|64,TANH|64,LINEAR:TrainingStrategy=LearningRate=0.01,Repetitions=1,ConvergenceSteps=20,BatchSize=32,TestRepetitions=1')

# Run training, test and evaluation
factory.TrainAllMethods()
factory.TestAllMethods()
factory.EvaluateAllMethods()

@stwunsch
Contributor Author

stwunsch commented Oct 21, 2016

@simonpf How was it possible on your machine to build the CUDA DNN method with the include path of the library dnn_cuda set to ROOT_INCLUDE_DIRS, which is not populated by default? On my machines (Ubuntu 16.04 and Arch) it fails because the headers are not found.

@simonpf
Contributor

simonpf commented Oct 28, 2016

Hi Stefan,
You're right about ROOT_INCLUDE_DIRS. However, the CUDA include directories are only appended to the standard include directories, so the include directories should be set properly anyway. At least they are on my machine, as well as on the techlab machines. Or is this just a weird coincidence?
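
(For reference, FindCUDA picks up the directory-level include paths itself; the following is paraphrased from the module, not verbatim:)

# Sketch of what FindCUDA does internally: the directory's own
# include_directories() settings are collected and handed to nvcc,
# with user- and toolkit-supplied CUDA includes appended on top.
get_directory_property(CUDA_NVCC_INCLUDE_DIRECTORIES INCLUDE_DIRECTORIES)
foreach(dir ${CUDA_NVCC_INCLUDE_DIRECTORIES})
  list(APPEND CUDA_NVCC_INCLUDE_ARGS -I${dir})
endforeach()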

Concerning CUDA 8.0: I am using CUDA 7.5 on my machine (Ubuntu 15.10) and on the techlab machine (Scientific Linux CERN SLC release 6.5), so I don't think requiring CUDA >= 8.0 is a good idea.

@stwunsch
Contributor Author

Regarding the include dirs: probably it's a threading problem? It fails to find header files, so the path is not properly populated on my machine(s). Nevertheless, setting the path directly solves it.

I'm running Ubuntu 16.04 LTS and it fails with CUDA 7.5, but it runs fine with 8.0. It is true that it should be possible to compile it on CERN SLC, so I'll keep investigating what is going wrong on my machine. Still, it should also run on Ubuntu 16.04, because it's used frequently (and will be for the next years, since it's an LTS release).

@simonpf
Contributor

simonpf commented Oct 29, 2016

Could you maybe run make VERBOSE=1 and post the build command for the CUDA object files? Then we can see whether the ROOT include directories are there or not.

For CUDA 8.0: Did you try whether adding the nvcc flag -DCUDA_NVCC_FLAGS="-D_FORCE_INLINES" or something along those lines (see BVLC/caffe#4046) solves the problem?
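
Something like this, as a sketch (paths and options as in your configs above):

cmake .. -Dcuda=ON -DCUDA_NVCC_FLAGS="-D_FORCE_INLINES"
make VERBOSE=1 2>&1 | grep nvcc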

@stwunsch
Contributor Author

Regarding the include problem:

Run with these CMake/make commands on the ROOT master branch:

cmake .. \
-Dalien=OFF -Dcuda=ON \
-DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-8.0 \
-DCMAKE_CXX_COMPILER=/usr/bin/g++-4.9 \
-DCMAKE_C_COMPILER=/usr/bin/gcc-4.9

make -j10 VERBOSE=1 # Multi-thread!

Here is the error:

/usr/users/wunsch/root_master/tmva/tmva/src/DNN/Architectures/Cuda/CudaMatrix.cu:16:52: fatal error: TMVA/DNN/Architectures/Cuda/CudaMatrix.h: No such file or directory
 #include "TMVA/DNN/Architectures/Cuda/CudaMatrix.h"
                                                    ^
compilation terminated.
CMake Error at dnn_cuda_generated_CudaMatrix.cu.o.cmake:209 (message):
  Error generating
  /usr/users/wunsch/root_master/build/tmva/tmva/CMakeFiles/dnn_cuda.dir/src/DNN/Architectures/Cuda/./dnn_cuda_generated_CudaMatrix.cu.o


make[2]: *** [tmva/tmva/CMakeFiles/dnn_cuda.dir/src/DNN/Architectures/Cuda/dnn_cuda_generated_CudaMatrix.cu.o] Error 1
make[1]: *** [tmva/tmva/CMakeFiles/dnn_cuda.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
make: *** [all] Error 2

And now here is the verbose make output (stdout grepped for 'nvcc'):

/usr/local/cuda-8.0/bin/nvcc -M -D__CUDACC__ /usr/users/wunsch/root_master/tmva/tmva/src/DNN/Architectures/Cuda/CudaMatrix.cu -o /usr/users/wunsch/root_master/build/tmva/tmva/CMakeFiles/dnn_cuda.dir/src/DNN/Architectures/Cuda/dnn_cuda_generated_CudaMatrix.cu.o.NVCC-depend -ccbin /usr/bin/gcc-4.9 -m64 --std c++11 -Xcompiler ,\"-pipe\",\"-m64\",\"-Wshadow\",\"-Wall\",\"-W\",\"-Woverloaded-virtual\",\"-fsigned-char\",\"-fPIC\",\"-pthread\",\"-O2\",\"-g\",\"-DNDEBUG\" -DNVCC -I/usr/local/cuda-8.0/include -I/usr/users/wunsch/root_master/build/include -I/usr/local/cuda-8.0/include

Now, if I run it with only one thread (make -j1 VERBOSE=1), everything runs fine (but the verbose output has not changed):

/usr/local/cuda-8.0/bin/nvcc -M -D__CUDACC__ /usr/users/wunsch/root_master/tmva/tmva/src/DNN/Architectures/Cuda/CudaMatrix.cu -o /usr/users/wunsch/root_master/build/tmva/tmva/CMakeFiles/dnn_cuda.dir/src/DNN/Architectures/Cuda/dnn_cuda_generated_CudaMatrix.cu.o.NVCC-depend -ccbin /usr/bin/gcc-4.9 -m64 --std c++11 -Xcompiler ,\"-pipe\",\"-m64\",\"-Wshadow\",\"-Wall\",\"-W\",\"-Woverloaded-virtual\",\"-fsigned-char\",\"-fPIC\",\"-pthread\",\"-O2\",\"-g\",\"-DNDEBUG\" -DNVCC -I/usr/local/cuda-8.0/include -I/usr/users/wunsch/root_master/build/include -I/usr/local/cuda-8.0/include

So, during the generation of the object file for CudaMatrix.cu, the include dir points only to the build include directory, but I can't find a copy/install target in CMake that copies CudaMatrix.h to that directory. Still, it seems to be a threading problem that occurs only for the CUDA library: with -j10 the compilation apparently starts before the headers have been copied to build/include, while with -j1 they are already in place.


Regarding building the method with CUDA 7.5 on Ubuntu 16.04:

Here is the CMake config:

cmake .. \
-Dalien=OFF -Dcuda=ON \
-DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-7.5 \
-DCMAKE_CXX_COMPILER=/usr/bin/g++-4.9 \
-DCMAKE_C_COMPILER=/usr/bin/gcc-4.9

The error message:

/usr/include/string.h: In function ‘void* __mempcpy_inline(void*, const void*, size_t)’:
/usr/include/string.h:652:42: error: ‘memcpy’ was not declared in this scope
   return (char *) memcpy (__dest, __src, __n) + __n;

And yes, it is fixed if you append -DCUDA_NVCC_FLAGS="-D_FORCE_INLINES" to the CMake call! CUDA 8.0 does not break with this flag either. So we could append this flag permanently, though I don't like this idea because it clutters the build system. On the other hand, every Ubuntu 16.04 LTS user will face this problem for the next years.
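
If we were to make it permanent, a sketch guarded by toolkit version could look like this (CUDA_VERSION is set by FindCUDA; taking 8.0 as the cutoff is an assumption):

# Append the glibc workaround only for toolkits that need it
# (CUDA 7.5 with the newer glibc shipped in Ubuntu 16.04).
if(CUDA_VERSION VERSION_LESS 8.0)
  list(APPEND CUDA_NVCC_FLAGS "-D_FORCE_INLINES")
endif()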

@stwunsch stwunsch changed the title [TMVA/CMake] Fix CMake setup for DNN CUDA implementation [TMVA/CMake][WIP] Fix CMake setup for DNN CUDA implementation Oct 30, 2016
@simonpf
Contributor

simonpf commented Oct 31, 2016

The relevant target is probably move_headers, defined here. Adding it as a dependency of dnn_cuda could solve the problem. I can have a look at this on Wednesday.

Concerning CUDA 7.5 and Ubuntu 16.04: I see this rather as a problem with the user's CUDA setup than as a problem of ROOT, considering that this is a common problem for users of Caffe, OpenCV, and Torch as well.

@stwunsch
Contributor Author

Unfortunately the error occurs with the default installation on Ubuntu 16.04. But still, I think you are right and it's not ROOT's problem at all.

@stwunsch
Contributor Author

stwunsch commented Nov 2, 2016

Yep, adding move_headers as a dependency of dnn_cuda solves the problem. Here is the code:

#---Handle CUDA dependent code. -----------------
if(CUDA_FOUND)
  CUDA_INCLUDE_DIRECTORIES(${ROOT_INCLUDE_DIRS})
  CUDA_ADD_LIBRARY(dnn_cuda ${DNN_CUDA_FILES})
  # Wait for the headers to be copied into build/include before nvcc runs.
  add_dependencies(dnn_cuda move_headers)
  set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -DDNNCUDA")
  set(DNN_CUDA_LIBRARIES dnn_cuda ${CUDA_CUBLAS_LIBRARIES})
else()
  set(DNN_CUDA_LIBRARIES)
endif()

This solution seems pretty hacky to me; shouldn't a library be built against the headers in the source directory? Actually, I think I'm missing something in the build system.

@simonpf
Contributor

simonpf commented Nov 3, 2016

OK, great. Now, if you could also remove CUDA_INCLUDE_DIRECTORIES, the bug is fixed and the build system is a bit cleaner.

I am unsure about the hackiness of the solution; as I understand it, this is the way it is done in ROOT.

@stwunsch
Contributor Author

stwunsch commented Nov 3, 2016

Following the line

set_property(GLOBAL APPEND PROPERTY ROOT_INCLUDE_DIRS ${CMAKE_CURRENT_SOURCE_DIR}/inc)

the ROOT include directories point to the source directory, but now we are waiting for the move_headers target so that we can point to the binary include dir in build/.
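
(For clarity, since it is a global property and not a variable, it also has to be read back explicitly before it can be used, along these lines:)

# A plain ${ROOT_INCLUDE_DIRS} in another directory's scope expands to
# nothing; the property must first be fetched into a local variable.
get_property(incdirs GLOBAL PROPERTY ROOT_INCLUDE_DIRS)
CUDA_INCLUDE_DIRECTORIES(${incdirs})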

Nevertheless, I'll close this PR for now and ask CMake guru Pere for advice about the include directories. I think we've figured out the problems!

@stwunsch stwunsch changed the title [TMVA/CMake][WIP] Fix CMake setup for DNN CUDA implementation [TMVA/CMake] Fix CMake setup for DNN CUDA implementation Nov 3, 2016
@stwunsch
Contributor Author

stwunsch commented Nov 3, 2016

I talked to Pere; the move_headers dependency is indeed the correct way to depend on a custom target. Normally, the dependency is hidden inside ROOT_LINKER_LIBRARY, so adding it directly to the CUDA-dependent target mirrors that function (which can't be used in our case).
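
(Paraphrased, not the verbatim macro, my understanding is that ROOT_LINKER_LIBRARY contains the equivalent of:)

# Every library built through the macro waits for the headers
# to be staged into build/include before compilation starts.
add_dependencies(${library} move_headers)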

@lmoneta From my point of view, this PR is ready to merge!
