python3Packages.catboost: 1.0.5 -> 1.2.2; build with cmake #235226

Merged: 3 commits, Oct 16, 2023
113 changes: 113 additions & 0 deletions pkgs/development/libraries/catboost/default.nix
@@ -0,0 +1,113 @@
{ lib
, config
, stdenv
, fetchFromGitHub
, cmake
, libiconv
, llvmPackages
, ninja
, openssl
, python3Packages
, ragel
, yasm
, zlib
, cudaSupport ? config.cudaSupport
, cudaPackages ? {}
, pythonSupport ? false
}:

stdenv.mkDerivation (finalAttrs: {
pname = "catboost";
version = "1.2.2";

src = fetchFromGitHub {
owner = "catboost";
repo = "catboost";
rev = "refs/tags/v${finalAttrs.version}";
hash = "sha256-A1zCIqPOW21dHKBQHRtS+/sstZ2o6F8k71lmJFGn0+g=";
};

patches = [
./remove-conan.patch
];

postPatch = ''
substituteInPlace cmake/common.cmake \
--replace "\''${RAGEL_BIN}" "${ragel}/bin/ragel" \
--replace "\''${YASM_BIN}" "${yasm}/bin/yasm"

Review comment on lines +36 to +37 (Contributor):

Cosmetic: lib.getExe ragel would probably work too

Reply (Member Author):

Neither package has mainProgram set yet, and using getExe will cause a warning.😞
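
To make the trade-off in this exchange concrete, here is a minimal sketch of two ways to reference the binary without relying on `meta.mainProgram`. It assumes a nixpkgs checkout is reachable as `<nixpkgs>` and that it is recent enough to provide `lib.getExe'`, which may postdate this PR; the attribute names `explicit` and `withName` are hypothetical.

```nix
# Sketch only: referencing the ragel binary without meta.mainProgram.
let
  pkgs = import <nixpkgs> { };  # assumption: <nixpkgs> is on NIX_PATH
  inherit (pkgs) lib ragel;
in
{
  # What the PR does: spell out the path by hand.
  explicit = "${ragel}/bin/ragel";

  # lib.getExe' takes the program name as an argument, so it does not
  # consult meta.mainProgram either.
  withName = lib.getExe' ragel "ragel";
}
```

Both attributes should evaluate to the same store path; plain `lib.getExe ragel` is the variant that, per the reply above, warned while `mainProgram` was unset.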


shopt -s globstar
for cmakelists in **/CMakeLists.*; do
sed -i "s/OpenSSL::OpenSSL/OpenSSL::SSL/g" $cmakelists
${lib.optionalString (lib.versionOlder cudaPackages.cudaVersion "11.8") ''
sed -i 's/-gencode=arch=compute_89,code=sm_89//g' $cmakelists
sed -i 's/-gencode=arch=compute_90,code=sm_90//g' $cmakelists

Review comment on lines +43 to +44 (Contributor):

Is this dropping the support for 89, 90? Is there a way to specify the list of gencodes for which to build the kernels?

Reply (Member Author):

Unfortunately, probably not.
I could not find any flags that would set compute capability.

Reply (Contributor):

Seeing how moving cuda_* to *[bB]uildInputs has worked, and how the build logs seem to mention FindCUDAToolkit.cmake: https://gist.github.com/SomeoneSerge/395dfc734cc220aae928cd9d501acabd#file-gistfile0-txt-L52

...I think it should be possible to control the target CUDA architectures by setting CMAKE_CUDA_ARCHITECTURES (https://cmake.org/cmake/help/latest/variable/CMAKE_CUDA_ARCHITECTURES.html), unless upstream overrides the variable. The configurable way to choose capabilities at the moment would be to use cudaPackages.cudaFlags, which respects config.cudaCapabilities (and whatever they might get replaced with later). Examples:

"-DCMAKE_CUDA_ARCHITECTURES=${builtins.concatStringsSep ";" (map dropDot cudaCapabilities)}"

# E.g. [ "80" "86" "90" ]
cudaArchitectures = (builtins.map cudaFlags.dropDot cudaCapabilities);
cudaArchitecturesString = strings.concatStringsSep ";" cudaArchitectures;
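
The fragments above can be combined into one expression that evaluates on its own. This is only a sketch: the capability list is a hard-coded stand-in for `cudaPackages.cudaFlags.cudaCapabilities`, and `dropDot` is reimplemented locally with `builtins.replaceStrings` rather than taken from `cudaFlags`.

```nix
# Sketch only: build a CMAKE_CUDA_ARCHITECTURES flag from a capability list.
let
  # Hypothetical input; in nixpkgs this would come from cudaFlags.
  cudaCapabilities = [ "8.0" "8.6" "9.0" ];

  # Local stand-in for cudaFlags.dropDot: "8.6" -> "86".
  dropDot = builtins.replaceStrings [ "." ] [ "" ];

  # E.g. [ "80" "86" "90" ]
  cudaArchitectures = map dropDot cudaCapabilities;
in
"-DCMAKE_CUDA_ARCHITECTURES=${builtins.concatStringsSep ";" cudaArchitectures}"
```

The result is a single string suitable for `cmakeFlags`. Whether the build honours it depends on upstream using the variable at all.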

Reply (Contributor):

I just tested this in SomeoneSerge@9780f86 and it doesn't work because upstream passes some ad hoc flags instead of using CMAKE_CUDA_ARCHITECTURES:

https://github.com/catboost/catboost/blob/ad3934317eb5357781e2389f1c24582c01f738ca/catboost/cuda/ctrs/CMakeLists.linux-x86_64-cuda.txt#L34-L56
https://github.com/catboost/catboost/blob/ad3934317eb5357781e2389f1c24582c01f738ca/cmake/cuda.cmake#L163-L165

I think we should contact upstream about changing that
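
Until upstream honours CMAKE_CUDA_ARCHITECTURES, the packaging has to edit the hard-coded flags itself. One way to drive that edit from a list instead of repeating the sed line by hand is sketched below. `unsupportedArchs` and `stripGencode` are hypothetical names, and the pattern assumes the flags appear exactly as `-gencode=arch=compute_NN,code=sm_NN`, as in the sed lines under review.

```nix
# Sketch only: generate one sed command per architecture to strip.
let
  # Architectures the selected CUDA version cannot target (hypothetical list).
  unsupportedArchs = [ "89" "90" ];

  # "$cmakelists" is left for the shell to expand inside postPatch.
  stripGencode = arch:
    "sed -i 's/-gencode=arch=compute_${arch},code=sm_${arch}//g' \"$cmakelists\"";
in
builtins.concatStringsSep "\n" (map stripGencode unsupportedArchs)
```

The resulting string could be spliced into the `for cmakelists in ...` loop in place of the two literal sed lines. It changes nothing about which kernels get built.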

''}
done
'';

outputs = [ "out" "dev" ];

nativeBuildInputs = [
cmake
llvmPackages.bintools
ninja
(python3Packages.python.withPackages (ps: with ps; [ six ]))
ragel
yasm
] ++ lib.optionals cudaSupport (with cudaPackages; [
cuda_nvcc
]);

buildInputs = [
openssl
zlib
] ++ lib.optionals stdenv.isDarwin [
libiconv
] ++ lib.optionals cudaSupport (with cudaPackages; [
cuda_cudart
cuda_cccl
libcublas
]);

env = {
CUDAHOSTCXX = lib.optionalString cudaSupport "${stdenv.cc}/bin/cc";

Review comment (Contributor):

We set this variable in setup-cuda-hook.sh (propagated by cuda_nvcc in nativeBuildInputs), so this should be a no-op. Also, if upstream uses FindCUDAToolkit and enable_language(CUDA), then CUDAHOSTCXX is superseded by CMAKE_CUDA_HOST_COMPILER (which the hook should set as well).

Reply (Member Author, @natsukium, Oct 9, 2023):

As reported in the first comment, without this variable nvcc uses gcc (from backendStdenv) as its host compiler instead of clang.
If we do not set it, the build fails with the following error message.

Error log
catboost> [2892/4763] Building CUDA object library/cpp/cuda/wrappers/CMakeFiles/cpp-cuda-wrappers.dir/kernel.cu.o
catboost> FAILED: library/cpp/cuda/wrappers/CMakeFiles/cpp-cuda-wrappers.dir/kernel.cu.o 
catboost> /nix/store/mr3zd7dhyxdwr9y9gf6l74fah1rkjkji-cuda_nvcc-11.8.89/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/nix/store/848ld56pzbywlxy1pv9xi5d8gsvzb6hl-gcc-wrapper-11.4.0/bin/c++ -DCATBOOST_OPENSOURCE=yes -I/build/source -I/build/source/build -I/build/source/contrib/libs/nvidia/thrust -I/build/source/contrib/libs/nvidia/cub -I/build/source/contrib/libs/linux-headers -I/build/source/contrib/libs/linux-headers/_nf -I/build/source/contrib/libs/cxxsupp/libcxx/include -I/build/source/contrib/libs/cxxsupp/libcxxrt/include -I/build/source/contrib/libs/clang14-rt/include -I/build/source/contrib/libs/zlib/include -I/build/source/contrib/libs/double-conversion -I/build/source/contrib/libs/libc_compat/include/readpassphrase -I/build/source/contrib/libs/libc_compat/reallocarray -I/build/source/contrib/libs/libc_compat/random -isystem /nix/store/mr3zd7dhyxdwr9y9gf6l74fah1rkjkji-cuda_nvcc-11.8.89/include -D_THREAD_SAFE -D_PTHREADS -D_REENTRANT -D_LARGEFILE_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__LONG_LONG_SUPPORTED -D_GNU_SOURCE -DLIBCXX_BUILDING_LIBCXXRT -D_FILE_OFFSET_BITS=64 -D_YNDX_LIBUNWIND_ENABLE_EXCEPTION_BACKTRACE -DSSE_ENABLED=1 -DSSE3_ENABLED=1 -DSSSE3_ENABLED=1 -DSSE41_ENABLED=1 -DSSE42_ENABLED=1 -DPOPCNT_ENABLED=1 -DCX16_ENABLED=1 --compiler-options -fexceptions,-fno-common,-fcolor-diagnostics,-fdebug-default-version=4,-ffunction-sections,-fdata-sections,-Wall,-Wextra,-Wno-parentheses,-Wno-implicit-const-int-float-conversion,-Wno-unknown-warning-option,-pipe,-fuse-init-array,-m64,-mpopcnt,-mcx16,-Woverloaded-virtual,-Wimport-preprocessor-directive-pedantic,-Wno-undefined-var-template,-Wno-return-std-move,-Wno-defaulted-function-deleted,-Wno-pessimizing-move,-Wno-deprecated-anon-enum-enum-conversion,-Wno-deprecated-enum-enum-conversion,-Wno-deprecated-enum-float-conversion,-Wno-ambiguous-reversed-operator,-Wno-deprecated-volatile --expt-extended-lambda --expt-relaxed-constexpr --compiler-options -std=c++14 
-DTHRUST_IGNORE_CUB_VERSION_CHECK --threads 0 -O3 -DNDEBUG -std=c++14 -Xcompiler=-fPIC -nostdinc++ -DLIBCXX_BUILDING_LIBCXXRT -D_libunwind_ -gencode arch=compute_35,code=sm_35 -gencode arch=compute_50,code=compute_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=compute_60 -gencode arch=compute_61,code=compute_61 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_70,code=compute_70 --ptxas-options=-v -lineinfo --use_fast_math -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_90,code=sm_90 -MD -MT library/cpp/cuda/wrappers/CMakeFiles/cpp-cuda-wrappers.dir/kernel.cu.o -MF library/cpp/cuda/wrappers/CMakeFiles/cpp-cuda-wrappers.dir/kernel.cu.o.d -x cu -c /build/source/library/cpp/cuda/wrappers/kernel.cu -o library/cpp/cuda/wrappers/CMakeFiles/cpp-cuda-wrappers.dir/kernel.cu.o
catboost> nvcc warning : incompatible redefinition for option 'compiler-bindir', the last value of this option was used
catboost> nvcc warning : The 'compute_35', 'compute_37', 'sm_35', and 'sm_37' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
catboost> g++: error: unrecognized command-line option '-fcolor-diagnostics'
catboost> g++: error: unrecognized command-line option '-fdebug-default-version=4'
catboost> g++: error: unrecognized command-line option '-fuse-init-array'
catboost> g++: error: unrecognized command-line option '-Wimport-preprocessor-directive-pedantic'
[the same four g++ errors repeat ten more times]
catboost> fatal   : Could not open input file /build/tmpxft_0000501d_00000000-16_kernel.compute_35.cpp1.ii

NIX_CFLAGS_LINK = lib.optionalString stdenv.isLinux "-fuse-ld=lld";
NIX_LDFLAGS = "-lc -lm";
};

cmakeFlags = [
"-DCMAKE_BINARY_DIR=$out"
"-DCMAKE_POSITION_INDEPENDENT_CODE=on"
"-DCATBOOST_COMPONENTS=app;libs${lib.optionalString pythonSupport ";python-package"}"
] ++ lib.optionals cudaSupport [
"-DHAVE_CUDA=on"
];

installPhase = ''
runHook preInstall

mkdir $dev
cp -r catboost $dev
install -Dm555 catboost/app/catboost -t $out/bin
install -Dm444 catboost/libs/model_interface/static/lib/libmodel_interface-static-lib.a -t $out/lib
install -Dm444 catboost/libs/model_interface/libcatboostmodel${stdenv.hostPlatform.extensions.sharedLibrary} -t $out/lib
install -Dm444 catboost/libs/train_interface/libcatboost${stdenv.hostPlatform.extensions.sharedLibrary} -t $out/lib

runHook postInstall
'';

meta = with lib; {
description = "High-performance library for gradient boosting on decision trees";
longDescription = ''
A fast, scalable, high performance Gradient Boosting on Decision Trees
library, used for ranking, classification, regression and other machine
learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.
'';
license = licenses.asl20;
platforms = platforms.unix;
homepage = "https://catboost.ai";
maintainers = with maintainers; [ PlushBeaver natsukium ];
mainProgram = "catboost";
};
})
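
For orientation, the three optional arguments at the top of this file can be exercised with `override`. The sketch below is not part of the PR. It assumes the derivation is exposed as the top-level attribute `catboost` and that `<nixpkgs>` points at a revision containing this change; the names `cli`, `withPython` and `withCuda` are hypothetical.

```nix
# Sketch only: selecting build variants of the derivation above.
let
  pkgs = import <nixpkgs> { };  # assumption: <nixpkgs> is on NIX_PATH
in
{
  # Default: CATBOOST_COMPONENTS=app;libs
  cli = pkgs.catboost;

  # Adds ";python-package" to CATBOOST_COMPONENTS.
  withPython = pkgs.catboost.override { pythonSupport = true; };

  # Adds -DHAVE_CUDA=on plus the CUDA inputs; CUDA packages are unfree,
  # so evaluating this attribute needs a config that allows them.
  withCuda = pkgs.catboost.override { cudaSupport = true; };
}
```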
34 changes: 34 additions & 0 deletions pkgs/development/libraries/catboost/remove-conan.patch
@@ -0,0 +1,34 @@
diff --git a/CMakeLists.txt b/CMakeLists.txt
index becd2ad03c..7e3c8c99b1 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -27,7 +27,6 @@ cmake_policy(SET CMP0104 OLD)

include(cmake/archive.cmake)
include(cmake/common.cmake)
-include(cmake/conan.cmake)
include(cmake/cuda.cmake)
include(cmake/cython.cmake)
include(cmake/fbs.cmake)
@@ -37,21 +36,6 @@ include(cmake/recursive_library.cmake)
include(cmake/swig.cmake)
include(cmake/global_vars.cmake)

-if (CMAKE_CROSSCOMPILING)
- include(${CMAKE_BINARY_DIR}/conan_paths.cmake)
-else()
- conan_cmake_autodetect(settings)
- conan_cmake_install(
- PATH_OR_REFERENCE ${CMAKE_SOURCE_DIR}
- INSTALL_FOLDER ${CMAKE_BINARY_DIR}
- BUILD missing
- REMOTE conancenter
- SETTINGS ${settings}
- ENV "CONAN_CMAKE_GENERATOR=${CMAKE_GENERATOR}"
- CONF "tools.cmake.cmaketoolchain:generator=${CMAKE_GENERATOR}"
- )
-endif()
-
if (CMAKE_SYSTEM_NAME STREQUAL "Linux" AND CMAKE_SYSTEM_PROCESSOR STREQUAL "x86_64" AND NOT HAVE_CUDA)
include(CMakeLists.linux-x86_64.txt)
elseif (CMAKE_SYSTEM_NAME STREQUAL "Linux" AND CMAKE_SYSTEM_PROCESSOR STREQUAL "x86_64" AND HAVE_CUDA)
96 changes: 41 additions & 55 deletions pkgs/development/python-modules/catboost/default.nix
@@ -1,64 +1,50 @@
-{ buildPythonPackage, fetchFromGitHub, lib, pythonOlder
-, clang_12, python
-, graphviz, matplotlib, numpy, pandas, plotly, scipy, six
-, withCuda ? false, cudatoolkit }:
-
-buildPythonPackage rec {
-  pname = "catboost";
-  # nixpkgs-update: no auto update
-  version = "1.0.5";
-
-  disabled = pythonOlder "3.4";
+{ lib
+, buildPythonPackage
+, catboost
+, python
+, graphviz
+, matplotlib
+, numpy
+, pandas
+, plotly
+, scipy
+, setuptools
+, six
+, wheel
+}:
+
+buildPythonPackage {
+  inherit (catboost) pname version src meta;
+  format = "pyproject";
+
+  sourceRoot = "source/catboost/python-package";
+
+  nativeBuildInputs = [
+    setuptools
+    wheel
+  ];

-  src = fetchFromGitHub {
-    owner = "catboost";
-    repo = "catboost";
-    rev = "refs/tags/v${version}";
-    hash = "sha256-ILemeZUBI9jPb9G6F7QX/T1HaVhQ+g6y7YmsT6DFCJk";
-  };
+  propagatedBuildInputs = [
+    graphviz
+    matplotlib
+    numpy
+    pandas
+    plotly
+    scipy
+    six
+  ];

-  nativeBuildInputs = [ clang_12 ];
+  buildPhase = ''
+    runHook preBuild

-  propagatedBuildInputs = [ graphviz matplotlib numpy pandas scipy plotly six ]
-    ++ lib.optionals withCuda [ cudatoolkit ];
+    # these arguments must set after bdist_wheel
+    ${python.pythonForBuild.interpreter} setup.py bdist_wheel --no-widget --prebuilt-extensions-build-root-dir=${lib.getDev catboost}

-  patches = [
-    ./nix-support.patch
-  ];
-
-  postPatch = ''
-    # substituteInPlace is too slow for these large files, and the target has lots of numbers in it that change often.
-    sed -e 's|\$(YMAKE_PYTHON3-.*)/python3|${python.interpreter}|' -i make/*.makefile
+    runHook postBuild
   '';

-  preBuild = ''
-    cd catboost/python-package
-  '';
-  setupPyBuildFlags = [ "--with-ymake=no" ];
-  CUDA_ROOT = lib.optional withCuda cudatoolkit;
-  enableParallelBuilding = true;
+  # setup a test is difficult
+  doCheck = false;

-  # Tests use custom "ya" tool, not yet supported.
-  dontUseSetuptoolsCheck = true;
   pythonImportsCheck = [ "catboost" ];
-
-  passthru = {
-    # Do not update to catboost 1.1.x because the patch doesn't apply cleanly
-    skipBulkUpdate = true;
-  };
-
-  meta = with lib; {
-    description = "High-performance library for gradient boosting on decision trees.";
-    longDescription = ''
-      A fast, scalable, high performance Gradient Boosting on Decision Trees
-      library, used for ranking, classification, regression and other machine
-      learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.
-    '';
-    license = licenses.asl20;
-    platforms = [ "x86_64-linux" ];
-    homepage = "https://catboost.ai";
-    maintainers = with maintainers; [ PlushBeaver ];
-    # _catboost.pyx.cpp:226822:19: error: use of undeclared identifier '_PyGen_Send'
-    broken = withCuda;
-  };
 }