Compile times are growing significantly #581

Open
jrhemstad opened this issue Sep 9, 2018 · 12 comments · May be fixed by #17078
Labels: CMake (CMake build issue)

@jrhemstad
Contributor

Feature request

As anyone who has built libgdf recently has surely noticed, the time to compile the library from scratch has grown significantly in the last few months. For example, compiling on all 10 cores of my i9-7900X @ 3.30GHz takes 11 minutes as reported by time make -j.

Compiling with -ftime-report may be a good place to start to see where all the compilation time is being spent.

This is likely due to the large amount of template instantiation required to instantiate functions for all possible types. We should make sure that best practices are being followed in template instantiation, so that a template for a given type only has to be instantiated once, via explicit instantiation.
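As a rough sketch of that practice (the names below are made up for illustration, not actual libgdf code), an extern template declaration in the header tells every other translation unit that the instantiation already exists, and a single source file provides it via explicit instantiation:

```cpp
// reduce.hpp -- hypothetical header: declare the template and tell every
// includer that the common instantiations already exist in one source file.
template <typename T>
T device_reduce(T const* data, int size);

extern template int device_reduce<int>(int const*, int);
extern template double device_reduce<double>(double const*, int);

// reduce.cpp -- define the template and explicitly instantiate it exactly once.
template <typename T>
T device_reduce(T const* data, int size)
{
  T sum{};
  for (int i = 0; i < size; ++i) { sum += data[i]; }
  return sum;
}

template int device_reduce<int>(int const*, int);
template double device_reduce<double>(double const*, int);
```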

Much of our code is implemented in headers, which causes it to be recompiled in every translation unit that includes those headers. Using pre-compiled headers may help:
https://gcc.gnu.org/onlinedocs/gcc/Precompiled-Headers.html
http://itscompiling.eu/2017/01/12/precompiled-headers-cpp-compilation/
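A minimal sketch of how that could look with GCC (the file contents and names here are hypothetical): gather the heavy, rarely-changing includes into one aggregate header, precompile it once, and include that header first in every translation unit.

```cpp
// pch.hpp -- hypothetical aggregate of heavy, rarely-changing headers.
#include <algorithm>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Precompile it once:
//   g++ -std=c++14 -x c++-header pch.hpp -o pch.hpp.gch
//
// Any translation unit whose first include is "pch.hpp" and that is built
// with the same flags will then load pch.hpp.gch instead of reparsing these
// headers from scratch.
```

Newer CMake releases (3.16 and later) can also manage this automatically via target_precompile_headers.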

Furthermore, code should be scrubbed of excessive and unnecessary #includes. Compiling with -MD will generate dependency files that show exactly which headers each translation unit pulls in.

Here's a Clang tool that ensures you only include the necessary headers: https://github.com/include-what-you-use/include-what-you-use

Here's a Clang tool to profile time spent in template instantiation: https://github.com/mikael-s-persson/templight

@harrism
Member

harrism commented Oct 26, 2018

Do the clang tools work with nvcc? Much of our template time is in nvcc. You can use nvcc --time foo.csv to dump timing for different nvcc phases.

@jrhemstad
Contributor Author

I'm not sure. You can technically get Clang to compile device code, so that may be a path worth exploring using Clang + these tools.

@mike-wendt transferred this issue from rapidsai/libgdf Dec 20, 2018
@mike-wendt added the Needs Triage label Dec 20, 2018
@kkraus14 added the code quality and CMake labels and removed the Needs Triage label Feb 13, 2019
@hcho3
Contributor

hcho3 commented Jun 12, 2020

Any updates on this? I'd love to use precompiled headers with CUDA projects.

@harrism
Member

harrism commented Jun 15, 2020

Compile time continues to grow, but that is largely because our supported feature set and supported types continue to grow. In 0.15 we are aiming to add at least 10 new types (4 unsigned int types, 4 timestamp types, list column type, decimal fixed-point type). Naturally this will increase compile time and binary size.

Meanwhile, in 0.14 we dropped all of the legacy APIs that were previously deprecated, which reduced compile time a bit, and significantly reduced binary size. There have been and will continue to be various efforts to reduce compile time of certain components. We are investigating possibly splitting libcudf into multiple libraries.

We have not discussed precompiled headers.

@beckernick
Member

@jrhemstad @harrism, is this still a relevant issue?

@jrhemstad
Contributor Author

Our compile time is worse than ever, so I guess it's still relevant. We could benefit from someone putting in a concerted effort to eliminate unnecessary includes across the library.
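For illustration, one common pattern in such an effort (a generic example, not an actual libcudf header): headers that only use a type by pointer or reference can forward-declare it instead of including its full definition, so the heavy include stops propagating to every file that includes the header.

```cpp
// widget.hpp -- before, this header did #include "table.hpp" just to name the
// type; a forward declaration is enough for references and pointers, so files
// that include widget.hpp no longer reparse table.hpp.
class table;

class widget {
 public:
  explicit widget(table const& t);

 private:
  table const* table_;
};

// widget.cpp -- only the implementation needs the full definition.
// #include "table.hpp"
```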

@vyasr
Contributor

vyasr commented Dec 20, 2022

I'm not sure. You can technically get Clang to compile device code, so that may be a path worth exploring using Clang + these tools.

Out of curiosity I gave this a quick shot. (Unsurprisingly) clang does not currently support the experimental CUDA features that we have enabled (--expt-extended-lambda --expt-relaxed-constexpr), so the compilation terminates pretty quickly. I'm not sure whether there are other downstream issues we would face if we attempted this after stripping those out (not suggesting that we should, although #7795 remains open, so maybe it is worth pursuing).

@jrhemstad
Contributor Author

clang does not currently support the experimental CUDA features that we have enabled (--expt-extended-lambda --expt-relaxed-constexpr)

Pretty sure clang supports those features natively without the need for any extra compile flags. I'm guessing the error was caused by clang not recognizing those flags.
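For reference, this is the kind of code those nvcc flags gate; a minimal standalone sketch (not taken from libcudf) that needs --expt-extended-lambda under nvcc but should compile in clang's CUDA mode without extra flags:

```cpp
// lambda_reduce.cu
#include <thrust/execution_policy.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/iterator/transform_iterator.h>
#include <thrust/reduce.h>

int main()
{
  // A __device__ lambda wrapped in a transform_iterator: nvcc requires
  // --expt-extended-lambda for this, while clang accepts it with no extra flag.
  auto square = [] __device__(int i) { return i * i; };
  auto first =
    thrust::make_transform_iterator(thrust::counting_iterator<int>(0), square);

  int sum = thrust::reduce(thrust::device, first, first + 10);  // 0^2 + ... + 9^2 = 285
  return sum == 285 ? 0 : 1;
}
```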

@vyasr
Contributor

vyasr commented Dec 21, 2022

You're right, it does. I removed those and made some progress, but not nearly enough for a working build with clang yet. Here's a list of necessary changes so far:

  • Remove all CUDF_CUDA_FLAGS set in ConfigureCUDA.cmake. Most are either unsupported or ignored (including some of the warning flags), so the blanket removal is easiest.
  • Remove [[nodiscard]] attributes, which don't appear to be supported in clang device code yet.
  • Set -D__STRICT_ANSI__ as a CUDA compiler flag. Otherwise it tries to compile float128 code, which is not yet supported in clang.
  • Pass -fcuda-allow-variadic-functions to the clang compiler (or via the CMake configure CLI: -DCMAKE_CUDA_FLAGS="-Xclang -fcuda-allow-variadic-functions").

At this point I start seeing failures like this:

...type_traits:2777:5: error: no type named 'type' in 'std::invoke_result<cudf::detail::indexalator_factory::nullable_index_accessor, int>'

and

error: type 'thrust::transform_iterator<(lambda at ...gather.cu:49:26), cudf::detail::input_indexalator>::super_t' (aka 'iterator_adaptor<transform_iterator<(lambda at ...gather.cu:49:26), cudf::detail::input_indexalator, thrust::use_default, thrust::use_default>, cudf::detail::input_indexalator, int, thrust::use_default, std::random_access_iterator_tag, int>') is not a direct or virtual base of 'thrust::transform_iterator<(lambda at ...gather.cu:49:26), cudf::detail::input_indexalator>'

I need to track this down a bit further, but it looks like some aspect of how thrust SFINAEs different code paths isn't supported in clang device code yet either.

@PointKernel
Member

PointKernel commented Dec 21, 2022

I tried to build cuco with clang about a year ago and was blocked by its dependencies, like thrust and libcudacxx, which could not be built with clang. To find out how much effort is required to build device code with clang, I would suggest starting with a smaller library like cuco and seeing how it goes from there.

Related issues:

@vyasr
Contributor

vyasr commented Dec 21, 2022

Well then... looks like we've got to work our way all the way up the stack for this. For the purpose of something like clang-tidy we might be able to get partial results based on the discussion in rapidsai/raft#424, but that is only partial support at best, and I don't know whether it will work with the other tools of interest like IWYU.

@vyasr
Contributor

vyasr commented Oct 9, 2024

Compile times are an ever-present problem for us. This issue as currently framed isn't clearly actionable, so let's lay out some concrete points.

We should make sure that best practices are being followed in template instantiation

#379 implemented the type_dispatcher, which now controls all our template instantiations and ensures that we have a minimal set.
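To make that concrete, here is a heavily simplified sketch of the pattern (not cudf's actual type_dispatcher signature): a single runtime switch maps a type id to one instantiation of the caller's functor, so each operation is instantiated exactly once per supported type, in one place.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>

// Illustrative only: a runtime type id and the single switch that maps it to
// one template instantiation of whatever functor the caller passes in.
enum class type_id { INT32, FLOAT64 };

template <typename Functor, typename... Args>
auto type_dispatcher(type_id id, Functor f, Args&&... args)
{
  switch (id) {
    case type_id::INT32:
      return f.template operator()<std::int32_t>(std::forward<Args>(args)...);
    case type_id::FLOAT64:
      return f.template operator()<double>(std::forward<Args>(args)...);
    default: throw std::logic_error("unsupported type");
  }
}

// Each operation is written once as a functor with a templated operator();
// the dispatcher is the only place that enumerates the supported types.
struct size_of_functor {
  template <typename T>
  std::size_t operator()() const { return sizeof(T); }
};

// Usage: type_dispatcher(type_id::FLOAT64, size_of_functor{}) returns 8.
```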

Compiling with -ftime-report may be a good place to start to see where all the compilation time is being spent.

Since #9631 we have been tracking build times in CI. We monitor this and keep an eye on TUs that are particularly slow to compile. Where necessary, we have reacted to slow compilation by reorganizing the code and explicitly instantiating templates.

Furthermore, code should be scrubbed of excessive and unnecessary #includes.

This seems like the main action item remaining. As discussed above, include-what-you-use is a good tool for this, so I would consider evaluating it and then either adopting or rejecting it as the only thing left to do here. Since getting the code to compile with clang is the bottleneck, I propose that we get our C/C++ source files working first. Once we have systematic approaches in place for those and are regularly checking on the quality, we can incrementally work up to getting CUDA source files working, since that is a much heavier lift (and at that point we can also split between host and device code in .cu files). This is similar to the approach we are taking in #16958 for clang-tidy.

@vyasr linked a pull request Oct 14, 2024 that will close this issue