Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[R-package] speed up installation with Visual Studio #2965

Merged
merged 4 commits into from
May 14, 2020

Conversation

jameslamb
Copy link
Collaborator

This is another PR that contributes to #2936 , but which I think has standalone value.

Right now on Windows, to figure out which version of Visual Studio is available we try to build lib_lightgbm by generating and executing the corresponding CMake Makefile for the 3 most recent versions of VS. This is pretty expensive, and means that you have to build lib_lightgbm .

I went into this thinking that this is why r-package CI builds on Azure Devops with Visual Studio take 20 minutes...and it seems like it! I tested merging these changes into #2936 on my fork, and it takes the build time down from 20 minutes to about 9 minutes.

Copy link
Collaborator

@StrikerRUS StrikerRUS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh, fake project, nice hack!

R-package/src/install.libs.R Outdated Show resolved Hide resolved
R-package/src/install.libs.R Outdated Show resolved Hide resolved
@jameslamb
Copy link
Collaborator Author

@Laurae2 @StrikerRUS could you have another look?

Copy link
Collaborator

@StrikerRUS StrikerRUS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any issues in logic in this PR. But please wait for an approval of R-maintainers 🙂 .

R-package/src/install.libs.R Outdated Show resolved Hide resolved
R-package/src/install.libs.R Outdated Show resolved Hide resolved
.vsts-ci.yml Outdated Show resolved Hide resolved
@jameslamb
Copy link
Collaborator Author

note to self: per #2936 (comment), I think that in CI we need to add "$env:R_LIB_PATH/Rtools/mingw_64/bin; to $env:PATH, so that gendef and dlltool are found.

If it isn't in our installation instructions already, we should make it clear there that Rtools/mingw_*/ needs to be on your path for Visual Studio installation to work.

@jameslamb
Copy link
Collaborator Author

Now that #2936, I'll get this updated tonight and add CI support for Visual Studio builds. I'll request new reviews after that.

@StrikerRUS
Copy link
Collaborator

@jameslamb Please remove all speedups (ifs in CI scripts) for MSVC and run full --as-cran test according to our plan (#2936 (comment)), so that we will be able to check how much time it takes to run after all other fixes were merged.

@jameslamb
Copy link
Collaborator Author

jameslamb commented May 3, 2020

@jameslamb Please remove all speedups (ifs in CI scripts) for MSVC and run full --as-cran test according to our plan (#2936 (comment)), so that we will be able to check how much time it takes to run after all other fixes were merged.

@StrikerRUS I will do that, but I'm pretty confident we're going to decide not to use --as-cran. Even with all the ifs, it's taking 26 minutes right now :/

https://dev.azure.com/lightgbm-ci/lightgbm-ci/_build/results?buildId=5994&view=logs&j=9f61088a-36ee-57e4-0947-03329fd6ee90&t=2528b315-ee14-54c7-8330-fcfcaf0cbb8e

Toggling the timestamps, I can see 22 minutes of that is just the install:

image

UPDATE: looking closer, it takes 10+ minutes between "build files have been written" and the first compilation log message

image

I really think it might be that /MP flag to MSVC isn't being used. UPDATE: ok no that isn't true, but trying some of these suggestions

I'm going to experiment a big on my fork to not take up our CI capacity. Will report back.

I'll keep updating this comment. Will let you know when I reach a conclusion.

  1. This branch before I started testing on my fork
  2. adding /incremental to use incremental linking cut about a minute
  3. Running full build (--as-cran)

NOTE: I can see on the builds on my fork that we're now getting failures on non-Windows like this:

* checking compilation flags used ... WARNING
Compilation used the following non-portable flag(s):
  ‘-Wno-ignored-attributes’ ‘-Wno-return-type’ ‘-Wno-unknown-pragmas’
including flag(s) suppressing warnings

I think it's because I'm temporarily echoing CMAKE_CXX_FLAGS and R CMD check is checking if the build logs match a regular expression for flags like -Wno-unknown-pragmas 😂 . They can be safely ignored because I'll remove that debugging MESSAGE() call, and because I'll handle those flag things in the CRAN-toolchain build once we add that (jameslamb#15).

@jameslamb jameslamb changed the title [R-package] speed up installation with Visual Studio [WIP] [R-package] speed up installation with Visual Studio May 3, 2020
@jameslamb
Copy link
Collaborator Author

jameslamb commented May 5, 2020

I have been trying on my fork and cannot get the build times for R + MSVC under 24 minutes. They run very fast (Rscript build_r.R runs in 1-2 minutes) on my laptop, where I have Visual Studio 16 2019 installed but I never actually use it as an IDE, so it still has all of the default settings.

I have narrowed down the slowness with a lot of print debugging...it takes 10 minutes between when this line appears in the log

Build files have been written to: C:/Users/VssAdministrator/AppData/Local/Temp/RtmpWAMlj1/R.INSTALL32c6d499da/lightgbm/src/build

and when compilation starts. Even for the test project that has no source files and whose CMakeLists.txt is just one PROJECT() call.

That means system(paste0(cmake_cmd, ..)) is taking 10 minutes to complete on Azure DevOps. That command is running the Visual Studio generator, e.g. cmake -G"Visual Studio 15 2017" ....

I've at least narrowed it down to that, which rules out build-time flags like /MP as a root cause.

Theories I tried which were not correct:

  • Adding an explicit timeout in system() calls from R
  • Setting CMAKE_BUILD_TYPE to "Release" (link 1 | link 2)

My last guess here is that because cmake -G writes a LOT of files, maybe disk I/O is really slow in whatever part of the file system I'm writing to on however Azure DevOps configures its build images. Tomorrow I'll see if I have better luck with MSVC builds on AppVeyor, but beyond that I'm out of ideas :/

@StrikerRUS
Copy link
Collaborator

@jameslamb

That means system(paste0(cmake_cmd, ..)) is taking 10 minutes to complete on Azure DevOps. That command is running the Visual Studio generator, e.g. cmake -G"Visual Studio 15 2017" ....
I've at least narrowed it down to that, which rules out build-time flags like /MP as a root cause.

What about --as-cran option? It adds ~4min to 24min build time, right?

@jameslamb
Copy link
Collaborator Author

Hmm, I've noticed that current R builds with MinGW on Windows have interesting variance:

Taking a quick look at those, if you click Toggle Timestamps on the log, it seems like a lot of the variability is about installing MiKTeX and friends upstream of the actual build of the R package.

6004 (7 minutes):

image

6003 (3 minutes)

image

This is kind of a manual and painful process though, I'm sure if we looked across all builds we'd find other sources of variability. Maybe I'll try writing a script tonight that does that, so I know where to focus my effort.

@StrikerRUS
Copy link
Collaborator

Nice investigation! But I think that the main goal now is to try to minimize actual building time of R package.

Can you please clarify that?
#2965 (comment)

@jameslamb
Copy link
Collaborator Author

Nice investigation! But I think that the main goal now is to try to minimize actual building time of R package.

Can you please clarify that?
#2965 (comment)

right sorry, let me clarify:

  1. The R package takes 1-2 minutes to build on my laptop with Visual Studio
  2. The R package takes around 20 minutes to build in CI
  3. I see in the logs that it takes 10 minutes for each call to system(cmake_cmd) to complete in the Azure CI environment, and that is unrelated to the LightGBM project specifically. Since it takes 10 minutes to build a fake project where the CMakeLists.txt file is just one PROJECT() call.
  4. Removing --as-cran and ignoring all the documentation-related things in CI could save about 4 minutes.

Things I tried to fix this, which did not work:

  1. Explicitly setting build type to Release so the other three types of Visual Studio builds would be ignored.
  2. Putting a timeout on system()

Things I'm going to try next:

  1. Running Visual Studio builds on AppVeyor instead
  2. Finding a way to detect Visual Studio in install.libs.R that does not involve running cmake -G

@jameslamb
Copy link
Collaborator Author

There is a lot of information I'm reading through in this answer: https://stackoverflow.com/a/37327661

@jameslamb
Copy link
Collaborator Author

The situation looks even worse on AppVeyor. I cancelled this build after 34 minutes 😱

https://ci.appveyor.com/project/jameslamb/lightgbm/builds/32675576

I ran again with Rscript build_r.R (so we could directly see the logs instead of them being hidden in R CMD CHECK):

https://ci.appveyor.com/project/jameslamb/lightgbm/builds/32675702

You can see from the timestamps on the logs there that it takes about 2 and a half minutes to get to the point where install.libs.R is run....and then 25 minutes later the build hasn't progressed at all.

log (1).txt

@jameslamb
Copy link
Collaborator Author

jameslamb commented May 6, 2020

Ok I might be making progress! To investigate my theory that this is about disk I/O (since cmake -G"Visual Studio..." writes a lot of files), I ran the following Powershell command:

Get-CimInstance Win32_PerfRawData_PerfDisk_LogicalDisk
environment Disk DiskWriteBytesPersec DiskWritesPersec
AppVeyor C: 208896 11
Azure C: 565897728 8787
Azure D: 43876352 1225
my laptop C: 88364674560 3343288
my laptop D: 364544 85

NOTE: There is no D: available on AppVeyor.

AppVeyor log: https://ci.appveyor.com/project/jameslamb/lightgbm/builds/32676579?fullLog=true

Azure DevOps log: https://dev.azure.com/JayLamb20/jameslamb-lightgbm/_build/results?buildId=421&view=logs&j=9f61088a-36ee-57e4-0947-03329fd6ee90&t=2528b315-ee14-54c7-8330-fcfcaf0cbb8e

This is exciting, and I think it's good evidence of what's happening!

  • D: on the VM we're getting from Azure DevOps is around 2000x slower than the C: disk on my laptop. That explains the difference in performance between my laptop and Azure DevOps. D: is where we install R and where the project build directory is by default on Azure DevOps.
  • C: on the VM we're getting for AppVeyor is around 210x slower than D: on Azure DevOps, which explains why the builds get stuck at cmake -G there and never even finish

I'm going to see if moving to C:\ on Azure DevOps is possible and if it's faster.

but wait...why hasn't this been a problem for Python builds?

I don't think we ever test using the Visual Studio generator (cmake -G"Visual Studio...") for the Python package! Our builds use the pre-made MSBuild solution file in https://github.com/microsoft/LightGBM/tree/master/windows

if not any((use_gpu, use_mpi, use_hdfs, nomp, bit32)):
logger.info("Starting to compile with MSBuild from existing solution file.")
platform_toolsets = ("v142", "v141", "v140")
for pt in platform_toolsets:
status = silent_call(["MSBuild",
os.path.join(CURRENT_DIR, "compile", "windows", "LightGBM.sln"),
"/p:Configuration=DLL",
"/p:Platform=x64",
"/p:PlatformToolset={0}".format(pt)])
if status == 0 and os.path.exists(lib_path):
break
else:
clear_path(os.path.join(CURRENT_DIR, "compile", "windows", "x64"))

@jameslamb
Copy link
Collaborator Author

Short Summary

Disk throughput does the difference between install times on CI and my laptop, but moving from D:\ to C:\ on Azure DevOps builds doesn't help our build times.

How I reached this conclusion

Based on #2965 (comment), I tried adding some logic that says "if you are on Azure, work in C:\ not D:\". I expected a small speedup from this since any writes to disk should be 10x faster.

I tested on my fork. I tried 5 builds with copying to C:\ and 5 without. The times for the R package build are reported below.

All builds below are already using the configuration of R CMD check that skips all documentation.

Test Results

with (build link)

median: 26m22s

image

without

median: 25m53s

image

Conclusion

  1. Yes, disk throughput is a lot lower in our CI environments than on my laptop (it is not a very good laptop, by the way), and that is causing cmake -G"Visual Studio..." to take around 10 minutes to run.
  2. That command is run twice, once to figure out which version (if any) of Visual Studio is installed and once to actually build LightGBM. Because finding installed versions of Visual Studio has changed from version to version, I don't think it would be easy to eliminate the first cmake -G call used to determine if Visual Studio is working.
  3. Moving all the files to C:\ at the beginning of an Azure builds doesn't improve the build times. That operation has a cost, and might even be slightly slowing down builds.

Next Step

It might be possible to cut one of the cmake -G calls, but it will increase the complexity of install.libs.R, which is already somewhat complex. If .find_vs_version() ran cmake -G on LightGBM itself (not a test project), we could skip running it again before building.

This is the only other idea I have, so I'll try that tomorrow. If it doesn't work or we decide it introduces too much complexity, I think 25-minute builds will be something we have to live with to prevent things like #2995 and #2997 from re-emerging.

@jameslamb jameslamb force-pushed the r/visual-studio-install branch from f9057c3 to 596b5ef Compare May 6, 2020 07:02
@jameslamb
Copy link
Collaborator Author

Got it down to 16 minutes!

I know this PR has a lot of long comments, I apologize. Wanted to document the investigation as I went, since it reveals a lot about how we're building the R package and the project in general.

I now have the MSVC builds down to 16 minutes!

What Changed?

I tried this idea from #2965 (comment)

It might be possible to cut one of the cmake -G calls, but it will increase the complexity of install.libs.R, which is already somewhat complex. If .find_vs_version() ran cmake -G on LightGBM itself (not a test project), we could skip running it again before building.

I did 5 builds

median: 15m49s
image

Copy link
Collaborator

@StrikerRUS StrikerRUS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something went wrong during merging branches.

R-package/src/install.libs.R Outdated Show resolved Hide resolved
@jameslamb
Copy link
Collaborator Author

Something went wrong during merging branches.

I'm very confused about what could have happened. Trying to figure it out. I don't understand why changes like #3078 are showing up in the diff, since they already exist on master

@jameslamb jameslamb force-pushed the r/visual-studio-install branch from 513684e to ebfbacc Compare May 13, 2020 18:12
@jameslamb
Copy link
Collaborator Author

Ok I think I fixed the conflicts! I still have no idea how we got into this state...the danger of long-lived feature branches, I guess 😬

I am also very unsure how this PR only has one commit now. I only git reset back about 6 commits.

Either way, I think this PR is back to the state it was as of all previous reviews, including the most recent suggestion (#2965 (comment))

Copy link
Collaborator

@StrikerRUS StrikerRUS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jameslamb
Great improvement! Thanks a lot!
One comment for your consideration.

R-package/src/install.libs.R Show resolved Hide resolved
@jameslamb
Copy link
Collaborator Author

Thanks for the reviews everyone! I'll merge this when it builds.

@jameslamb
Copy link
Collaborator Author

@jameslamb
Copy link
Collaborator Author

Looks like Azure capacity problems to me:

image

@jameslamb
Copy link
Collaborator Author

Ok well now it's ✅ but I can see the linux sdist task never ran. Going to push an empty commit, just to be cautious

@jameslamb
Copy link
Collaborator Author

check-docs task is failing on one of our links 😬

image

this link is definitely valid though:

https://github.com/microsoft/LightGBM/blob/master/python-package/README.rst#build-from-sources

image

@StrikerRUS
Copy link
Collaborator

Seems that GitHub did something with their DOM / js (user-content- prefix)...

image

@jameslamb
Copy link
Collaborator Author

Seems that GitHub did something with their DOM / js (user-content- prefix)...

image

🤒 ok yeah, I agree. Very weird that it works from a curl request like my example above but fails linkchecker.

Either way, I think we need to update it to user-content. I'll push a PR right now.

@StrikerRUS
Copy link
Collaborator

Very weird that it works from a curl request like my example above but fails linkchecker.

I hope that with 10.0 release of linkchecker we will suffer a little bit less 😉 .

@StrikerRUS StrikerRUS merged commit 3caf359 into microsoft:master May 14, 2020
@jameslamb jameslamb deleted the r/visual-studio-install branch May 15, 2020 03:44
odimka pushed a commit to odimka/LightGBM that referenced this pull request May 17, 2020
* fix conflicts

* Update R-package/src/install.libs.R

* empty commit
ChipKerchner pushed a commit to ChipKerchner/LightGBM that referenced this pull request Jun 10, 2020
* fix conflicts

* Update R-package/src/install.libs.R

* empty commit
ChipKerchner pushed a commit to ChipKerchner/LightGBM that referenced this pull request Jun 11, 2020
* fix conflicts

* Update R-package/src/install.libs.R

* empty commit
@github-actions
Copy link

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 24, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants