Visual Studio reports higher CPU usage than MinGW #542
@Laurae2 Actually, I am wondering: can we just use the compiled DLL/lib for the R package (like the Python package does)?
@guolinke I think some people have done it: https://erpcoder.wordpress.com/2016/06/15/how-to-develop-a-c-dll-for-r-in-visual-studio-2015/ But I am not sure how feasible it is in practice.
@guolinke I am getting this error when using an externally compiled Visual Studio or MinGW DLL out of the box:
> train$construct()
Error in .Call(fun_name, ..., ret, call_state, PACKAGE = "lightgbm") :
  "LGBM_DatasetCreateFromFile_R" not available for .Call() for package "lightgbm"
Some simple tests:
[Results: MinGW version vs. vs2013 version, three test pairs]
I think the multi-threading performance of MinGW is actually poor, especially for the sparse dataset...
@Laurae2 I tried many ways to hack R's package with a pre-compiled DLL, but haven't succeeded yet.
@guolinke We can use an .exe wrapper like I did in my package (https://github.com/Laurae2/Laurae), but this is not the best solution (we lose the ability to use callbacks, etc.). R can dynamically load binaries (like it loads dll/so/dylib files), so there must be a way to hack through it. DLLs compiled externally with VS and MinGW load fine in R (https://stat.ethz.ch/R-manual/R-devel/library/base/html/dynload.html), but using them becomes wizardry (like using DLLs in VBA). Alternatively, we need someone who has compiled R with Visual Studio to provide us the R-compiled DLL. Compiling R with Visual Studio is a very difficult task, if not impossible (it is difficult enough just to get MinGW to compile R).
In the past, the MinGW and VS builds of LightGBM were very close; now they are very different. Did you test changing OpenMP static scheduling to guided/dynamic scheduling? xgboost got a performance boost from guided/dynamic scheduling.
@Laurae2 Regarding static/dynamic/guided: most loops use static, and a few use guided. I didn't use dynamic because of its low performance.
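To make the scheduling options concrete, here is a minimal, hypothetical C++ loop (not LightGBM code) written with each of the three OpenMP `schedule` kinds mentioned above. The function names and the chunk size of 64 are illustrative assumptions. Build with `/openmp` (MSVC) or `-fopenmp` (MinGW/GCC); without those flags the pragmas are ignored and the loops simply run serially with the same result.

```cpp
#include <cassert>
#include <vector>

// static: the iteration range is split into equal contiguous chunks before the
// loop starts. Scheduling overhead is minimal, but if some iterations are much
// more expensive (e.g. denser rows in a sparse dataset), the threads holding
// them finish last while the others sit idle.
long long sum_squares_static(const std::vector<int>& x) {
  long long total = 0;
  // Signed loop index: MSVC's /openmp implements OpenMP 2.0, which requires a
  // signed integer loop variable.
  const long n = static_cast<long>(x.size());
#pragma omp parallel for schedule(static) reduction(+ : total)
  for (long i = 0; i < n; ++i) {
    total += static_cast<long long>(x[i]) * x[i];
  }
  return total;
}

// guided: threads take chunks from a shared pool at run time; chunks start
// large and shrink toward the end, evening out uneven per-iteration cost at
// the price of some synchronization.
long long sum_squares_guided(const std::vector<int>& x) {
  long long total = 0;
  const long n = static_cast<long>(x.size());
#pragma omp parallel for schedule(guided) reduction(+ : total)
  for (long i = 0; i < n; ++i) {
    total += static_cast<long long>(x[i]) * x[i];
  }
  return total;
}

// dynamic: fixed-size chunks (64 iterations here) are handed out on demand.
// Most flexible, but every chunk grab is a synchronization point.
long long sum_squares_dynamic(const std::vector<int>& x) {
  long long total = 0;
  const long n = static_cast<long>(x.size());
#pragma omp parallel for schedule(dynamic, 64) reduction(+ : total)
  for (long i = 0; i < n; ++i) {
    total += static_cast<long long>(x[i]) * x[i];
  }
  return total;
}
```

All three compute the same value; only the assignment of iterations to threads, and therefore the load balance and synchronization cost, differs. That trade-off is why a schedule that helps one compiler's OpenMP runtime may not help another's.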
@guolinke I'll retry with my 20-core server since it is available today. I will update here when I get new results: Laurae2/gbt_benchmarks#1
@guolinke Do you have any idea about these results? VS is very good at heavy multithreading, while MinGW is better with fewer threads.
[Results: i7-4600U, 4 threads]
[Results: Dual Xeon Ivy Bridge, 2.7 GHz, 2x 10 cores]
For the R package, using a DLL compiled with Visual Studio would, I think, require loading it dynamically and calling its functions directly... I think this is not currently possible unless you create the datasets in memory and do everything in memory (like running the CLI in memory). Do you think this is possible? I think this might be the "easiest" (but not the best) alternative:
This would also fix the issue we currently have with datasets being stuck in memory, or memory not being freed without removing the model(s). However, that (new) R package would lose:
It would also require being able to call R every time a metric/objective is used while training is running, for every iteration.
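Calling back into R on every iteration means the compiled side has to accept a callback it can invoke from inside the training loop. A common way to do this across a C boundary is a plain function pointer plus an opaque `user_data` pointer. The sketch below is hypothetical throughout (`MetricCallback`, `train_with_metric`, `MetricState`, and the toy update rule are illustrative, not LightGBM's API); an R binding would register a C function that forwards to R code.

```cpp
#include <cassert>
#include <vector>

// Hypothetical callback type: a plain function pointer plus opaque user data,
// because C++ lambdas/std::function cannot cross a C ABI boundary directly.
typedef double (*MetricCallback)(const double* preds, int n, void* user_data);

// Hypothetical training loop: each "iteration" nudges predictions halfway
// toward the label (a placeholder for a real boosting step), then calls the
// user-supplied metric once and records its value.
int train_with_metric(std::vector<double>& preds, double label,
                      int num_iterations, MetricCallback metric,
                      void* user_data, std::vector<double>* history) {
  if (metric == nullptr || history == nullptr) return -1;
  for (int iter = 0; iter < num_iterations; ++iter) {
    for (double& p : preds) p += 0.5 * (label - p);
    history->push_back(
        metric(preds.data(), static_cast<int>(preds.size()), user_data));
  }
  return 0;
}

// Example metric as a host binding might register it: mean absolute error
// against a label passed via user_data, counting how often it is invoked.
struct MetricState {
  double label;
  int calls;
};

double mean_abs_error(const double* preds, int n, void* user_data) {
  MetricState* state = static_cast<MetricState*>(user_data);
  state->calls += 1;
  double sum = 0.0;
  for (int i = 0; i < n; ++i) {
    double d = preds[i] - state->label;
    sum += d < 0 ? -d : d;
  }
  return n > 0 ? sum / n : 0.0;
}
```

With 3 iterations the metric is called exactly 3 times, once per iteration, which is the per-iteration round trip into R described above.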
@guolinke I found this thread saying that a Visual Studio DLL cannot be used with C++ code:
http://r.789695.n4.nabble.com/RStudio-Calling-C-Visual-Studio-DLL-td4703642.html Apparently it requires a C wrapper.
@Laurae2 I think all exposed APIs are C APIs, so it can be used. For the DLL in the R package, I think the VS version can be used normally. The only problem is how to put it into the R package correctly, without the "cannot find lightgbm.dll" error. Another possible solution is to fix the multi-threading problem in MinGW.
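The reason a C API matters here: MSVC and GCC/MinGW use different C++ name mangling and ABIs, so C++ symbols from a VS-built DLL are not usable from MinGW-built code, while plain C functions are. A minimal sketch of the usual pattern follows; `Counter` and the `Counter_*` functions are hypothetical stand-ins, not LightGBM's actual API.

```cpp
#include <cassert>
#include <exception>

namespace {
// Hypothetical C++ internals; in a real library this would be a dataset or
// booster class whose layout and mangled names are compiler-specific.
class Counter {
 public:
  void Add(int v) { total_ += v; }
  int Total() const { return total_; }

 private:
  int total_ = 0;
};
}  // namespace

// C-compatible surface: unmangled names, an opaque handle instead of a C++
// type, and integer status codes instead of exceptions (exceptions must not
// propagate across the C boundary).
extern "C" {

typedef void* CounterHandle;

int Counter_Create(CounterHandle* out) {
  if (out == nullptr) return -1;
  try {
    *out = new Counter();
  } catch (const std::exception&) {
    return -1;
  }
  return 0;
}

int Counter_Add(CounterHandle handle, int value) {
  if (handle == nullptr) return -1;
  static_cast<Counter*>(handle)->Add(value);
  return 0;
}

int Counter_GetTotal(CounterHandle handle, int* out) {
  if (handle == nullptr || out == nullptr) return -1;
  *out = static_cast<Counter*>(handle)->Total();
  return 0;
}

int Counter_Free(CounterHandle handle) {
  delete static_cast<Counter*>(handle);  // deleting nullptr is a no-op
  return 0;
}

}  // extern "C"
```

On Windows, functions like these would additionally need `__declspec(dllexport)` (or a `.def` file) to be exported from the DLL.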
@guolinke I think you have to put the DLL in … Using https://stat.ethz.ch/R-manual/R-devel/library/base/html/library.dynam.html on library load can then load the DLL from the package.
I think the tip here might help: http://lists.r-forge.r-project.org/pipermail/rcpp-devel/2011-April/002207.html
The lib file was apparently linked without anything extra being required.
@Laurae2 I just tried a small experiment: I built R's DLL from the branch … The only problem is how to put our pre-compiled DLL into the R package when building the package.
Finally, I found a solution in https://cran.r-project.org/doc/manuals/r-release/R-exts.html and built the package successfully with the pre-compiled DLL.
Closing, since we can now build the R package with Visual Studio.
This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.
(Requires investigation)
This issue reflects a problem found in the following issues:
We have to find out why there is such a discrepancy between Visual Studio and MinGW.
Even stranger, the MinGW build uses far less CPU during training, yet ends up only slightly slower than the Visual Studio build, which uses 100% CPU.
MinGW vs. Visual Studio is a known OpenMP issue, especially for xgboost on Windows. See dmlc/xgboost#2243.
Possible theory: perhaps Visual Studio's OpenMP keeps threads locked (so they stay hot), while MinGW's OpenMP frees them whenever possible (cold cores). This would give the MinGW-compiled LightGBM lower CPU usage but higher synchronization/overhead costs (explaining why it is slower), while the Visual Studio-compiled LightGBM keeps the cores "ready" for training. Neither behavior has been confirmed so far.
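The "hot" vs. "cold" theory corresponds to two ways an idle worker thread can wait for its next piece of work. The sketch below uses plain `std::thread` code (not OpenMP runtime internals, and the function names are hypothetical) to contrast them: spinning keeps the core busy, which shows up as CPU usage but reacts almost instantly, while blocking on a condition variable lets the OS deschedule the thread at the cost of a wake-up delay. OpenMP runtimes typically spin for a while before sleeping, and OpenMP 3.0 defines the `OMP_WAIT_POLICY` environment variable (`ACTIVE`/`PASSIVE`) as a hint for this behavior; rerunning both builds with it set might help test the theory, assuming the runtime in question honors it.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <thread>

// "Hot" waiting: the worker busy-waits on an atomic flag. Its core stays at
// 100% while idle, but it picks up the work as soon as the flag flips.
int spin_wait_double(int input) {
  std::atomic<bool> ready{false};
  int shared = 0;
  int result = 0;
  std::thread worker([&] {
    while (!ready.load(std::memory_order_acquire)) {
      // busy-wait: burns CPU instead of sleeping
    }
    result = shared * 2;
  });
  shared = input;                                // publish the work...
  ready.store(true, std::memory_order_release);  // ...then signal it
  worker.join();
  return result;
}

// "Cold" waiting: the worker sleeps on a condition variable and uses no CPU
// while idle, but must be woken by the OS before it can continue.
int blocking_wait_double(int input) {
  std::mutex m;
  std::condition_variable cv;
  bool ready = false;
  int shared = 0;
  int result = 0;
  std::thread worker([&] {
    std::unique_lock<std::mutex> lock(m);
    cv.wait(lock, [&] { return ready; });
    result = shared * 2;
  });
  {
    std::lock_guard<std::mutex> lock(m);
    shared = input;
    ready = true;
  }
  cv.notify_one();
  worker.join();
  return result;
}
```

Both functions compute the same result; the difference is only in how the waiting thread consumes CPU, which is exactly the observable the theory tries to explain.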