
Distributed Fast Histogram Algorithm #4011

Merged: 34 commits into dmlc:master from dist_fast_histogram, Feb 5, 2019

Conversation

CodingCat
Member

Basically, we need to remove several implementation assumptions in the fast histogram algorithm.

@hcho3 please help review.

I will test with our internal dataset

@CodingCat changed the title from "Dist fast histogram" to "Distributed Fast Histogram Algorithm" on Dec 19, 2018
@codecov-io

codecov-io commented Dec 30, 2018

Codecov Report

Merging #4011 into master will increase coverage by 3.49%.
The diff coverage is 34.21%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master    #4011      +/-   ##
============================================
+ Coverage     57.24%   60.73%   +3.49%     
============================================
  Files           190      130      -60     
  Lines         15045    11718    -3327     
  Branches        527        0     -527     
============================================
- Hits           8612     7117    -1495     
+ Misses         6176     4601    -1575     
+ Partials        257        0     -257
Impacted Files                       Coverage Δ                  Complexity Δ
src/learner.cc                       25.96% <0%> (-0.22%)        0 <0> (ø)
src/tree/updater_histmaker.cc        2.91% <0%> (+0.01%)         0 <0> (ø) ⬇️
src/tree/updater_refresh.cc          98.76% <100%> (ø)           0 <0> (ø) ⬇️
src/common/hist_util.cc              42.9% <100%> (+0.17%)       0 <0> (ø) ⬇️
src/tree/updater_quantile_hist.h     48% <100%> (+2.16%)         0 <0> (ø) ⬇️
src/tree/updater_quantile_hist.cc    34.22% <38.88%> (-0.09%)    0 <0> (ø)
src/common/hist_util.h               78.84% <50%> (-2.41%)       0 <0> (ø)
src/linear/updater_shotgun.cc        91.07% <0%> (-2.6%)         0% <0%> (ø)
src/linear/updater_coordinate.cc     100% <0%> (ø)               0% <0%> (ø) ⬇️
tests/cpp/test_learner.cc            100% <0%> (ø)               0% <0%> (ø) ⬇️
... and 69 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 84c99f8...5af2a6a.

@CodingCat
Member Author

ping? @RAMitchell @hcho3 @yanboliang

@RAMitchell
Member

Native code looks good! Should there be a test here or are the Java tests enough?

@hcho3
Collaborator

hcho3 commented Jan 4, 2019

@CodingCat I'm now back from winter vacation. I'll review once #3957 is merged.

@CodingCat
Member Author

I am testing with our internal dataset. While we get a 1.5-2X speedup, I found that the training accuracy of fast histogram is a bit lower than approx.

Has anyone seen the same thing before?

@CodingCat
Member Author

And when we set colsample_bytree, fast histogram is even slower than approx; I am investigating whether everything is fine in that part.

@CodingCat CodingCat force-pushed the dist_fast_histogram branch 2 times, most recently from aee9bfc to 5fc66db on January 6, 2019 23:43
@troszok

troszok commented Jan 7, 2019

I am testing with our internal dataset. While we get a 1.5-2X speedup, I found that the training accuracy of fast histogram is a bit lower than approx. Has anyone seen the same thing before?

Hi @CodingCat, I would be very happy to do some tests on our datasets. Is it possible to grab the packages for this PR from somewhere, or should I recompile everything on my own?

@CodingCat
Member Author

Thanks, @troszok.

You can fetch the version with distributed fast histogram support using the approach in https://xgboost.readthedocs.io/en/latest/jvm/xgboost4j_spark_tutorial.html#refer-to-xgboost4j-spark-dependency (search for "XGBoost4J-Spark Snapshot Repo").

The version number is 0.82-SNAPSHOT.

@CodingCat CodingCat force-pushed the dist_fast_histogram branch from 5fc66db to 853a758 on January 9, 2019 06:16
@CodingCat
Member Author

@troszok any update on your accuracy testing?

@troszok

troszok commented Jan 10, 2019

@troszok any update on your accuracy testing?

Hi @CodingCat,
I just updated the code to fetch the 0.82-SNAPSHOT and it seems to work. I will do some testing over the next couple of days and let you know.

@CodingCat
Member Author

@hcho3 ping for review?

std::sort(new_features.begin(), new_features.end());

new_features.resize(static_cast<unsigned long>(n));
// std::sort(new_features.begin(), new_features.end());
Contributor

I'm a bit confused by the old code here: was there any reason to sort just after we had shuffled the features?

Also, could you explain why we need the ser/deser compared to what was happening before?
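For context, here is a minimal sketch of shuffle-based column sampling, under stated assumptions: this is not the PR's exact code, and SampleFeatures, colsample, and shared_seed are illustrative names. The resize keeps the first n shuffled feature ids, and a trailing sort would only restore deterministic ordering. In distributed mode every worker must end up with the same feature subset, either by seeding the RNG identically on all workers or by serializing and broadcasting the sampled list from one worker; the latter is presumably where the ser/deser comes in.

#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical sketch of shuffle-based column sampling: keep a fraction
// `colsample` of `num_features`. All distributed workers must use the same
// seed (or receive the serialized result from one worker) so they evaluate
// splits over an identical feature subset.
std::vector<int> SampleFeatures(int num_features, float colsample,
                                std::uint64_t shared_seed) {
  std::vector<int> features(num_features);
  std::iota(features.begin(), features.end(), 0);  // 0, 1, ..., num_features-1
  std::mt19937_64 rng(shared_seed);                // identical on every worker
  std::shuffle(features.begin(), features.end(), rng);
  auto n = std::max<std::size_t>(
      1, static_cast<std::size_t>(colsample * num_features));
  features.resize(n);                           // keep the first n shuffled ids
  std::sort(features.begin(), features.end());  // optional: deterministic order
  return features;
}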

@thvasilo
Contributor

@Liuhaoge this is a discussion about a new upcoming feature in the codebase; asking questions here will not get you anywhere and diverts from the main topic.

Once the feature has been merged, I'd recommend asking usage questions on the discussion board; we try to use GitHub for bug reports and development discussions.

@trivialfis
Member

@thvasilo Actually, it might be reasonable to include some documentation about the added features in this PR. :) Our documentation is not very thorough.

@thvasilo
Contributor

@trivialfis Good point. We had that as a requirement for merging PRs that introduce new features in other codebases I've worked on.

@Liuhaoge

@CodingCat Did you establish an interface to use monotonic constraints in Scala? How can it be used in a distributed environment?

@thvasilo
Contributor

thvasilo commented Jan 24, 2019

Hello @CodingCat, I tried running an experiment today, creating a local cluster with 6 workers.

Using the approx tree method works as expected, but when I try running the same job using the hist method, no training occurs.

My configuration file (higgs.conf) looks like this:

data=/path/to/data
num_rounds=5
tree_method=hist # Changing this to approx works fine
verbosity=2
eval_train=1

And I use this to run the command:
./dmlc-core/tracker/dmlc-submit --cluster=local --num-workers=6 xgboost higgs.conf

Looking at the output, it seems like max_depth is set to 0, even when I explicitly set it in the configuration file:

[13:54:44] INFO: /home/tvas/xgboost-origin/src/learner.cc:215: Tree method is selected to be 'hist', which uses a single updater grow_quantile_histmaker.
[13:54:44] INFO: /home/tvas/xgboost-origin/src/cli_main.cc:198: Loading data: 6.30702 sec
[13:54:44] INFO: /home/tvas/xgboost-origin/src/cli_main.cc:205: boosting round 0, 2.38419e-07 sec elapsed
[13:54:49] INFO: /home/tvas/xgboost-origin/src/tree/updater_quantile_hist.cc:64: Generating gmat: 4.83453 sec
[13:54:50] INFO: /home/tvas/xgboost-origin/src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 0 pruned nodes, max_depth=0
[13:54:50] INFO: /home/tvas/xgboost-origin/src/tree/updater_quantile_hist.cc:216: 
InitData:          0.0142 ( 4.00%)
InitNewNode:       0.0000 ( 0.00%)
BuildHist:         0.2409 (67.94%)
EvaluateSplit:     0.0182 ( 5.15%)
ApplySplit:        0.0000 ( 0.00%)
========================================
Total:             0.3546
[13:54:50] INFO: /home/tvas/xgboost-origin/src/cli_main.cc:205: boosting round 1, 5.67189 sec elapsed
2019-01-24 13:54:50,530 INFO [13:54:50] [0]     train-rmse:0.500000

Edit: I guess this is related to #4078; maybe I missed something there.

@CodingCat CodingCat force-pushed the dist_fast_histogram branch from d42ac2c to d3b312a on January 30, 2019 17:50
@CodingCat
Member Author

Rebased to run with the fixed Travis CI.

@CodingCat
Member Author

@trivialfis the doc is updated.

@hcho3
Collaborator

hcho3 commented Jan 31, 2019

@RAMitchell @trivialfis I'm seeing a memory error in the GPU test. Any idea why?

tests/python-gpu/test_gpu_linear.py::TestGPULinear::test_gpu_coordinate Training on dataset: Boston
Using parameters: {'n_gpus': -1, 'eval_metric': 'rmse', 'objective': 'reg:linear', 'nthread': 2, 'coordinate_selection': 'cyclic', 'eta': 0.5, 'updater': 'coord_descent', 'top_k': 10, 'alpha': 0.005, 'lambda': 0.005, 'tolerance': 1e-05, 'booster': 'gblinear'}
Training on dataset: Digits
Using parameters: {'n_gpus': -1, 'num_class': 10, 'eval_metric': 'merror', 'objective': 'multi:softmax', 'nthread': 2, 'coordinate_selection': 'cyclic', 'eta': 0.5, 'updater': 'coord_descent', 'top_k': 10, 'alpha': 0.005, 'lambda': 0.005, 'tolerance': 1e-05, 'booster': 'gblinear'}

terminate called after throwing an instance of 'dmlc::Error'
  what():  [23:09:37] /workspace/include/xgboost/./../../src/common/common.h:41: /workspace/src/common/host_device_vector.cu: 140: an illegal memory access was encountered

Stack trace returned 10 entries:
[bt] (0) /home/ubuntu/.local/lib/python2.7/site-packages/xgboost-0.81-py2.7.egg/xgboost/./lib/libxgboost.so(dmlc::StackTrace(unsigned long)+0x47) [0x7f8a3d612717]
[bt] (1) /home/ubuntu/.local/lib/python2.7/site-packages/xgboost-0.81-py2.7.egg/xgboost/./lib/libxgboost.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x1d) [0x7f8a3d612b7d]
[bt] (2) /home/ubuntu/.local/lib/python2.7/site-packages/xgboost-0.81-py2.7.egg/xgboost/./lib/libxgboost.so(dh::ThrowOnCudaError(cudaError, char const*, int)+0x123) [0x7f8a3d7df8b3]
[bt] (3) /home/ubuntu/.local/lib/python2.7/site-packages/xgboost-0.81-py2.7.egg/xgboost/./lib/libxgboost.so(xgboost::HostDeviceVectorImpl<int>::DeviceShard::LazySyncDevice(xgboost::GPUAccess)+0x153) [0x7f8a3d840713]
[bt] (4) /home/ubuntu/.local/lib/python2.7/site-packages/xgboost-0.81-py2.7.egg/xgboost/./lib/libxgboost.so(xgboost::HostDeviceVectorImpl<int>::LazySyncDevice(int, xgboost::GPUAccess)+0xd3) [0x7f8a3d840e53]
[bt] (5) /home/ubuntu/.local/lib/python2.7/site-packages/xgboost-0.81-py2.7.egg/xgboost/./lib/libxgboost.so(xgboost::HostDeviceVectorImpl<int>::DeviceSpan(int)+0x5f) [0x7f8a3d840fcf]
[bt] (6) /home/ubuntu/.local/lib/python2.7/site-packages/xgboost-0.81-py2.7.egg/xgboost/./lib/libxgboost.so(xgboost::HostDeviceVector<int>::DeviceSpan(int)+0xc) [0x7f8a3d8411ac]
[bt] (7) /home/ubuntu/.local/lib/python2.7/site-packages/xgboost-0.81-py2.7.egg/xgboost/./lib/libxgboost.so(+0x30ee27) [0x7f8a3d7f1e27]
[bt] (8) /home/ubuntu/.local/lib/python2.7/site-packages/xgboost-0.81-py2.7.egg/xgboost/./lib/libxgboost.so(xgboost::obj::SoftmaxMultiClassObj::GetGradient(xgboost::HostDeviceVector<float> const&, xgboost::MetaInfo const&, int, xgboost::HostDeviceVector<xgboost::detail::GradientPairInternal<float> >*)+0x849) [0x7f8a3d7f51c9]
[bt] (9) /home/ubuntu/.local/lib/python2.7/site-packages/xgboost-0.81-py2.7.egg/xgboost/./lib/libxgboost.so(xgboost::LearnerImpl::UpdateOneIter(int, xgboost::DMatrix*)+0x372) [0x7f8a3d6eb052]

@CodingCat
Member Author

CodingCat commented Jan 31, 2019

@hcho3 do you have time to review?

There is a follow-up PR based on this one to separate depthwise and lossguide.

@hcho3
Collaborator

hcho3 commented Jan 31, 2019

@CodingCat Okay, I'll take a quick look today

@CodingCat
Member Author

@hcho3 any update?

@hcho3
Collaborator

hcho3 left a comment

@CodingCat LGTM, as far as I'm aware. I really appreciate your work updating the fast hist algorithm.

@@ -105,6 +105,7 @@ class QuantileHistMaker: public TreeUpdater {
} else {
hist_builder_.BuildHist(gpair, row_indices, gmat, hist);
}
this->histred_.Allreduce(hist.begin, hist_builder_.GetNumBins());
Collaborator
I'm pleasantly surprised that the distributed implementation is this succinct.

Member Author
Yes, the allreduce interface is easy to use; we only need to take some additional care with the subtraction trick in distributed mode.
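For readers following the thread, here is a minimal sketch of how the allreduce and the subtraction trick fit together. It is illustrative only: GradStats, AllreduceSum, and BuildSiblingHist are simplified stand-ins, while the real code calls a rabit reducer via this->histred_.Allreduce as shown in the diff above.

#include <cstddef>
#include <vector>

// Simplified per-bin gradient statistics (stand-in for XGBoost's GradStats).
struct GradStats {
  double sum_grad = 0.0;
  double sum_hess = 0.0;
};

// Placeholder for an element-wise sum-allreduce over histogram bins; in the
// real code this is a rabit reducer. A no-op here so the sketch compiles as
// a single process.
void AllreduceSum(GradStats* /*bins*/, std::size_t /*num_bins*/) {}

// The subtraction trick in distributed mode: build the smaller child's
// histogram locally, allreduce it so every worker holds the global sums,
// then derive the sibling as parent - child.
void BuildSiblingHist(const std::vector<GradStats>& parent,
                      std::vector<GradStats>* child,
                      std::vector<GradStats>* sibling) {
  // Allreduce first: the locally built child histogram covers only this
  // worker's rows, while the parent histogram is already global.
  AllreduceSum(child->data(), child->size());
  sibling->resize(parent.size());
  for (std::size_t i = 0; i < parent.size(); ++i) {
    (*sibling)[i].sum_grad = parent[i].sum_grad - (*child)[i].sum_grad;
    (*sibling)[i].sum_hess = parent[i].sum_hess - (*child)[i].sum_hess;
  }
}

The ordering is the "additional care": subtracting before the allreduce would mix a local child histogram with a global parent histogram and give wrong sibling statistics.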

@CodingCat CodingCat merged commit ae3bb9c into dmlc:master Feb 5, 2019
@lock lock bot locked as resolved and limited conversation to collaborators May 6, 2019