Improved InitSampling function speed by 2.12 times #6410

RukhovichIV · 2020-11-19T13:38:07Z

This PR is related to #6411.
Discarding elements from the random generators takes up most of the running time of InitSampling.
Since libstdc++ has no random engine whose discard runs in o(n) time (little-o, i.e. sublinear), the only thing we can optimize is the number of discarded elements.
std::bernoulli_distribution needs 64 bits of random input per draw, so the previous version had to discard twice as many elements as this one does.
This small optimization speeds up InitSampling by ~2.12x, which translates into up to a 16% speedup of the whole training time when subsampling is enabled (subsample < 1).
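The idea can be sketched as follows. This is a minimal illustration, not the code from this PR: `CoinFlipBorder` and `SampleRows` are hypothetical names, `std::mt19937` is assumed as the engine, and a sequential loop over chunks stands in for the worker threads. Each row consumes exactly one 32-bit draw, which is compared against an integer threshold, so a chunk only has to discard as many elements as there are rows before it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <random>
#include <vector>

// Hypothetical helper: map a subsample rate in [0, 1] to a 32-bit threshold.
// A row is kept when a raw 32-bit draw falls below this threshold.
std::uint32_t CoinFlipBorder(float subsample) {
  assert(subsample >= 0.0f && subsample <= 1.0f);
  // Computed in double so that the maximum value is represented exactly.
  const double upper =
      static_cast<double>(std::numeric_limits<std::uint32_t>::max());
  return static_cast<std::uint32_t>(upper * subsample);
}

// Hypothetical sketch: select rows chunk by chunk. Every chunk starts from a
// copy of the same base engine and skips the draws that belong to earlier
// chunks, so the result does not depend on the number of chunks.
std::vector<std::size_t> SampleRows(std::size_t num_rows, float subsample,
                                    std::size_t num_chunks, std::mt19937 base) {
  assert(num_chunks > 0);
  const std::uint32_t border = CoinFlipBorder(subsample);
  const std::size_t chunk = num_rows / num_chunks;
  std::vector<std::size_t> selected;
  for (std::size_t c = 0; c < num_chunks; ++c) {
    std::mt19937 rng = base;
    // One draw per row, so the number of skipped draws equals the row offset.
    rng.discard(chunk * c);
    const std::size_t begin = c * chunk;
    const std::size_t end = (c + 1 == num_chunks) ? num_rows : begin + chunk;
    for (std::size_t i = begin; i < end; ++i) {
      if (rng() < border) {
        selected.push_back(i);
      }
    }
  }
  return selected;
}
```

With a distribution that consumes two 32-bit draws per sample, the `discard` argument above would have to be doubled to keep the chunks aligned with a sequential pass; comparing the raw draw against a threshold avoids that.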
The quality remains the same:

Mortgage dataset
version	training time, s	RMSE
original	35.479	0.009271
optimized	30.538	0.009262

Santander dataset
version	training time, s	InitData time, s	init improvement, %	Log Loss
original	249.196	28.784	0.00	0.16607
optimized	239.298	17.173	67.61	0.16610
no discard	224.347	11.221	156.51	0.16613

Higgs dataset
version	training time, s	InitData time, s	init improvement, %	Log Loss
original	34.235	14.373	0.00	0.09339
optimized	29.201	8.316	72.84	0.09409
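
The "init improvement, %" column is consistent with (original time / variant time - 1) * 100. The formula is inferred from the numbers in the tables, not stated in this PR, and the helper name below is hypothetical. A small sketch for checking the column under that assumption:

```cpp
#include <cassert>

// Assumed formula: how much faster the variant is than the original,
// expressed in percent. Both arguments are times in seconds.
double InitImprovementPercent(double original_s, double variant_s) {
  assert(variant_s > 0.0);
  return (original_s / variant_s - 1.0) * 100.0;
}
```

For example, `InitImprovementPercent(28.784, 17.173)` reproduces the Santander "optimized" entry and `InitImprovementPercent(14.373, 8.316)` the Higgs one, up to rounding in the last digit.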

igor_rukhovich added 2 commits November 19, 2020 16:19

Improved InitSampling function speed by 2.12 times

3d8d5b0

Added explicit conversion

f797c7b

RukhovichIV mentioned this pull request Nov 19, 2020

Removed discard from InitSampling #6411

Closed

trivialfis self-assigned this Dec 3, 2020

RAMitchell approved these changes Dec 16, 2020

hcho3 approved these changes Dec 16, 2020

hcho3 merged commit 5c8ccf4 into dmlc:master Dec 16, 2020
