Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add ellpack page #4833

Merged
merged 11 commits into from
Sep 17, 2019
Merged

add ellpack page #4833

merged 11 commits into from
Sep 17, 2019

Conversation

rongou
Copy link
Contributor

@rongou rongou commented Sep 4, 2019

@RAMitchell @trivialfis @sriramch this PR pulls out the quantile sketches and binning compression into a separate EllpackPage. For now the whole ELLPackMatrix is still kept in GPU memory. Next step is to break apart the matrix and persist to disk.

Please take a look.

Part of #4357.

@rongou rongou force-pushed the ellpack-page branch 10 times, most recently from a90d099 to 9578e9d Compare September 10, 2019 16:04
@rongou
Copy link
Contributor Author

rongou commented Sep 12, 2019

@hcho3 Have you ever seen this error in the Travis python test?

E               urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1076)>

@rongou
Copy link
Contributor Author

rongou commented Sep 12, 2019

@hcho3 Never mind. Seems to have fixed itself.

@rongou rongou changed the title [WIP] add ellpack page add ellpack page Sep 12, 2019
@rongou rongou closed this Sep 12, 2019
@rongou rongou reopened this Sep 12, 2019
Copy link
Member

@trivialfis trivialfis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me overall. One issue though, could you please help adding null tests? I hope that XGBoost can deal with empty dataset one day ...

src/common/hist_util.cu Show resolved Hide resolved
class EllpackPageImpl {};

EllpackPage::EllpackPage(DMatrix* dmat) {
LOG(FATAL) << "Not implemented.";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a personal preference. I would go with something more verbose like "Internal Error: XGBoost is not compiled with CUDA but EllpackPage is required" ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

EllpackPageImpl::EllpackPageImpl(DMatrix* dmat) : dmat_{dmat} {}

// Bin each input data entry, store the bin indices in compressed form.
template<typename std::enable_if<true, int>::type = 0>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any logic change in this file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not the core logic, just some of the data handling.

@@ -0,0 +1,203 @@
/*!
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And this one ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto.

Copy link
Contributor

@sriramch sriramch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it may be useful to run a quick performance test with some large data corpus just to make sure there aren't perf. regressions with this.

namespace xgboost {
namespace data {

extern template class SimpleBatchIteratorImpl<EllpackPage>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why was this required? doesn't the include in line 13 specialize this meta type with ellpackpage used later?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is forcing the compiler to not instantiate the template since we know it's already done in simple_dmatrix. Perhaps a bit over optimization. Removed.

uint32_t n_features)
: device_id(_device_id),
page(page),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps prefix the formal argument page with an underscore _page

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -287,10 +289,23 @@ class SortedCSCPage : public SparsePage {
explicit SortedCSCPage(SparsePage page) : SparsePage(std::move(page)) {}
};

class EllpackPageImpl;
class EllpackPage {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be more generalized so we can support multiple binned matrix types fairly easily. perhaps, make this an adapter, with a factory method to create the the underlying implementation:

struct binned_page {
   // common interface applicable to all pages
   virtual int num_bins() = 0;
   virtual int get_feature_bin(int bidx) = 0;
   virtual float get_feature_value(int fbin) = 0;
   // specifics for a page
   virtual float get_feature_value(ridx, fidx);
   // etc...
};

struct ellpack_page : binned_page {
   // realize interface
};

struct csr_page : binned_page {
   // realize interface
};

// etc...
static binned_page *create_binned_page(pertinent_data_after_quantile_generation);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can probably wait until we have another binned page.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, let's keep this in mind but not for this PR.

Copy link
Contributor Author

@rongou rongou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sriramch I'll do some benchmarking.

@@ -287,10 +289,23 @@ class SortedCSCPage : public SparsePage {
explicit SortedCSCPage(SparsePage page) : SparsePage(std::move(page)) {}
};

class EllpackPageImpl;
class EllpackPage {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can probably wait until we have another binned page.

src/common/hist_util.cu Show resolved Hide resolved
class EllpackPageImpl {};

EllpackPage::EllpackPage(DMatrix* dmat) {
LOG(FATAL) << "Not implemented.";
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

EllpackPageImpl::EllpackPageImpl(DMatrix* dmat) : dmat_{dmat} {}

// Bin each input data entry, store the bin indices in compressed form.
template<typename std::enable_if<true, int>::type = 0>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not the core logic, just some of the data handling.

@@ -0,0 +1,203 @@
/*!
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto.

namespace xgboost {
namespace data {

extern template class SimpleBatchIteratorImpl<EllpackPage>;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is forcing the compiler to not instantiate the template since we know it's already done in simple_dmatrix. Perhaps a bit over optimization. Removed.

uint32_t n_features)
: device_id(_device_id),
page(page),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Member

@RAMitchell RAMitchell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks really nice. Great to see parts of our monolithic updater_gpu_hist.cu code being extracted and becoming modular. There will be lots of opportunity to experiment with different matrix types in future.

@@ -287,10 +289,23 @@ class SortedCSCPage : public SparsePage {
explicit SortedCSCPage(SparsePage page) : SparsePage(std::move(page)) {}
};

class EllpackPageImpl;
class EllpackPage {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, let's keep this in mind but not for this PR.

@@ -148,11 +148,11 @@ struct GPUSketcher {
public:
DeviceShard(int device,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This DeviceShard class is now redundant right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes there are 4-5 places we still have a DeviceShard that's no longer necessary. I'll send out a followup PR to clean them up.

@@ -287,10 +289,23 @@ class SortedCSCPage : public SparsePage {
explicit SortedCSCPage(SparsePage page) : SparsePage(std::move(page)) {}
};

class EllpackPageImpl;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain briefly why this class needs pimpl?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mainly to get CUDA-specific implementation details out of the header, since DMatrix is used all over the place in CPU-only code.

@rongou
Copy link
Contributor Author

rongou commented Sep 16, 2019

@sriramch I ran tests/benchmark/benchmark_tree.py 10 times each and got these numbers:

  • master: mean - 6.745 seconds, std dev - 0.063
  • this pr: mean - 6.720 seconds, std dev - 0.026

So the performance seems about equal.

@trivialfis trivialfis merged commit 125bcec into dmlc:master Sep 17, 2019
@trivialfis
Copy link
Member

@rongou Thanks!

@rongou rongou deleted the ellpack-page branch September 19, 2019 23:09
@lock lock bot locked as resolved and limited conversation to collaborators Dec 19, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants