
Write ELLPACK pages to disk #4879

Merged
merged 47 commits into dmlc:master on Oct 23, 2019

Conversation

@rongou (Contributor) commented Sep 20, 2019

Whew, that took a while. The compressed ELLPACK buffers are written to disk now. I haven't changed the gpu_hist updater to handle multiple pages yet. Will work on that next.

Part of #4357.

@RAMitchell @trivialfis @sriramch

@trams (Contributor) commented Sep 27, 2019

Thank you for your contribution.
Could you provide some context on what kind of issue you are trying to solve? I am a bit lost (granted, I am not very familiar with GPU specifics, and this pull request seems to address that side of xgboost).

@rongou (Contributor, Author) commented Sep 27, 2019

As the title indicates, this is a work in progress, not quite ready for review yet. :) Probably still a couple commits away from being fully functional. In any case, it's part of the effort to support external memory on GPUs. See #4357.

@codecov-io commented Oct 3, 2019

Codecov Report

Merging #4879 into master will not change coverage.
The diff coverage is n/a.


@@          Coverage Diff           @@
##           master   #4879   +/-   ##
======================================
  Coverage    71.4%   71.4%           
======================================
  Files          11      11           
  Lines        2305    2305           
======================================
  Hits         1646    1646           
  Misses        659     659

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4ab1df5...a124fbc.

@rongou rongou reopened this Oct 8, 2019
@rongou rongou changed the title [WIP] write ellpack pages to disk write ellpack pages to disk Oct 8, 2019
@rongou rongou changed the title write ellpack pages to disk Write ELLPACK pages to disk Oct 9, 2019
@hcho3 (Collaborator) commented Oct 9, 2019

@rongou Thanks for your patience. I fixed the CI issues in #4921 and have now restarted the tests.

@trivialfis (Member) left a comment

Could you provide a short introduction to the general flow (maybe in a code comment) and the significantly changed parts?

@rongou (Contributor, Author) commented Oct 11, 2019

@trivialfis I added code comments. A few implementation notes:

  • In external memory mode, we first loop through all the CSR pages to build the quantile sketch and create an EllpackInfo to hold that information. Then we loop through the CSR pages again, this time compressing each CSR page into ELLPACK. The resulting compressed buffer is copied back to host and appended together. When the size reaches a threshold, the page is written to disk.
  • The EllpackInfo is actually kept in memory. When reading the ELLPACK pages back from disk, only the compressed buffers are read in and copied to the GPU.
  • A few key methods are probably EllpackPageImpl::Push, EllpackPageImpl::InitDevice, EllpackPageSourceImpl constructor, and EllpackPageSourceImpl::WriteEllpackPages.
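
A rough sketch of the two-pass flow described above (the helper names BuildQuantileSketch, CompressPageOnDevice, and WritePageToDisk are illustrative, not the actual API; only the overall structure follows the notes above):

// Sketch only: compress CSR pages to ELLPACK and flush accumulated pages to disk.
void WriteEllpackPagesSketch(DMatrix* dmat, size_t page_size_bytes) {
  // Pass 1: scan all CSR pages to build the quantile sketch; the resulting
  // EllpackInfo (cuts, row stride, ...) stays resident in memory.
  EllpackInfo info = BuildQuantileSketch(dmat);

  // Pass 2: compress each CSR page to ELLPACK on the GPU, copy the compressed
  // buffer back to the host, and append it to the current page.
  std::vector<uint8_t> host_buffer;
  for (const auto& csr_page : dmat->GetBatches<SparsePage>()) {
    std::vector<uint8_t> compressed = CompressPageOnDevice(csr_page, info);
    host_buffer.insert(host_buffer.end(), compressed.begin(), compressed.end());
    // Once the accumulated page reaches the size threshold, flush it to disk.
    if (host_buffer.size() >= page_size_bytes) {
      WritePageToDisk(info, host_buffer);
      host_buffer.clear();
    }
  }
  // Flush the final partial page.
  if (!host_buffer.empty()) {
    WritePageToDisk(info, host_buffer);
  }
}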

Hope this helps. Thanks!

@trivialfis (Member)

Will review soon. Thanks for the brief introduction.

@trivialfis (Member) left a comment

I don't fully understand the logic here yet. I will fetch the code so that I can review it on my machine later. Thanks for your patience. One thing, though: see #4237. Could you take it into consideration? Not necessarily in this PR, but maybe add some remarks?

Comment on lines 72 to 85
 public:
  bool Read(EllpackPage* page, dmlc::SeekStream* fi) override {
    auto* impl = page->Impl();
    if (!fi->Read(&impl->n_rows)) return false;
    return fi->Read(&impl->idx_buffer);
  }

  bool Read(EllpackPage* page,
            dmlc::SeekStream* fi,
            const std::vector<bst_uint>& sorted_index_set) override {
    auto* impl = page->Impl();
    if (!fi->Read(&impl->n_rows)) return false;
    return fi->Read(&page->Impl()->idx_buffer);
  }
Member:

Provide a read impl or something similar. We may change the format later; it saves us a few minutes when reading the code. ;-)

Contributor Author:

Done.

* \brief Push one instance into page
* \param inst an instance row
*/
void Push(const Inst &inst);
Contributor:

Where has this method been moved to? Or has it been permanently deleted?

Member:

+1

Contributor Author:

This method is not being used, so I removed it. I can add it back if you think we should keep it.

if (!idx_buffer.empty()) {
  offset = ::xgboost::common::detail::kPadding;
}
idx_buffer.reserve(idx_buffer.size() + buffer.size() - offset);
Contributor:

Can this buffer be reserved a priori, before processing individual batches, as opposed to resizing it incrementally, since we know the size of the dataset from the meta info?

Contributor Author:

We know the size of the dataset, but we don't want to store the whole dataset in a single page. Since we are fed CSR pages one by one, I don't think we know beforehand how big the buffer is going to be.


gidx_buffer = {};
ba_.Allocate(device, &gidx_buffer, idx_buffer.size());
dh::CopyVectorToDeviceSpan(gidx_buffer, idx_buffer);
Contributor:

So, in this version the sparse page iterator copies the entire compressed buffer onto the GPU and returns?

Contributor Author:

Yes.

Contributor:

In a future version, then, will the idx_buffer go away, and will the ellpack info also be persisted on disk in a separate page?

Contributor Author:

I think we'd always need the idx_buffer for reading from files. We might need to persist the ellpack info if we want to use the same binning on a different dataset.

// Compress each CSR page to ELLPACK, and write the accumulated pages to disk.
void EllpackPageSourceImpl::WriteEllpackPages(DMatrix* dmat, const std::string& cache_info) const {
auto cinfo = ParseCacheInfo(cache_info, kPageType_);
SparsePageWriter<EllpackPage> writer(cinfo.name_shards, cinfo.format_shards, 6);
Contributor:

Can the literal 6 be defined as a named constant beforehand to denote what it means?
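
For example, something along these lines (the constant name is hypothetical, and what the third SparsePageWriter argument controls is an assumption here, not taken from the PR):

// Hypothetical constant name; check the SparsePageWriter declaration for the
// actual meaning of the third constructor argument (assumed here to be the
// number of buffers queued for the background writer).
constexpr size_t kMaxQueuedWrites = 6;
SparsePageWriter<EllpackPage> writer(cinfo.name_shards, cinfo.format_shards,
                                     kMaxQueuedWrites);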

Contributor Author:

Done.


// Bin each input data entry, store the bin indices in compressed form.
template<typename std::enable_if<true, int>::type = 0>
Member:

Actually, why is this line needed?

Contributor Author:

Good question. :) I read this page: https://eli.thegreenplace.net/2014/sfinae-and-enable_if/, but I still don't understand why it's here. Removed.

@sriramch (Contributor) commented Oct 16, 2019

I think that was added to prevent a multiple-symbol-definition linker error. Some of the tests include the CUDA file to test internal methods, so this kernel would be exposed as a global function with external linkage in multiple translation units. When they are put together to build a binary, the linker will invariably complain.

Making it inline or a function template makes the symbol visible only internally (internal linkage). I'm not sure if __global__ functions can be made inline. It could have been made static as well to force internal linkage; I'm not sure why a function template was chosen.

This is moot at present: with the ELLPACK refactoring, the internal kernel has been moved to a separate file that isn't explicitly or implicitly included by the tests.
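
A minimal sketch of the linkage issue described above (the kernel name and body are made up for illustration, not taken from the XGBoost sources):

#include <type_traits>

// If a .cu file defining a plain __global__ function is #included by several
// translation units, each unit emits its own external definition and the link
// step fails with a multiple-definition error:
//
//   __global__ void ExampleKernel(const float* in, float* out, int n);  // duplicated symbol
//
// Turning the kernel into a function template (the dummy enable_if parameter
// below, as in the line questioned above) lets identical instantiations be
// merged across translation units, so the linker no longer complains.
template <typename std::enable_if<true, int>::type = 0>
__global__ void ExampleKernel(const float* in, float* out, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) out[i] = in[i] * 2.0f;
}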

Member:

@rongou @sriramch Sorry, I added that line before. __global__ functions cannot be made inline, so I went directly to a template.

DMatrix* dmat = DMatrix::Load(tmp_file + "#" + tmp_file + ".cache", true, false);

// Loop over the batches and assert the data is as expected
for (const auto& batch : dmat->GetBatches<EllpackPage>({0, 256, 64})) {
Member:

Sorry for the nitpick, but what if the parameter is changed? That might happen between runs.

Contributor Author:

Good catch. I saved BatchParam and added some checks.
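
A hypothetical sketch of that kind of check (the struct below is an illustrative stand-in for BatchParam; the actual fields, comparison, and call site in the PR may differ):

#include <cstddef>

// Illustrative stand-in for the saved batch parameters.
struct PageParam {
  int device;
  int max_bin;
  size_t gpu_page_size;
  bool operator==(const PageParam& other) const {
    return device == other.device && max_bin == other.max_bin &&
           gpu_page_size == other.gpu_page_size;
  }
};

// If a later request asks for batches with parameters different from the ones
// the cached ELLPACK pages were built with, the cache cannot be reused as-is.
inline bool CanReuseEllpackCache(const PageParam& cached, const PageParam& requested) {
  return cached == requested;
}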

@trivialfis (Member) left a comment

Looks good to me. ;-)

@trivialfis (Member)

@RAMitchell Do you need to review?

@RAMitchell (Member) left a comment

Looks good. Your implementation is consistent with the existing approach, so I think we should merge.

Having said that, I am reminded of how convoluted our data code is, and we still desperately need to reorganise some things. It's a big project for the future.

@trivialfis (Member)

Merging. @RAMitchell If you have some ideas on how to refactor, please do share.

@trivialfis (Member)

@rongou Looking forward to your next PR. ;-)

@trivialfis trivialfis merged commit 5b1715d into dmlc:master Oct 23, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Jan 24, 2020
@rongou rongou deleted the ellpack-source branch November 18, 2022 19:02