Skip to content

Commit

Permalink
Implement typed and type erased tensor.
Browse files Browse the repository at this point in the history
* Use typed tensor for storing meta info like base margin.
* Extend the array interface handler to multi-dim.

Implement a general array view.

* Replace existing matrix and vector view.

lint.

Remove const too.

Doc/Test.

Include.

Use it in AUC.

Win build.

Use int32_t.

Use integral.

force the same type.

Use constexpr for old nvcc.

Test for empty tensor.

Rename to view.

Format.

Better document and perf.

Address reviewer's comment.

tidy.

Implement a general array view.

* Replace existing matrix and vector view.

lint.

Remove const too.

Doc/Test.

Include.

Use it in AUC.

Win build.

Use int32_t.

Use integral.

force the same type.

Use constexpr for old nvcc.

Test for empty tensor.

Rename to view.

Format.

Prototype.

Move string view.

Compile on CPU.

Some fixes for GPU compilation.

Array interface.

Use it in file iter.

Cleanup.

itemsize.

Documents.

Cache the pointer.

port.

cuda compilation.

Start working on ari.

Add clang-format config. (dmlc#7383)

Generated using `clang-format -style=google -dump-config > .clang-format`, with column
width changed from 80 to 100 to be consistent with existing cpplint check.

Define shape and stride.

Convert some uses.

Prototype for copy tensor info.

proto unravel

Indexer.

Unravel.

Cleanup.

Fixes.

fixe.s

WAR.

as column vector.

Convert some code.

some more.

some more.

Compile.

Ensure column vector from the beginning.

IO.

Add code comments.

Test for trivial dimension.

Start CPU implementation.

Refactor.

Dispatch.

Compile.
  • Loading branch information
trivialfis committed Nov 5, 2021
1 parent c968217 commit 2fb7c8a
Show file tree
Hide file tree
Showing 28 changed files with 980 additions and 484 deletions.
31 changes: 16 additions & 15 deletions include/xgboost/c_api.h
Original file line number Diff line number Diff line change
Expand Up @@ -240,7 +240,7 @@ XGB_DLL int XGDMatrixCreateFromCudaArrayInterface(char const *data,
char const* json_config,
DMatrixHandle *out);

/*
/**
* ========================== Begin data callback APIs =========================
*
* Short notes for data callback
Expand All @@ -249,9 +249,9 @@ XGB_DLL int XGDMatrixCreateFromCudaArrayInterface(char const *data,
* used by JVM packages. It uses `XGBoostBatchCSR` to accept batches for CSR formated
* input, and concatenate them into 1 final big CSR. The related functions are:
*
* - XGBCallbackSetData
* - XGBCallbackDataIterNext
* - XGDMatrixCreateFromDataIter
* - \ref XGBCallbackSetData
* - \ref XGBCallbackDataIterNext
* - \ref XGDMatrixCreateFromDataIter
*
* Another set is used by external data iterator. It accept foreign data iterators as
* callbacks. There are 2 different senarios where users might want to pass in callbacks
Expand All @@ -267,17 +267,17 @@ XGB_DLL int XGDMatrixCreateFromCudaArrayInterface(char const *data,
* Related functions are:
*
* # Factory functions
* - `XGDMatrixCreateFromCallback` for external memory
* - `XGDeviceQuantileDMatrixCreateFromCallback` for quantile DMatrix
* - \ref XGDMatrixCreateFromCallback for external memory
* - \ref XGDeviceQuantileDMatrixCreateFromCallback for quantile DMatrix
*
* # Proxy that callers can use to pass data to XGBoost
* - XGProxyDMatrixCreate
* - XGDMatrixCallbackNext
* - DataIterResetCallback
* - XGProxyDMatrixSetDataCudaArrayInterface
* - XGProxyDMatrixSetDataCudaColumnar
* - XGProxyDMatrixSetDataDense
* - XGProxyDMatrixSetDataCSR
* - \ref XGProxyDMatrixCreate
* - \ref XGDMatrixCallbackNext
* - \ref DataIterResetCallback
* - \ref XGProxyDMatrixSetDataCudaArrayInterface
* - \ref XGProxyDMatrixSetDataCudaColumnar
* - \ref XGProxyDMatrixSetDataDense
* - \ref XGProxyDMatrixSetDataCSR
* - ... (data setters)
*/

Expand Down Expand Up @@ -402,7 +402,7 @@ XGB_EXTERN_C typedef void DataIterResetCallback(DataIterHandle handle); // NOLIN
* - cache_prefix: The path of cache file, caller must initialize all the directories in this path.
* - nthread (optional): Number of threads used for initializing DMatrix.
*
* \param out The created external memory DMatrix
* \param[out] out The created external memory DMatrix
*
* \return 0 when success, -1 when failure happens
*/
Expand Down Expand Up @@ -596,7 +596,8 @@ XGB_DLL int XGDMatrixSetUIntInfo(DMatrixHandle handle,
* char const* feat_names [] {"feat_0", "feat_1"};
* XGDMatrixSetStrFeatureInfo(handle, "feature_name", feat_names, 2);
*
* // i for integer, q for quantitive. Similarly "int" and "float" are also recognized.
* // i for integer, q for quantitive, c for categorical. Similarly "int" and "float"
* // are also recognized.
* char const* feat_types [] {"i", "q"};
* XGDMatrixSetStrFeatureInfo(handle, "feature_type", feat_types, 2);
*
Expand Down
16 changes: 11 additions & 5 deletions include/xgboost/data.h
Original file line number Diff line number Diff line change
Expand Up @@ -11,12 +11,14 @@
#include <dmlc/data.h>
#include <dmlc/serializer.h>
#include <xgboost/base.h>
#include <xgboost/span.h>
#include <xgboost/host_device_vector.h>
#include <xgboost/linalg.h>
#include <xgboost/span.h>
#include <xgboost/string_view.h>

#include <algorithm>
#include <memory>
#include <numeric>
#include <algorithm>
#include <string>
#include <utility>
#include <vector>
Expand Down Expand Up @@ -67,7 +69,7 @@ class MetaInfo {
* if specified, xgboost will start from this init margin
* can be used to specify initial prediction to boost from.
*/
HostDeviceVector<bst_float> base_margin_; // NOLINT
linalg::Tensor<float, 3> base_margin_; // NOLINT
/*!
* \brief lower bound of the label, to be used for survival analysis (censored regression)
*/
Expand Down Expand Up @@ -157,7 +159,8 @@ class MetaInfo {
*
* Right now only 1 column is permitted.
*/
void SetInfo(const char* key, std::string const& interface_str);

void SetInfo(StringView key, StringView interface_str);

void GetInfo(char const* key, bst_ulong* out_len, DataType dtype,
const void** out_dptr) const;
Expand All @@ -179,6 +182,9 @@ class MetaInfo {
void Extend(MetaInfo const& that, bool accumulate_rows, bool check_column);

private:
void SetInfoFromHost(StringView key, StringView interface_str);
void SetInfoFromCUDA(const char* key, std::string const& interface_str);

/*! \brief argsort of labels */
mutable std::vector<size_t> label_order_cache_;
};
Expand Down Expand Up @@ -477,7 +483,7 @@ class DMatrix {
this->Info().SetInfo(key, dptr, dtype, num);
}
virtual void SetInfo(const char* key, std::string const& interface_str) {
this->Info().SetInfo(key, interface_str);
this->Info().SetInfo(key, StringView{interface_str});
}
/*! \brief meta information of the dataset */
virtual const MetaInfo& Info() const = 0;
Expand Down
46 changes: 5 additions & 41 deletions include/xgboost/json.h
Original file line number Diff line number Diff line change
Expand Up @@ -4,16 +4,17 @@
#ifndef XGBOOST_JSON_H_
#define XGBOOST_JSON_H_

#include <xgboost/intrusive_ptr.h>
#include <xgboost/logging.h>
#include <xgboost/parameter.h>
#include <xgboost/intrusive_ptr.h>
#include <xgboost/string_view.h>

#include <functional>
#include <map>
#include <memory>
#include <vector>
#include <functional>
#include <utility>
#include <string>
#include <utility>
#include <vector>

namespace xgboost {

Expand Down Expand Up @@ -298,43 +299,6 @@ class JsonBoolean : public Value {
}
};

struct StringView {
private:
using CharT = char; // unsigned char
using Traits = std::char_traits<CharT>;
CharT const* str_;
size_t size_;

public:
StringView() = default;
StringView(CharT const* str, size_t size) : str_{str}, size_{size} {}
explicit StringView(std::string const& str): str_{str.c_str()}, size_{str.size()} {}
explicit StringView(CharT const* str) : str_{str}, size_{Traits::length(str)} {}

CharT const& operator[](size_t p) const { return str_[p]; }
CharT const& at(size_t p) const { // NOLINT
CHECK_LT(p, size_);
return str_[p];
}
size_t size() const { return size_; } // NOLINT
// Copies a portion of string. Since we don't have std::from_chars and friends here, so
// copying substring is necessary for appending `\0`. It's not too bad since string by
// default has small vector optimization, which is enabled by most if not all modern
// compilers for numeric values.
std::string substr(size_t beg, size_t n) const { // NOLINT
CHECK_LE(beg, size_);
return std::string {str_ + beg, n < (size_ - beg) ? n : (size_ - beg)};
}
CharT const* c_str() const { return str_; } // NOLINT

CharT const* cbegin() const { return str_; } // NOLINT
CharT const* cend() const { return str_ + size(); } // NOLINT
CharT const* begin() const { return str_; } // NOLINT
CharT const* end() const { return str_ + size(); } // NOLINT
};

std::ostream &operator<<(std::ostream &os, StringView const v);

/*!
* \brief Data structure representing JSON format.
*
Expand Down
Loading

0 comments on commit 2fb7c8a

Please sign in to comment.