[DISCUSS] Support kDLBool type #75

alonre24 · 2021-06-03T12:11:04Z

To represent tensors of type bool - is it possible adding to the DLDataTypeCode enum another type (kDLBool)? That can be very useful, as some well known machine learning platforms (TF, torch, Onnxruntime) support models whose inputs/outputs are tensors of type bool. Thanks!

The text was updated successfully, but these errors were encountered:

tqchen · 2021-06-03T14:10:44Z

bool is encoded as UInt(bits=1, lanes=1) type. One thing that is indeed true is that we have not clearly specified how to store the sub-byte types, in this case they are stored as the byte

alonre24 · 2021-06-03T14:23:47Z

As I understand, the size of bool element is usually one byte, so representing it with bits=1 seems not appropriate. On the other hand, using UInt(bits=8, lanes=1) to represent bool doesn't allow distinguish between bool and uint8 types. Therefore, I suggest adding bool to DLDataTypeCode - both for having the flexibility of the desired bool representation (1 or 8 bits), and for not overlapping with existing types.

tqchen · 2021-06-03T14:35:14Z

Thanks @alonre24 On compiler projects like LLVM, uint(1) is used to represent the bool type(think it as a register type). That does not preclude us to think of a storage type of the element type. In this particular case, I believe specifying the storage type clearly for sub-byte types would be sufficient.

lantiga · 2021-06-03T16:26:27Z

Thank you @tqchen, there seems to be different ways in which the different frameworks decide to implement bool. It is not guaranteed to be uint(1), for instance in PyTorch is uint(8), which makes it undistinguishable from a tensor with a uint(8) dtype.
There are talks about creating a bit tensor in PyTorch to make bool sub-byte, but at the end of the day one has no guarantees there.

I'm of the opinion that having a separate DLDataTypeCode for bool and then using the bits field to describe whether that bool is held in 1 bit or byte in the buffer a would make a lot of sense for interoperability.

leofang · 2021-06-04T18:50:29Z

It is not guaranteed to be uint(1), for instance in PyTorch is uint(8), which makes it undistinguishable from a tensor with a uint(8) dtype.

Indeed, this issue would happen in MPI programs too (MPI_C_BOOL vs MPI_UNIT8_T).

tqchen · 2021-06-04T19:45:15Z

What I originally meant was to specify the sub-bit storage layout, which means uint(1) maps to the bool and stores as uint8 in DLPack.

When the bool was indeed stored as sub-byte in bitmask setting, it then becomes uint1x8(8 of 1 bits). So we can still distinguish the two using the current convention.

leofang · 2021-06-07T20:08:24Z

I just realized in CuPy we do not support exchanging bool arrays because of lack of kDLBool.

@tqchen I think this is a nontrivial interpretation that not everybody expects (nor was it documented). For example the ongoing NumPy PR did not support bool either (numpy/numpy#19083). AFAIK only PyTorch supports it but it's unclear to me where they learned this. I am +1 for adding kDLBool for better clarity.

cc: @rgommers @emcastillo @hameerabbasi @seberg

seberg · 2021-06-07T20:20:04Z

My view is that adding a new type for bool seems best (and clearest). Does PyTorch even really support it, or just export bool as a uint8 (losing the bool type information on round-tripping)?

tqchen · 2021-06-07T20:39:28Z

Thanks everyone for great discussion so far. Trying to summarize the thread.

A0: `i1` to represent bool

We can use uint(1) to represent the intend of the data type. Additionally, the specification says that any sub-byte data type would need to be stored in a full byte. This means uint1 would need to be stored as uint8(full byte) consistent with the current behavior.

For cases where the intention is to pack the bit-masking(say pack 8 bools into a byte). The data type can be represented as a vector type uint1x8, which can be stored as a full byte.

A1: Introduce a bool type

One of the natural extension to think about this is to introduce bool with bits=8. Note that the default printing behavior might not be too attractive here, since while we can say int8, bool8 does not makes too much sense as a dtype name.

Discussions

Both A0 and A1 should achieve the goal of supporting bool in DLPack.

A0 is likely more consistent with the low level compiler implementations, for example LLVM implements bool using i1
A1 makes the choice of bool more explicit, although we could also introduce a helper macro (IsBool(DLDataType) to make things explicit in the case of A0)

seberg · 2021-06-07T21:31:57Z

For cases where the intention is to pack the bit-masking(say pack 8 bools into a byte). The data type can be represented as a vector type uint1x8, which can be stored as a full byte.

I think this as storing exactly 8-tuples of bool, under the specifications? So a shape=1 [itemsize] would translate into shape=8 [bits] booleans/bits for uint1x8?
This means:

A shape of SHAPE [bits] has to be described as uint1xSHAPE, with SHAPE=15 you get 15 [bits] being uint1x15?
A multi-dimensional bit-array stored contiguously can't be described (since it would requite bit-sized strides that are not in the last dimensions/cannot be absorbed into lanes).

Also, someone might get the idea to store uint2 in a packed/unpacked format, at which point the type distinction between boolean and bit-sized integers becomes fuzzy. That might be an unnecessary "what if", but creating bool explicitly avoids it.

It seems to me that if there is a possibility of bit-sized stride support (e.g. in form of bit-arrays), creating the explicit bool dtype is a lot cleaner. If that will never happen, I still think it is nicer, but I doubt there are serious practical issues with using uint1 as bool.

I had the impression that the standard already explicitly supports bit-strides, since the header includes the line (in a comment):

size *= (t->dtype.bits * t->dtype.lanes + 7) / 8;

although, I guess there is a fair chance that nobody would do that and you can consider it undefined right now.

A0 is likely more consistent with the low level compiler implementations, for example LLVM implements bool using i1

Yes, but I don't understand that this is a reason for or against A0? The storage needs to be clearly defined as 1 or 0 (or possible value != 0 and value == 0), of course. But the type-safety aspect is unrelated to what the compiler uses behind the scenes?

tqchen · 2021-06-08T13:38:04Z

Thanks @seberg .

To followup on the conversation about the bitmasking. The particular comment of uint1x8 was with respect to the bit level case. In the most common cases we are not doing that, and uint1 plus the storage spec of sub-bit type would indeed resolve the issue(because all types padded to full byte to be stored). The same principle will likely applies to uint2 as well.

size *= (t->dtype.bits * t->dtype.lanes + 7) / 8; is a realization of the sub-byte storage convention.

To followup on the multi-dimensional bit-array. We would indeed require some form of logical to physical mapping in this case. Explicitly specifying a vector type uint1x8, or uint1x32 would corresponds to the physical storage plan of the data. Such plan is also usually easier to direct manipulate in the presence of SIMD. The logical case is a separate question.

In the case of A1, bool1 create some inconsistency with the current specification. Since the bool1 was intended as a bit array, while the sub-byte storage convention would indicate the other case(it should be stored as 1 byte). Of course as we discussed bit-array may be out of the scope, so not too relevant here. Just want to show some possible inconsistencies.

seberg · 2021-06-10T21:27:08Z

Yeah, you are right, I managed to mis-parse the code as rounding up the size rather than the itemsize :(. Although I guess many here likely didn't realize that sub-bit datatypes are standardized to be stored padded to full bytes :).

In the case of A1, bool1 create some inconsistency with the current specification. Since the bool1 was intended as a bit array

You are right. Do not use bool1 for bit-strided storage, then (by itself)! What you get is:

bool8 indicates a convention where True := value != 0 and False := value == 0
bool1 indicates a convention where True := value == 1 and False := value == 0 stored in 1 byte as per current DLPack standard
uint1 can be used for the set of values {0, 1} instead of meaning {False, True} (which could be surprising!)

(For what its worth, NumPy can't always guarantee bool1 style, as it, annoyingly, supports both in practice. But DLPack could be opinionated about bool1 vs. bool8. If you would switch to always using sub-byte strides, bool8 has to be specified as one of the two meanings.)

What about sub-byte strides?

I do agree that there is an inconsistency if you would use bool1 to implicitly indicate bit-strided (non byte-sized) storage, but not define uint2 to use this. But I do not understand that as an argument against introducing a new bool dtype. To me that is primarily a type safety (and clarity) issue – i.e. the last point.

I have heard the bit-mask requests for NumPy often enough to think that it may be good to anticipate it (not necessarily act on that right now, though). The future might for example see a flag (e.g. on the dtype field) to indicate true bit-sized storage (non byte-padded). That flag will allow both bool1 and uint2 (or even uint1) to be stored in a packed, sub-byte strided, manner.

tirthasheshpatel · 2022-03-08T09:05:38Z

@tqchen The discussion seems to have reached an impasse at the uint vs bool argument. #75 (comment) is a nice summary of the issue. Here is my opinion about the thread:

Using uint(1) or uint(8) as bool: This seems obscure to me as suggested in [DISCUSS] Support kDLBool type #75 (comment). Another argument against this is that there is no way to distinguish between uint8 and bool. But @tqchen noted that it is more consistent with the lower level implementation in compilers e.g. LLVM.
Using kDLBool: More explicit and easier to use. Clear distinction between uint8 and bool. Worst case, introduces some redundancy for low level compilers like LLVM (but that doesn't seem to be a problem (like introducing ambiguity), right?)

I would vote for an explicit kDLBool since its more explicit. I admit, I am not aware of the problems with low-level compiler projects like LLVM but I don't think that adding a kDLBool should cause a problem when exporting at such low levels. @tqchen Do you agree with this? If yes, we can proceed with merging #76. It would definitely be nice to have the bool data type in DLPack!

leofang · 2022-10-27T03:40:54Z

@tqchen @rgommers @seberg @kmaehashi @tirthasheshpatel @jakirkham @kkraus14

I'd like us to revisit this topic, and specially I'd like to request for adding kDLBool. Look at the NumPy/CuPy/PyTorch adoptions as of today, it is clearly that none of these libraries support exchanging bool tensors, and it has increasingly become a source of bug reports (see the above linked issues). Given that these are already the 3 major Array API adopting libraries, we should add the bool type to clear any ambiguity and unblock them.

tqchen · 2022-10-27T15:43:09Z

Thanks for rekindle this topic. After thinking a bit more on this topic. I also now agree that having kDLBool is going to be helpful.

I would assume the current state would call for something like

Bool(bits=8, lane=1)

to represent the current bool type in numpy/PyTorch?

seberg · 2022-10-27T15:51:18Z

Sounds good, it might be nice to review if anyone has expects True to be only the bit set, rather than anything != 0.

tqchen · 2022-11-09T15:36:34Z

seems there is general consensus on this. We can consider a PR to add such spec. perhaps @leofang can send a PR?

I think it makes sense to require true be the only bit set

leofang · 2022-11-09T15:40:29Z

Shoooot my apology @tqchen I completely missed the traffic here for some reason. Will do later today!

alonre24 mentioned this issue Jun 3, 2021

Add bool type to DLDataTypeCode enum #76

Closed

tqchen changed the title ~~Support kDLBool type~~ [DISCUSS] Support kDLBool type Jun 7, 2021

rgommers mentioned this issue Jun 24, 2021

Adds DLPack support pytorch/pytorch#57110

Closed

nv-dlasalle mentioned this issue Oct 8, 2021

[PyTorch][Bugfix] Use uint8 instead of bool in pytorch to be compatible with nightly version dmlc/dgl#3406

Merged

5 tasks

rgommers mentioned this issue Nov 5, 2021

DLPack no longer works on Boolean tensors after 1.10+ pytorch/pytorch#67081

Closed

conglongli mentioned this issue Mar 31, 2022

[BUG] Compressed Adam optimizers - RuntimeError: Bool type is not supported by dlpack microsoft/DeepSpeed#1859

Closed

juda mentioned this issue Jul 29, 2022

[TVM PyTorch Integration] libstdc++ CXX11 ABI Compatibility & boolean tensor support apache/tvm#12232

Merged

leofang mentioned this issue Oct 26, 2022

Unable to convert Pytorch Tensor to Cupy Array cupy/cupy#7144

Closed

leofang mentioned this issue Nov 10, 2022

Add kDLBool type #114

Merged

tqchen closed this as completed in #114 Dec 4, 2022

yosh-matsuda mentioned this issue Mar 26, 2023

Support bool ndarray wjakob/nanobind#150

Merged

rgommers mentioned this issue Apr 13, 2023

to_iterable data-apis/dataframe-api#109

Closed

cceyda mentioned this issue May 19, 2023

python_backend dlpack: Bool type is not supported triton-inference-server/server#5820

Closed

AlenkaF mentioned this issue Dec 13, 2023

GH-33984: [C++][Python] DLPack implementation for Arrow Arrays (producer) apache/arrow#38472

Merged

HydrogenSulfate mentioned this issue Sep 28, 2024

Add paddle backend NVIDIA/warp#318

Merged

14 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[DISCUSS] Support kDLBool type #75

[DISCUSS] Support kDLBool type #75

alonre24 commented Jun 3, 2021

tqchen commented Jun 3, 2021 •

edited

Loading

alonre24 commented Jun 3, 2021

tqchen commented Jun 3, 2021

lantiga commented Jun 3, 2021

leofang commented Jun 4, 2021

tqchen commented Jun 4, 2021

leofang commented Jun 7, 2021

seberg commented Jun 7, 2021

tqchen commented Jun 7, 2021 •

edited

Loading

seberg commented Jun 7, 2021

tqchen commented Jun 8, 2021

seberg commented Jun 10, 2021

tirthasheshpatel commented Mar 8, 2022

leofang commented Oct 27, 2022

tqchen commented Oct 27, 2022

seberg commented Oct 27, 2022

tqchen commented Nov 9, 2022

leofang commented Nov 9, 2022

[DISCUSS] Support kDLBool type #75

[DISCUSS] Support kDLBool type #75

Comments

alonre24 commented Jun 3, 2021

tqchen commented Jun 3, 2021 • edited Loading

alonre24 commented Jun 3, 2021

tqchen commented Jun 3, 2021

lantiga commented Jun 3, 2021

leofang commented Jun 4, 2021

tqchen commented Jun 4, 2021

leofang commented Jun 7, 2021

seberg commented Jun 7, 2021

tqchen commented Jun 7, 2021 • edited Loading

A0: i1 to represent bool

A1: Introduce a bool type

Discussions

seberg commented Jun 7, 2021

tqchen commented Jun 8, 2021

seberg commented Jun 10, 2021

tirthasheshpatel commented Mar 8, 2022

leofang commented Oct 27, 2022

tqchen commented Oct 27, 2022

seberg commented Oct 27, 2022

tqchen commented Nov 9, 2022

leofang commented Nov 9, 2022

tqchen commented Jun 3, 2021 •

edited

Loading

tqchen commented Jun 7, 2021 •

edited

Loading

A0: `i1` to represent bool