[REVIEW] Added GDF_BOOL type to cuda code #817
Things I'm missing:
- All cudf code which uses GDF_INT8 but really uses it as boolean should be converted to use GDF_BOOL.
- Doxygen comments regarding functions taking or producing columns of booleans (and I don't mean `valid` pseudo-columns, since they're bit-packed) should be given a once-over to check if they say the input or output has type GDF_BOOL.
Bottom line: I would "request changes", but I don't have the official capacity for that.
@@ -27,6 +27,7 @@ typedef enum {
 GDF_TIMESTAMP, /**< Exact timestamp encoded with int64 since UNIX epoch (Default unit millisecond) */
 GDF_CATEGORY,
 GDF_STRING,
+GDF_BOOL,
Wait, where's the `typedef int8_t gdf_bool`? Or rather, for consistency with the (problematic) typedefs we have now, `typedef bool gdf_bool`?
Also - Please add a comment explaining the semantics of this type, i.e. the fact that 0 means false and anything else means true.
I am not sure why we would want all GDF_INT8 code to be referenced as GDF_BOOL? While the datatype might be the same, conceptually they are different.
"the fact that 0 means false and anything else means true." has always been a C standard
> I am not sure why we would want all GDF_INT8 code to be referenced as GDF_BOOL? While the datatype might be the same, conceptually they are different.
That's not what @eyalroz was suggesting. Instead he was just saying to add a `typedef int8_t gdf_bool` and then use that typedef in the `wrapper` as `using cudf_bool = detail::wrapper<gdf_bool, GDF_BOOL>;`
> "the fact that 0 means false and anything else means true." has always been a C standard
While this is true, all of the `operator`s for `wrapper` don't behave this way. Thus, there are a few options:
1. Add specializations for the pertinent `operator::wrapper` for `gdf_bool` such that 0 means false and nonzero means true (this is the most work).
2. Require that for `gdf_bool`, 0 means false and 1 means true, and all other values are invalid. (Could be done via `assert` or just trusting the user?)
3. Use `typedef bool gdf_bool`, which automatically enforces 2., but is problematic because of the machine dependence on `sizeof(bool)`.
4. Add a `normalize` function similar in concept to `unwrap`: for all types other than `gdf_bool` it's just a no-op, but for `gdf_bool` it checks the value; if 0 it stays 0, if not zero, it changes it to `1`. You would then need to update all of the necessary `wrapper` `operator`s to call `normalize` on the passed-in arguments.
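For illustration, here is a rough sketch of how the `normalize` idea (option 4) might look. All names here (`wrapper_sketch`, `gdf_dtype_sketch`, `bool_sketch`, `int8_sketch`) are hypothetical stand-ins, not the actual cudf `detail::wrapper` or `gdf_dtype`; the real wrapper has more members and operators than shown.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for illustration; not the real cudf definitions.
enum gdf_dtype_sketch { SKETCH_INT8, SKETCH_BOOL };

template <typename T, gdf_dtype_sketch type_id>
struct wrapper_sketch {
  T value;
};

using int8_sketch = wrapper_sketch<std::int8_t, SKETCH_INT8>;
using bool_sketch = wrapper_sketch<std::int8_t, SKETCH_BOOL>;

// Generic case: normalize is a no-op for every non-boolean wrapper.
template <typename T, gdf_dtype_sketch type_id>
constexpr wrapper_sketch<T, type_id> normalize(wrapper_sketch<T, type_id> w) {
  return w;
}

// Boolean case: 0 stays 0, any nonzero value becomes 1.
// As a non-template overload it is preferred over the generic template.
constexpr bool_sketch normalize(bool_sketch w) {
  return bool_sketch{static_cast<std::int8_t>(w.value != 0 ? 1 : 0)};
}

// One operator updated to call normalize on both arguments.
template <typename T, gdf_dtype_sketch type_id>
constexpr bool operator==(wrapper_sketch<T, type_id> lhs,
                          wrapper_sketch<T, type_id> rhs) {
  return normalize(lhs).value == normalize(rhs).value;
}
```

With this shape, `bool_sketch{5} == bool_sketch{-3}` compares equal because both normalize to 1, while `int8_sketch{5} == int8_sketch{-3}` still compares the raw values. Every other operator would need the same treatment, which is the cost mentioned in option 4.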
I'll say I don't like the idea of having both `cudf_foo` and `gdf_foo`. The latter identifiers should all eventually become `cudf_foo`, and we'll be stuck with a conflict.
Also - @jrhemstad , in option (1.) you mean implementing arithmetic operators, comparison operators etc? I don't think that is really too much work.
I'm in favor of (1.) , but possibly also of (4.)
> Also - @jrhemstad , in option (1.) you mean implementing arithmetic operators, comparison operators etc? I don't think that is really too much work.
Yes, it's not a lot of work, just more than 2, 3, or 4.
@@ -107,6 +107,7 @@ decltype(auto) type_dispatcher(gdf_dtype dtype,
 case GDF_DATE64: { return f.template operator()< date64 >(std::forward<Ts>(args)...); }
 case GDF_TIMESTAMP: { return f.template operator()< timestamp >(std::forward<Ts>(args)...); }
 case GDF_CATEGORY: { return f.template operator()< category >(std::forward<Ts>(args)...); }
+case GDF_BOOL: { return f.template operator()< gdf_bool >(std::forward<Ts>(args)...); }
Note to @jrhemstad : Another reason to put some priority on accepting `f`s which only have `operator()` implemented for a subset of the types. Hint hint.
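One possible way to accept functors that implement `operator()` for only a subset of types is sketched below. This is not how cudf's `type_dispatcher` works; `dtype_sketch`, `bool_sketch`, `supports`, `invoke_or_throw`, `dispatch_sketch`, and `size_of_integers` are all hypothetical names invented for the example. It also assumes the functor constrains its `operator()` with SFINAE (here `std::enable_if`), so that unsupported types can be detected at compile time and turned into a runtime error.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <type_traits>
#include <utility>

// Hypothetical stand-ins; not the real gdf_dtype enum or cudf wrapper types.
enum dtype_sketch { SK_INT8, SK_INT32, SK_BOOL };
struct bool_sketch { std::int8_t value; };

// supports<F, T> is true only if f.template operator()<T>() is well-formed.
template <typename F, typename T, typename = void>
struct supports : std::false_type {};

template <typename F, typename T>
struct supports<F, T,
                decltype(void(std::declval<F&>().template operator()<T>()))>
    : std::true_type {};

// Supported type: forward the call to the functor.
template <typename T, typename R, typename F>
R invoke_or_throw(F& f, std::true_type) {
  return f.template operator()<T>();
}

// Unsupported type: fail at run time instead of at compile time.
template <typename T, typename R, typename F>
R invoke_or_throw(F&, std::false_type) {
  throw std::invalid_argument("functor does not support this dtype");
}

// R is the common return type, supplied explicitly by the caller.
template <typename R, typename F>
R dispatch_sketch(dtype_sketch dtype, F f) {
  switch (dtype) {
    case SK_INT8:
      return invoke_or_throw<std::int8_t, R>(f, supports<F, std::int8_t>{});
    case SK_INT32:
      return invoke_or_throw<std::int32_t, R>(f, supports<F, std::int32_t>{});
    case SK_BOOL:
      return invoke_or_throw<bool_sketch, R>(f, supports<F, bool_sketch>{});
  }
  throw std::invalid_argument("unknown dtype");
}

// Example functor: only defined for integral types, so not for bool_sketch.
struct size_of_integers {
  template <typename T,
            typename std::enable_if<std::is_integral<T>::value, int>::type = 0>
  std::size_t operator()() const {
    return sizeof(T);
  }
};
```

Calling `dispatch_sketch<std::size_t>(SK_INT32, size_of_integers{})` goes through the supported path, while `SK_BOOL` throws `std::invalid_argument` because `size_of_integers` has no `operator()` for `bool_sketch`.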
@@ -263,6 +263,9 @@ using date32 = detail::wrapper<gdf_date32, GDF_DATE32>;
 using date64 = detail::wrapper<gdf_date64, GDF_DATE64>;
+using gdf_bool = detail::wrapper<int8_t, GDF_BOOL>;
To be honest, I don't really like how some cudf types have `using` statements here, and some don't. But while that's the rule - why should `gdf_bool` be different than the integer and floating-point types? It does correspond to a C/C++ native type after all.
I believe it would be problematic to use `bool` natively, because the size of `bool` in C++ is implementation defined.
Of course it would! ... I was just being consistent with the other problematic typedef's:
typedef int gdf_size_type; /**< Limits the maximum size of a gdf_column to 2^31-1 */
typedef gdf_size_type gdf_index_type;
typedef unsigned char gdf_valid_type;
typedef long gdf_date64;
typedef int gdf_date32;
typedef int gdf_category;
typedef long gdf_timestamp;
`int` and `long` are also of implementation-defined size... I'd be all for
typedef int32_t gdf_size_type;
typedef gdf_size_type gdf_index_type;
typedef uint8_t gdf_valid_type; /* shouldn't this become uint32_t ? */
typedef int64_t gdf_date64;
typedef int32_t gdf_date32;
typedef int64_t gdf_timestamp;
typedef int32_t gdf_category;
typedef int8_t gdf_bool;
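As a sketch of how the fixed-width proposal could be checked at compile time, the typedefs above can be paired with `static_assert`s on their bit widths. The typedef names come from the proposal; the `static_assert`s are an addition for illustration only and are not part of the PR.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// Fixed-width versions of the typedefs, as proposed in the comment above.
typedef std::int32_t gdf_size_type;
typedef gdf_size_type gdf_index_type;
typedef std::uint8_t gdf_valid_type;
typedef std::int64_t gdf_date64;
typedef std::int32_t gdf_date32;
typedef std::int64_t gdf_timestamp;
typedef std::int32_t gdf_category;
typedef std::int8_t gdf_bool;

// Illustrative checks (an assumption, not from the PR): unlike int, long, or
// bool, these widths are the same on every platform that provides the types.
static_assert(sizeof(gdf_bool) * CHAR_BIT == 8, "gdf_bool must be 8 bits");
static_assert(sizeof(gdf_date32) * CHAR_BIT == 32, "gdf_date32 must be 32 bits");
static_assert(sizeof(gdf_date64) * CHAR_BIT == 64, "gdf_date64 must be 64 bits");
static_assert(sizeof(gdf_timestamp) * CHAR_BIT == 64,
              "gdf_timestamp must be 64 bits");
static_assert(sizeof(gdf_size_type) * CHAR_BIT == 32,
              "gdf_size_type must be 32 bits");
```

By contrast, `typedef long gdf_date64;` gives a 64-bit type on typical 64-bit Linux builds but a 32-bit type on platforms where `long` is 32 bits, which is the portability concern raised here.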
The initial conversation was about whether to simply do a `typedef int8_t gdf_bool` or to use the type_dispatcher. We decided to go down the type_dispatcher route, hence there is no typedef.
@BradReesWork : I don't quite understand. How does it make sense to have `typedef int gdf_date32;` but not have `typedef signed char gdf_bool` or `typedef int8_t gdf_bool`?
@@ -49,7 +49,7 @@ struct WrappersTest : public ::testing::Test {
 };
 using Wrappers = ::testing::Types<cudf::category, cudf::timestamp, cudf::date32,
-cudf::date64>;
+cudf::date64, cudf::gdf_bool>;
That's it, no other tests? I wonder if this is enough.
I agree that more tests would be beneficial. The current set of tests does run through most of the standard mathematical operations. What is missing are boolean operators. I will get those added.
@eyalroz referring to this: https://en.cppreference.com/w/cpp/language/implicit_conversion#Integral_promotion Specifically:
This was problematic because of the following:
This would throw a warning (error, since all warnings are errors) when Basically, this will fail for
Example here: https://wandbox.org/permlink/pU48n53H8F76g7SP
I just realized that `operator==` will need a specialization for `cudf::gdf_bool`, because it should be such that if both `lhs` and `rhs` are nonzero, then they are equal.
There will need to be some changes to the unit tests to test this as well.
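A rough sketch of what truthiness-based comparison could look like, using a hypothetical `bool_wrapper_sketch` type rather than the real `cudf::gdf_bool`. It assumes the semantics discussed earlier in this thread (0 is false, any other value, including negative ones, is true), and shows `operator<` as one example of the other comparison operators that would also need updating.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical minimal wrapper, for illustration only.
struct bool_wrapper_sketch {
  std::int8_t value;
};

// Assumed semantics: 0 is false, anything else (even negative) is true.
constexpr bool truthy(bool_wrapper_sketch b) { return b.value != 0; }

// Two values are equal when they have the same truthiness.
constexpr bool operator==(bool_wrapper_sketch lhs, bool_wrapper_sketch rhs) {
  return truthy(lhs) == truthy(rhs);
}

constexpr bool operator!=(bool_wrapper_sketch lhs, bool_wrapper_sketch rhs) {
  return !(lhs == rhs);
}

// Ordering follows the same rule: false < true, and nothing else.
constexpr bool operator<(bool_wrapper_sketch lhs, bool_wrapper_sketch rhs) {
  return !truthy(lhs) && truthy(rhs);
}

// Compile-time checks of the intended behavior.
static_assert(bool_wrapper_sketch{1} == bool_wrapper_sketch{2},
              "distinct nonzero values compare equal");
static_assert(bool_wrapper_sketch{-1} == bool_wrapper_sketch{1},
              "negative values count as true");
static_assert(bool_wrapper_sketch{0} != bool_wrapper_sketch{1},
              "false differs from true");
static_assert(!(bool_wrapper_sketch{1} < bool_wrapper_sketch{2}),
              "true is not less than true");
```

Unit tests for such a type would want to cover values other than 0 and 1 on both sides, since those are exactly the cases where a raw `int8_t` comparison gives a different answer.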
In theory shouldn't it be incorrect behavior to have any value other than a 1 or 0 in a GDF_BOOL column? I understand if comparing a bool to an int8 for example, but wouldn't that be a different function / operator?
So, also all other comparison operators need to be changed as well. Also, we might want to consider a
So we'd add a
If by pass-through you mean it would do nothing and be
@jrhemstad @BradReesWork What needs to happen with this PR?
One thing I hope would be considered (whether rejected or accepted) is my
I 100% support this as soon as #599 is completed. Otherwise, the CFFI will fail due to inability to parse the required
In the meantime, we could use this hack:
or even simply
and cross our fingers (or require with CMake, or static_assert somewhere) that they're the same type.
@BradReesWork , @harrism : Pinging you guys about this PR, because it has a slight (compatible) overlap with my PR #902 , and if it goes forward I'll base myself on what's gone in.
Superseded by #1142
@jrhemstad is going to take over this PR. We'll move it to 0.7.
Added a GDF_BOOL type.
The type was added as a wrapper type that resolves to an int8.
This PR does not change any code to start using the new type.
Closes #667