Pymapd currently does not accept categorical columns from pandas, and it issues an appropriate error message.
So this is a feature enhancement request: I would expect the pandas categorical column type to match the intent of an OmniSci dictionary-encoded column, and a categorical column's values could always be held in one.
Current behavior:
Unsupported type error
Expected behavior:
Pandas categorical columns are mapped to dictionary-encoded text (TEXT ENCODING DICT). This could be done with a single default encoding in the general case, or further optimized so that the dictionary width is chosen to fit the number of categories.
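A minimal sketch of the width-optimized variant. It assumes OmniSci's DICT(8), DICT(16) and DICT(32) encodings, with one code per width reserved for NULL. The capacities below are assumptions, so verify them against the OmniSci docs for your version. The function name dict_encoding_for is hypothetical and not part of pymapd.

```python
import pandas as pd

# Assumed capacities of OmniSci dictionary encodings (one code reserved for NULL).
# These limits are illustrative; check the OmniSci documentation for your version.
_DICT_WIDTHS = [(8, 2**8 - 1), (16, 2**16 - 1), (32, 2**31 - 1)]


def dict_encoding_for(series: pd.Series) -> str:
    """Hypothetical helper: pick the narrowest TEXT ENCODING DICT(n) that fits
    the number of categories in a pandas categorical column."""
    if not isinstance(series.dtype, pd.CategoricalDtype):
        raise TypeError(f"expected a categorical column, got {series.dtype}")
    n_categories = len(series.cat.categories)
    for bits, capacity in _DICT_WIDTHS:
        if n_categories <= capacity:
            return f"TEXT ENCODING DICT({bits})"
    raise ValueError(f"{n_categories} categories exceed the largest dictionary width")


colors = pd.Series(["red", "green", "red"], dtype="category")
print(dict_encoding_for(colors))  # TEXT ENCODING DICT(8)
```

Sizing the dictionary from the categories present at load time can be too tight if later loads append new values to the same table, which is one argument for simply defaulting to DICT(32).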
I think at a higher level, the question might be: "Is there any reason not to always use TEXT ENCODING DICT?" I'd argue that for almost all cases, especially ones where users don't specify, there's probably not much harm in dictionary-encoding all string types.
I'm less certain about trying to optimize the column widths. On the one hand, it's probably what a user wants; on the other hand, if the user had a preference, they should build the table definition themselves. It also builds in more logic to maintain.
We could also add a dry_run argument that shows what the table definition would be. Users could then alter and submit the table definition themselves with optimized settings.
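A rough sketch of the dry_run idea combined with "dictionary-encode every string column by default". Everything here is hypothetical and not pymapd's API: the create_table name, the small dtype-to-type mapping (int to BIGINT, float to DOUBLE) and the assumption that object/categorical columns hold strings. The only pymapd feature it relies on is calling execute on a connection object when dry_run is false.

```python
import pandas as pd

# Hypothetical, deliberately small dtype-kind -> OmniSci type mapping.
_NUMERIC_TYPES = {"b": "BOOLEAN", "i": "BIGINT", "f": "DOUBLE", "M": "TIMESTAMP"}


def column_type(series: pd.Series) -> str:
    # Object, string and categorical dtypes all report kind "O"; assume they
    # hold strings and default them to a dictionary-encoded TEXT column.
    if series.dtype.kind == "O":
        return "TEXT ENCODING DICT(32)"
    try:
        return _NUMERIC_TYPES[series.dtype.kind]
    except KeyError:
        raise TypeError(f"unsupported dtype {series.dtype} for column {series.name!r}")


def create_table(con, table_name: str, df: pd.DataFrame, dry_run: bool = False) -> str:
    """Build a CREATE TABLE statement for df; with dry_run=True, only return it
    so the user can edit the definition before submitting it themselves."""
    cols = ",\n".join(f"  {name} {column_type(df[name])}" for name in df.columns)
    ddl = f"CREATE TABLE {table_name} (\n{cols}\n);"
    if not dry_run:
        con.execute(ddl)  # assumes a DB-API style connection with execute()
    return ddl


df = pd.DataFrame({"city": pd.Series(["a", "b"], dtype="category"), "n": [1, 2]})
print(create_table(None, "demo", df, dry_run=True))
```

Returning the DDL string even when it is executed keeps the dry-run and real paths identical, so what the user inspects is exactly what would be sent.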
Agreed, but the first issue here is that categorical column types are rejected outright (not that they are treated as text, encoded or not). In this particular case, we already have user intent/metadata telling us that the column represents something categorical, which has already been dictionary-encoded in the source. OmniSci already has a quasi-equivalent concept. I'm wondering whether the two are equivalent enough to justify re-encoding the column into a dictionary, even though the concept doesn't exist in the Thrift transport in between.
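To illustrate why the two concepts line up: a pandas categorical is already stored as integer codes plus a categories lookup table, much like a dictionary-encoded column. Because the Thrift transport has no categorical type, one option is to expand the codes back into plain values before sending them and let the server rebuild its own dictionary. A minimal sketch; the helper name categorical_to_strings is hypothetical.

```python
import pandas as pd


def categorical_to_strings(series: pd.Series) -> list:
    """Hypothetical helper: expand a pandas categorical (codes + categories)
    into plain values for a transport that only understands strings."""
    categories = series.cat.categories  # the "dictionary"
    codes = series.cat.codes.to_numpy()  # integer index into the dictionary
    # pandas uses code -1 for missing values; send those as None (NULL).
    return [None if code < 0 else categories[code] for code in codes]


s = pd.Series(["red", None, "blue", "red"], dtype="category")
print(categorical_to_strings(s))  # ['red', None, 'blue', 'red']
```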