-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Support using pandas nullable types #77
Conversation
@AndyBryson thanks for your PR, I will review your changes ASAP |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @AndyBryson,
thanks for your PR 👍. Can you please rebase your sources with main
branch and satisfy our checklist?
Best
904f39c
to
163cee8
Compare
Hi @bednar. That's mostly done. There is no CHANGELOG.md, so I've not updated it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AndyBryson I've add the CHANGELOG.md to main
branch.
You can add something like:
### Bugfix
1. [#77](https://github.com/InfluxCommunity/influxdb3-python/pull/77): Support using pandas nullable types
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please rebase your sources instead of merging? This approach would make it easier for us to review your changes.
Hopefully that's all done now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks again for your PR, there are a few changes that sound reasonable from our POV:
try: | ||
from ...extras import pd | ||
|
||
def _not_nan(x): | ||
return not pd.isna(x) | ||
|
||
except ImportError: | ||
pd = None | ||
|
||
def _not_nan(x): | ||
return x == x | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be simplify to because the pandas should be present:
try: | |
from ...extras import pd | |
def _not_nan(x): | |
return not pd.isna(x) | |
except ImportError: | |
pd = None | |
def _not_nan(x): | |
return x == x | |
def _not_nan(x): | |
from ...extras import pd | |
return not pd.isna(x) | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
if issubclass(value.type, np.integer): | ||
field_value = f"{sep}{key_format}={{{val_format}}}i" | ||
elif issubclass(value.type, np.bool_): | ||
field_value = f'{sep}{key_format}={{{val_format}}}' | ||
elif issubclass(value.type, np.floating): | ||
if null_columns.iloc[index]: | ||
field_value = f"""{{"" if math.isnan({val_format}) else f"{sep}{key_format}={{{val_format}}}"}}""" | ||
field_value = f"""{{"" if pd.isna({val_format}) else f"{sep}{key_format}={{{val_format}}}i"}}""" | ||
else: | ||
field_value = f"{sep}{key_format}={{{val_format}}}i" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be simplify to:
if issubclass(value.type, np.integer) or issubclass(value.type, np.floating) or issubclass(value.type, np.bool_): # noqa: E501
suffix = 'i' if issubclass(value.type, np.integer) else ''
if null_columns.iloc[index]:
field_value = f"""{{"" if pd.isna({val_format}) else f"{sep}{key_format}={{{val_format}}}{suffix}"}}""" # noqa: E501
else:
field_value = f"{sep}{key_format}={{{val_format}}}{suffix}"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Allow nullable types that aren't floats. This means that we can upload data frames that have int/bool columns that have empty cells and still maintain the correct type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks again for this PR 👍
LGTM 🚀
Closes #76
Proposed Changes
Use pandas to decide what is
NaN
to allow us to have nullable bools, ints and strings.Profiling
With a real world dataset, 10_000 lines, 156 columns
exiting method, no pandas null
1.86 s ± 26.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
new method, no pandas null
1.7 s ± 103 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
new method, with pandas null
1.51 s ± 25.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Checklist