fix: to_datetime in Pandas 2 #24952

Merged
merged 2 commits into from
Aug 11, 2023

betodealmeida
Member

SUMMARY

The recent upgrade of Pandas from 1.5.3 to 2.0.3 broke some features. In this PR we fix a regression where a column with a full ISO timestamp (`2017-07-01T00:00:00.000Z`) can't be configured with the Python date format `"%Y-%m-%d"`, because in Pandas 2.0 the format must match the whole string by default.

I changed the relevant calls to `to_datetime` to pass `exact=False`, which preserves the behavior from Pandas 1.5.3.
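As a rough illustration of the `exact=False` change (a minimal sketch, not the actual Superset code; the `parse_dates` helper and variable names are hypothetical), a date-only format can be applied to a full ISO timestamp like this:

```python
import pandas as pd

# Sample value taken from the PR description.
raw = pd.Series(["2017-07-01T00:00:00.000Z"])


def parse_dates(values: pd.Series, fmt: str) -> pd.Series:
    """Hypothetical helper: parse strings whose prefix matches ``fmt``.

    ``exact=False`` lets the format match only part of each string,
    so trailing time and timezone text does not cause a failure.
    """
    return pd.to_datetime(values, format=fmt, exact=False, errors="raise")


parsed = parse_dates(raw, "%Y-%m-%d")

# Inspect only the date component of the parsed values.
date_strings = parsed.dt.strftime("%Y-%m-%d").tolist()
print(date_strings)
```

With the default `exact=True`, Pandas 2 is expected to reject this input because the text after the date is left unparsed.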

One additional problem is that we had `errors="coerce"` in the call, which returns `NaT` if something goes wrong. I changed it to `"raise"`, since I think it's a better experience. Otherwise the user has no idea why their data is missing.
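The difference between the two `errors` modes can be sketched as follows (an illustrative example with made-up input values, not the code changed in this PR):

```python
import pandas as pd

# Illustrative input: one valid date and one value that cannot be parsed.
values = pd.Series(["2017-07-01", "not a date"])

# errors="coerce" silently turns unparseable entries into missing values.
coerced = pd.to_datetime(values, format="%Y-%m-%d", errors="coerce")
missing = int(coerced.isna().sum())

# errors="raise" surfaces the problem to the caller instead.
try:
    pd.to_datetime(values, format="%Y-%m-%d", errors="raise")
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(missing)
print(error_message)
```

With `"coerce"` the bad row simply becomes a missing value, while `"raise"` produces an error message that can be shown to the user.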

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

Added a test covering the regression.

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@betodealmeida betodealmeida changed the title fix: to_datetime in Pandas 2 fix: to_datetime in Pandas 2 Aug 10, 2023
@codecov

codecov bot commented Aug 10, 2023

Codecov Report

Merging #24952 (b497b6e) into master (ce65a3b) will decrease coverage by 10.53%.
The diff coverage is n/a.

@@             Coverage Diff             @@
##           master   #24952       +/-   ##
===========================================
- Coverage   69.03%   58.51%   -10.53%     
===========================================
  Files        1905     1905               
  Lines       74136    74136               
  Branches     8212     8212               
===========================================
- Hits        51181    43380     -7801     
- Misses      20832    28633     +7801     
  Partials     2123     2123               
| Flag | Coverage Δ | |
|---|---|---|
| hive | `54.18% <ø> (ø)` | |
| mysql | `?` | |
| postgres | `?` | |
| presto | `54.08% <ø> (ø)` | |
| python | `61.36% <ø> (-22.01%)` | ⬇️ |
| sqlite | `?` | |
| unit | `55.08% <ø> (+0.01%)` | ⬆️ |


| Files Changed | Coverage Δ | |
|---|---|---|
| superset/utils/core.py | `68.60% <ø> (-21.90%)` | ⬇️ |

... and 293 files with indirect coverage changes


@john-bodley john-bodley added the v3.0 Label added by the release manager to track PRs to be included in the 3.0 branch label Aug 11, 2023
2.0.3 and the behavior of ``pd.to_datetime`` changed.
"""
df = pd.DataFrame({"__time": ["2017-07-01T00:00:00.000Z"]})
assert (
Member

@betodealmeida why is this check needed? As expected there's nothing up Pandas's sleeve when you create the DataFrame.

Member Author

It's not needed; it's just there to show what's in the dataframe for anyone reading this test. I'm happy to remove it.

Member

@john-bodley john-bodley left a comment

Thanks @betodealmeida for the fix. Apart from the possibly unnecessary assert this LGTM.

@betodealmeida
Member Author

> Thanks @betodealmeida for the fix. Apart from the possibly unnecessary assert this LGTM.

"possibly"? You mean "totally". 😆

@betodealmeida betodealmeida added the merge-if-green If approved and tests are green, please go ahead and merge it for me label Aug 11, 2023
@betodealmeida betodealmeida merged commit 41ca4a0 into master Aug 11, 2023
54 checks passed
@michael-s-molina michael-s-molina removed the v3.0 Label added by the release manager to track PRs to be included in the 3.0 branch label Aug 11, 2023
@michael-s-molina
Member

michael-s-molina commented Aug 11, 2023

@john-bodley I removed the 3.0 label because the Pandas upgrade is not in 3.0.

sadpandajoe pushed a commit to preset-io/superset that referenced this pull request Aug 11, 2023
@sadpandajoe
Member

🏷️ preset:2023.31

@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PRs were auto-tagged with release labels 🚢 3.1.0 labels Mar 8, 2024
@mistercrunch mistercrunch deleted the sc_73447 branch March 26, 2024 18:04
vinothkumar66 pushed a commit to vinothkumar66/superset that referenced this pull request Nov 11, 2024