fix(druid): Delete obsolete Druid NoSQL slice parameters #24737

@john-bodley john-bodley commented Jul 19, 2023

SUMMARY

This PR addresses an issue raised by @padbk in #24581 (comment) related to sending an alert/report as a CSV. The actual error was rather cryptic:

Error: Failed generating csv HTTP Error 400: BAD REQUEST

Specifically, the RESTful /api/v1/chart/{pk}/data/?format=csv API endpoint was failing with the following error,

{
  "message": "Request is incorrect: {'queries': {0: {'extras': {'having_druid': ['Unknown field.']}}}}"
}

which turned out to be related to #23997, i.e., the referenced PR (which I authored) didn't remove the now-obsolete parameters from the slices table. This PR remedies that.
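To illustrate why a stale key yields a 400, here is a minimal stdlib-only sketch of strict validation that rejects unknown keys in each query's extras. It only mimics the shape of the error above; it is not Superset's actual schema code. The ALLOWED_EXTRAS set and the validate_queries function are hypothetical names introduced for illustration.

```python
from typing import Any, Dict

# Hypothetical allow-list: the real schema's field set is not shown in this PR.
ALLOWED_EXTRAS = {"having", "where", "time_grain_sqla"}


def validate_queries(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Return a nested error dict shaped like the one in the 400 response."""
    errors: Dict[int, Any] = {}
    for index, query in enumerate(payload.get("queries", [])):
        # Collect every extras key that the allow-list does not recognise.
        unknown = {
            key: ["Unknown field."]
            for key in query.get("extras", {})
            if key not in ALLOWED_EXTRAS
        }
        if unknown:
            errors[index] = {"extras": unknown}
    return {"queries": errors} if errors else {}


# A persisted payload that still carries the removed Druid key.
stale = {"queries": [{"extras": {"having": "", "having_druid": []}}]}
print(validate_queries(stale))
```

Under these assumptions, any chart whose stored payload still contains having_druid fails validation until the key is removed from the database.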

I used Airbnb's vast corpus (consisting of ~ 150k charts) to identify if/where the following form-data keys were persisted to the database:

  • druid_time_origin
  • having_druid
  • having_filters
  • granularity

I wasn't able to find having_filters anywhere, but it was somewhat 😢 to see that the druid_time_origin and granularity fields were actually present in three separate places:

  1. In the slices.params column.
  2. In the form_data field of the slices.query_context column.
  3. In the queries field of the slices.query_context column.
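The cleanup across those three locations could be sketched roughly as follows. This is not the migration from this PR: the helper names (pop_obsolete, clean_params, clean_query_context) are hypothetical, applying the same key set to every location is a simplification, and for the queries field the sketch only inspects each query's nested extras dict, which is where the error above reported having_druid.

```python
import json
from typing import Any, Dict, Optional

# Keys listed in the PR description.
OBSOLETE_KEYS = ("druid_time_origin", "having_druid", "having_filters", "granularity")


def pop_obsolete(mapping: Dict[str, Any]) -> bool:
    """Remove obsolete keys in place and report whether anything changed."""
    updated = False
    for key in OBSOLETE_KEYS:
        if key in mapping:
            del mapping[key]
            updated = True
    return updated


def clean_params(raw: Optional[str]) -> Optional[str]:
    """Clean the JSON stored in slices.params; return None if unchanged."""
    if not raw:
        return None
    params = json.loads(raw)
    return json.dumps(params) if pop_obsolete(params) else None


def clean_query_context(raw: Optional[str]) -> Optional[str]:
    """Clean form_data and each query's extras in slices.query_context."""
    if not raw:
        return None
    query_context = json.loads(raw)
    updated = pop_obsolete(query_context.get("form_data") or {})
    for query in query_context.get("queries") or []:
        # Assumption: only the nested "extras" dict is inspected here.
        updated |= pop_obsolete(query.get("extras") or {})
    return json.dumps(query_context) if updated else None


raw = json.dumps(
    {
        "form_data": {"granularity": "all", "viz_type": "table"},
        "queries": [{"extras": {"having_druid": [], "where": ""}}],
    }
)
print(clean_query_context(raw))
```

Returning None for unchanged rows lets the caller skip the write, in the same spirit as the `if updated:` guard visible in the reviewed diff.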

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

CI, plus I added and tested the DB migration. After running the migration, I confirmed that I was able to generate a report sent as a CSV.

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided << 5 minutes at Airbnb with ~ 150k charts
  • Introduces new feature or API
  • Removes existing feature or API

@john-bodley john-bodley changed the title fix: Delete obsolete Druid NoSQL slice parameters fix(druid): Delete obsolete Druid NoSQL slice parameters Jul 19, 2023
codecov bot commented Jul 19, 2023

Codecov Report

Merging #24737 (72435e7) into master (aa01b51) will not change coverage.
The diff coverage is 100.00%.

❗ Current head 72435e7 differs from pull request most recent head 5ad9106. Consider uploading reports for the commit 5ad9106 to get more accurate results

@@           Coverage Diff           @@
##           master   #24737   +/-   ##
=======================================
  Coverage   68.89%   68.89%           
=======================================
  Files        1901     1901           
  Lines       73927    73927           
  Branches     8183     8183           
=======================================
  Hits        50932    50932           
  Misses      20874    20874           
  Partials     2121     2121           
Flag       Coverage Δ
hive       54.15% <100.00%> (ø)
mysql      79.21% <100.00%> (ø)
postgres   79.29% <100.00%> (ø)
presto     54.05% <100.00%> (ø)
python     83.31% <100.00%> (ø)
sqlite     77.88% <100.00%> (ø)
unit       54.87% <100.00%> (ø)

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
superset/row_level_security/schemas.py 100.00% <100.00%> (ø)

@john-bodley john-bodley force-pushed the john-bodley--fix-cleanup-legacy-druid-nosql-chart-payload branch 2 times, most recently from f8ae048 to 007baa1 Compare July 19, 2023 01:43
@john-bodley john-bodley marked this pull request as ready for review July 19, 2023 02:23
@john-bodley john-bodley requested a review from a team as a code owner July 19, 2023 02:23
@john-bodley john-bodley added the v3.0 Label added by the release manager to track PRs to be included in the 3.0 branch label Jul 19, 2023
        if updated:
            slc.params = json.dumps(params)
    except Exception:
        pass
can we add a log here to see the exception

@hughhhh good idea.

        if updated:
            slc.query_context = json.dumps(query_context)
    except Exception:
        pass
same as comment above

@john-bodley john-bodley force-pushed the john-bodley--fix-cleanup-legacy-druid-nosql-chart-payload branch 2 times, most recently from bbfe5f3 to 52398c7 Compare July 19, 2023 03:38
john-bodley commented Jul 19, 2023

@hughhhh I've addressed your comments. The exceptions are logged in the form:

ERROR [root] Unable to parse query context for slice 123
Traceback (most recent call last):
  File "/srv/superset-internal/apache-superset/superset/migrations/versions/abc.py", line 75, in upgrade
    query_context = json.loads(slc.query_context)
  File "/usr/lib/python3.9/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.9/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.9/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Unterminated string starting at: line 1 column 53523 (char 53522)
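A log entry of that shape can be produced with the standard library's logging.exception, which logs at ERROR level on the root logger and appends the active traceback. The following is a minimal sketch rather than the code merged in this PR; the load_query_context helper and its signature are hypothetical.

```python
import json
import logging
from typing import Any, Dict, Optional


def load_query_context(slice_id: int, raw: Optional[str]) -> Optional[Dict[str, Any]]:
    """Parse a stored query context, logging (not raising) on bad JSON."""
    if not raw:
        return None
    try:
        return json.loads(raw)
    except Exception:
        # Replaces a bare `pass`: the failure is recorded with its traceback,
        # and the caller can skip this slice and continue with the next one.
        logging.exception("Unable to parse query context for slice %s", slice_id)
        return None


# A truncated payload triggers the logged error instead of aborting the loop.
print(load_query_context(123, '{"queries": [{"extras": "unterminated'))
```

Logging rather than re-raising keeps one corrupt row from aborting the whole migration while still leaving a record of which slice was skipped.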

@john-bodley john-bodley force-pushed the john-bodley--fix-cleanup-legacy-druid-nosql-chart-payload branch from 52398c7 to 5ad9106 Compare July 19, 2023 04:46
@michael-s-molina michael-s-molina left a comment

LGTM

@john-bodley john-bodley merged commit 4c5ada4 into apache:master Jul 19, 2023
31 checks passed
michael-s-molina pushed a commit that referenced this pull request Jul 26, 2023
@mistercrunch mistercrunch added 🍒 3.0.0 🍒 3.0.1 🍒 3.0.2 🍒 3.0.3 🍒 3.0.4 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 3.1.0 labels Mar 8, 2024