feat(thumbnails): add support for user specific thumbs #22328

Merged
villebro merged 21 commits into apache:master from villebro/thumb-selenium on Dec 14, 2022

Member

@villebro villebro commented Dec 5, 2022

SUMMARY

This PR adds support for generating user-specific thumbnails. This is a typical requirement in environments that use some form of user impersonation, where sharing thumbnails across all users with access to the same dashboards/charts could leak sensitive data.

This PR does the following:

  • Renames ReportScheduleExecutor to ExecutorType so that it can be reused for thumbnails. Also moves the utils from the reports package to the tasks package, as they are now shared with thumbnails.
  • Adds a new package thumbnails to contain the thumbnail-specific types etc. Also moves the thumbnail task module here and deprecates the old one (it will still work, but will now emit a deprecation warning).
  • Adds a new executor type called CURRENT_USER which corresponds to the logged-in user that initiated the request. This user will be undefined for Alerts & Reports (=Celery has initiated those), but for thumbnails, this is the user that requested the thumbnail.
  • Adds the following config options:
    • THUMBNAIL_EXECUTE_AS: similar config as for Alerts & Reports
    • THUMBNAIL_DASHBOARD_DIGEST_FUNC: callback for generating custom digests for dashboards. This is handy if a deployment wants to use different hashing functions or use advanced logic for deciding if a thumbnail can be shared across a larger user pool or not.
    • THUMBNAIL_CHART_DIGEST_FUNC: callback for generating custom digests for charts
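
To illustrate the digest callbacks, here is a minimal sketch of what a custom THUMBNAIL_DASHBOARD_DIGEST_FUNC might look like. The call shape (dashboard, executor_type, executor) follows the `func(dashboard, executor_type, executor)` call in this PR's digest module. Everything else is an assumption made for illustration: `FakeDashboard`, `team_digest`, the `TEAM_BY_USER` mapping and the enum string values are hypothetical stand-ins, not Superset code.

```python
import hashlib
from dataclasses import dataclass
from enum import Enum


class ExecutorType(str, Enum):
    # Stand-in for Superset's ExecutorType; the string values are assumed.
    SELENIUM = "selenium"
    CURRENT_USER = "current_user"


@dataclass
class FakeDashboard:
    # Hypothetical stand-in exposing only the fields hashed below.
    id: int
    position_json: str


# Hypothetical mapping: users in the same team may share a thumbnail.
TEAM_BY_USER = {"alice": "finance", "bob": "finance", "carol": "sales"}


def team_digest(
    dashboard: FakeDashboard, executor_type: ExecutorType, executor: str
) -> str:
    """Return a digest shared per team instead of per user (illustrative)."""
    # Users without a team fall back to a per-user pool.
    pool = TEAM_BY_USER.get(executor, executor)
    unique_string = f"{dashboard.id}\n{dashboard.position_json}\n{pool}"
    return hashlib.sha256(unique_string.encode("utf-8")).hexdigest()


# In superset_config.py one would then presumably set:
# THUMBNAIL_DASHBOARD_DIGEST_FUNC = team_digest
```

With a callback like this, two users in the same pool would be served the same cached thumbnail, while users in different pools would each trigger their own.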

By default the digests will stay unchanged, as the new default value for THUMBNAIL_EXECUTE_AS = [ExecutorType.SELENIUM]. However, when setting it to [ExecutorType.CURRENT_USER], the username will be added to the unique_string prior to hashing to make it unique per user.
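
The default behavior described above can be sketched as follows. This is a simplified illustration, not the merged implementation: the helper names mirror `_adjust_string_for_executor` and `md5_sha_from_str` from the diff, but the bodies, the newline separator and the enum string values are assumptions.

```python
import hashlib
from enum import Enum


class ExecutorType(str, Enum):
    # Stand-in for Superset's ExecutorType; the string values are assumed.
    SELENIUM = "selenium"
    CURRENT_USER = "current_user"


def adjust_string_for_executor(
    unique_string: str, executor_type: ExecutorType, executor: str
) -> str:
    """Append the username only when thumbnails are rendered per user."""
    if executor_type == ExecutorType.CURRENT_USER:
        # Assumed separator; the real code may join the parts differently.
        return f"{unique_string}\n{executor}"
    return unique_string


def digest(unique_string: str, executor_type: ExecutorType, executor: str) -> str:
    """Hash the (possibly user-adjusted) string into an MD5 hex digest."""
    adjusted = adjust_string_for_executor(unique_string, executor_type, executor)
    return hashlib.md5(adjusted.encode("utf-8")).hexdigest()
```

Under these assumptions, digests computed with SELENIUM are identical to a plain hash of the unique string, so existing cached thumbnails stay valid, while CURRENT_USER yields a different digest for each username.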

AFTER

This chart thumbnail was cached from a Trino database connection with user impersonation enabled, using the following virtual dataset:

select concat('Database: ', current_user) as user, 1 as num union all
select 'Jinja: {{ current_username() }}' as user, 2 as num

image

As can be seen, both the current_user (rendered by Trino) and {{ current_username() }} (rendered by Superset) show the user as v_brofeldt, i.e. not a service account.

BEFORE

When I changed the chart to reference a postgres database with basic auth using the username postgres and THUMBNAIL_EXECUTE_AS = [ExecutorType.SELENIUM] (=same as current behavior), the result was as follows:
image

TESTING INSTRUCTIONS

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@villebro villebro changed the title from "Villebro/thumb selenium" to "feat(thumbnails): add support for user specific thumbs" Dec 5, 2022
@codecov
codecov bot commented Dec 5, 2022

Codecov Report

Merging #22328 (74981e5) into master (1014a32) will increase coverage by 0.01%.
The diff coverage is 81.36%.

@@            Coverage Diff             @@
##           master   #22328      +/-   ##
==========================================
+ Coverage   66.89%   66.90%   +0.01%     
==========================================
  Files        1847     1850       +3     
  Lines       70611    70677      +66     
  Branches     7749     7749              
==========================================
+ Hits        47233    47285      +52     
- Misses      21362    21376      +14     
  Partials     2016     2016              
Flag Coverage Δ
hive 52.47% <36.64%> (-0.03%) ⬇️
mysql 77.97% <71.42%> (-0.01%) ⬇️
postgres 78.03% <71.42%> (-0.01%) ⬇️
presto 52.37% <36.64%> (-0.03%) ⬇️
python 81.23% <81.36%> (-0.01%) ⬇️
sqlite 76.50% <71.42%> (-0.01%) ⬇️
unit 50.92% <66.45%> (+0.04%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
superset/reports/commands/exceptions.py 98.36% <ø> (-0.03%) ⬇️
superset/reports/types.py 100.00% <ø> (ø)
superset/tasks/thumbnails.py 39.02% <22.22%> (-6.14%) ⬇️
superset/charts/api.py 86.19% <62.50%> (+0.42%) ⬆️
superset/models/slice.py 85.85% <62.50%> (-0.29%) ⬇️
superset/config.py 91.46% <80.00%> (-0.47%) ⬇️
superset/models/dashboard.py 76.61% <80.00%> (+0.11%) ⬆️
superset/dashboards/api.py 92.57% <83.33%> (+0.04%) ⬆️
superset/tasks/utils.py 91.48% <91.48%> (ø)
superset/thumbnails/digest.py 93.54% <93.54%> (ø)
... and 4 more

@villebro villebro force-pushed the villebro/thumb-selenium branch 2 times, most recently from 8636996 to 48e90c3 on December 5, 2022 14:36
Comment on lines 344 to 347
    @classmethod
    def get(cls, id_: int) -> Slice:
        session = db.session()
        qry = session.query(Slice).filter_by(id=id_)
        return qry.one_or_none()
Member Author

There is a similar method in Dashboard that bypasses the base filter

Comment on lines 34 to 38
    def get_executor(
        executor_types: List[ExecutorType],
        model: Union[Dashboard, ReportSchedule, Slice],
        initiator: Optional[str] = None,
    ) -> Tuple[ExecutorType, str]:
Member Author

This function returns the username as a string rather than the User object because it will be called frequently (e.g. when getting all charts/dashboards). Since the config only contains the Selenium username, returning a User object would require fetching it from the metastore, causing unnecessary round trips.
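
A rough sketch of the idea, under stated assumptions: the executor types are tried in order and the first one that resolves to a username wins, with usernames coming from config or the request rather than a metastore lookup. The `model` parameter of the real signature is left out for brevity, and `SELENIUM_USERNAME`, `ExecutorNotFoundError` and the enum string values are hypothetical stand-ins.

```python
from enum import Enum
from typing import List, Optional, Tuple


class ExecutorType(str, Enum):
    # Stand-in for Superset's ExecutorType; the string values are assumed.
    SELENIUM = "selenium"
    CURRENT_USER = "current_user"


# Assumed config value: the service account used for Selenium rendering.
SELENIUM_USERNAME = "admin"


class ExecutorNotFoundError(Exception):
    """Raised when no configured executor type resolves to a username."""


def get_executor(
    executor_types: List[ExecutorType],
    initiator: Optional[str] = None,
) -> Tuple[ExecutorType, str]:
    """Return the first executor type that resolves, with its username."""
    for executor_type in executor_types:
        if executor_type == ExecutorType.SELENIUM:
            # Known from config, so no database round trip is needed.
            return executor_type, SELENIUM_USERNAME
        if executor_type == ExecutorType.CURRENT_USER and initiator:
            # Only resolvable when a logged-in user initiated the request.
            return executor_type, initiator
    raise ExecutorNotFoundError()
```

Returning a plain `(type, username)` tuple keeps the call cheap enough to run once per chart or dashboard in a list response; a User object would only need to be loaded when a thumbnail is actually rendered.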

@@ -0,0 +1,101 @@
# Licensed to the Apache Software Foundation (ASF) under one
Member Author

This file is essentially the previous superset/tasks/thumbnails.py, updated with the new logic (fetching the executor, overriding the username, etc.).

AlertQueryError(),
),
(["gamma"], None, [ExecutorType.INITIATOR], AlertQueryError()),
Member Author

This is really just one added test for the new INITIATOR case; other than that, it just updates the test cases to conform to the new signature of get_executor.

@@ -0,0 +1,323 @@
# Licensed to the Apache Software Foundation (ASF) under one
Member Author

@villebro villebro Dec 5, 2022

This was mostly moved from the previous tests/unit_tests/reports/test_utils.py file, just updating to new return types + adding new relevant test cases.

@villebro villebro force-pushed the villebro/thumb-selenium branch 3 times, most recently from f419302 to 8a67d59 on December 5, 2022 17:07
@kamalkeshavani-aiinside
Contributor

@villebro Thank you for implementing this feature. Just sharing our use case.
In our Superset, we keep the dashboard thumbnails cached every day for all users for a better user experience. We do plan to create a dashboard with user-specific content, where we can use this feature so that each user generates their own thumbnails.

But this implementation will force thumbnail generation for all dashboards for each user, which is not ideal for us.
An ideal solution for our use case would be per-dashboard/chart control of the executor, so the owner of a dashboard/chart can decide whether that particular dashboard/chart needs INITIATOR as the executor for its thumbnail. Whether it is actually feasible or not, I will let you decide.

@villebro villebro force-pushed the villebro/thumb-selenium branch 2 times, most recently from 47abf41 to 14de647 on December 6, 2022 09:10
@villebro
Member Author

villebro commented Dec 7, 2022

Just sharing our use case. In our Superset, we keep the dashboard thumbnails cached every day for all users for a better user experience. We do plan to create a dashboard with user-specific content, where we can use this feature so that each user generates their own thumbnails.

But this implementation will force thumbnail generation for all dashboards for each user, which is not ideal for us. An ideal solution for our use case would be per-dashboard/chart control of the executor, so the owner of a dashboard/chart can decide whether that particular dashboard/chart needs INITIATOR as the executor for its thumbnail. Whether it is actually feasible or not, I will let you decide.

@kamalkeshavani-aiinside we can certainly consider adding this in a future PR. All it would really require is adding a thumbnail_executor field in the Dashboard and Slice models and then add a dropdown in their respective modals (if undefined, it would default to the global config).

Would you be open to working on this feature? I'm happy to provide guidance and review help if needed.
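
The proposed follow-up could be sketched roughly as below. None of this exists in the PR: the `thumbnail_executor` field is only the suggestion made above, and `FakeDashboard`, `resolve_executor_types` and the plain-string executor values are hypothetical names used purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed global default, mirroring THUMBNAIL_EXECUTE_AS from the config.
THUMBNAIL_EXECUTE_AS: List[str] = ["selenium"]


@dataclass
class FakeDashboard:
    # Hypothetical stand-in for the Dashboard model.
    id: int
    # Proposed per-model override; None means "use the global config".
    thumbnail_executor: Optional[str] = None


def resolve_executor_types(model: FakeDashboard) -> List[str]:
    """Prefer the per-model override, else fall back to the global config."""
    if model.thumbnail_executor is not None:
        return [model.thumbnail_executor]
    return list(THUMBNAIL_EXECUTE_AS)
```

In this shape, only dashboards whose owners opt in would get per-user thumbnails, while all others would keep sharing a single cached thumbnail.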

from enum import Enum


class ExecutorType(str, Enum):
Member Author

This is the same type as the previous ReportScheduleExecutor, but with the added INITIATOR enum.

Comment on lines -130 to +156
dashboard = db.session.query(Dashboard).all()[0]
self.login(username="admin")
uri = f"api/v1/dashboard/{dashboard.id}/thumbnail/{dashboard.digest}/"
rv = self.client.get(uri)
_, thumbnail_url = self._get_id_and_thumbnail_url(DASHBOARD_URL)
rv = self.client.get(thumbnail_url)
Member Author

Since the digest may now be affected by who is logged in, all tests are updated to fetch the thumbnail URL via the API after login.

@kamalkeshavani-aiinside
Contributor

kamalkeshavani-aiinside commented Dec 9, 2022

@kamalkeshavani-aiinside we can certainly consider adding this in a future PR. All it would really require is adding a thumbnail_executor field in the Dashboard and Slice models and then add a dropdown in their respective modals (if undefined, it would default to the global config).

Would you be open to working on this feature? I'm happy to provide guidance and review help if needed.

@villebro Thank you for the suggestion. Sure, I can try to work on this with your help.

)

unique_string = _adjust_string_for_executor(unique_string, executor_type, executor)
return md5_sha_from_str(unique_string)
Member

our current dashboard digest is:

    @property
    def digest(self) -> str:
        """
        Returns a MD5 HEX digest that makes this dashboard unique
        """
        unique_string = f"{self.position_json}.{self.css}.{self.json_metadata}"
        return md5_sha_from_str(unique_string)

Adding an executor will invalidate all current computed dashboard thumbnails.

I would also prefer to bring these changes back to /superset/tasks/thumbnails and avoid introducing the deprecation and underlying breaking change on the task structure.

Member Author

Adding an executor will invalidate all current computed dashboard thumbnails.

Oof, that was an unintended mistake, the executor was only supposed to be added in the case of CURRENT_USER.

I would also prefer to bring these changes back to /superset/tasks/thumbnails and avoid introducing the deprecation and underlying breaking change on the task structure.

Makes sense - I'll revert the move

Member

@michael-s-molina michael-s-molina left a comment

Code LGTM.

@villebro If you don't mind, it would be a good idea to also get @dpgaspar's approval before merging it. Thank you for the improvement!

@villebro
Member Author

villebro commented Dec 9, 2022

Code LGTM.

@villebro If you don't mind, it would be a good idea to also get @dpgaspar's approval before merging it. Thank you for the improvement!

Absolutely @michael-s-molina 👍 As I also found a bug today in the PR I'm going to let it simmer over the weekend as I feel there's room for improvement in the tests.

Member

@dpgaspar dpgaspar left a comment

Looks really good. Since this change will invalidate the thumbnail cache, it would be nice to add a note to UPDATING.md.

return func(dashboard, executor_type, executor)

unique_string = (
f"{dashboard.id}\n{dashboard.charts}\n{dashboard.position_json}\n"
Member

Will dashboard.charts generate an N+1 issue?

Member Author

@villebro villebro Dec 14, 2022

This PR shouldn't change current performance, as this property is already present on the current request payload.

@villebro villebro merged commit aa0cae9 into apache:master Dec 14, 2022
@villebro villebro deleted the villebro/thumb-selenium branch December 14, 2022 13:02
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 2.1.0 and removed 🚢 2.1.3 labels Mar 13, 2024
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels size/XXL 🚢 2.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants