Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Improves SafeMarkdown HTML sanitization #21895

Merged

Conversation

michael-s-molina
Copy link
Member

@michael-s-molina michael-s-molina commented Oct 20, 2022

SUMMARY

Currently, we are rendering HTML by default and using the following function to sanitize the content:

function isSafeMarkup(node: MarkdownAbstractSyntaxTree) {
  return node.type === 'html' && node.value
    ? !/(href|src)="(javascript|vbscript|file):.*"/gim.test(node.value)
    : true;
}

This only blocks a small subset of potential attack vectors as you can easily see here. To improve the component's security this PR does the following:

  • Upgrades react-markdown from version 4.3.1 to 8.0.3. This will help with many security issues and also enable the use of some really cool plugins to extend our markdown capabilities.
  • Installs rehype-raw to parse HTML content.
  • Installs rehype-sanitize to sanitize the HTML content. By default, it uses the same schema used by GitHub to sanitize HTML, which means a more secure configuration.
  • Adds the HTML_SANITIZATION configuration option to enable/disable HTML sanitization.
  • Adds the HTML_SANITIZATION_SCHEMA_EXTENSIONS configuration option to extend the default schema in case someone has a controlled environment and wishes to use some blocked elements/attributes.

TESTING INSTRUCTIONS

  • Make sure HTML is sanitized or not depending on the value of HTML_SANITIZATION
  • Make sure HTML sanitization reflects the extensions made in HTML_SANITIZATION_SCHEMA_EXTENSIONS
  • Make sure the examples that use markdown are correctly rendered

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@michael-s-molina michael-s-molina marked this pull request as ready for review October 24, 2022 22:18
@codecov
Copy link

codecov bot commented Oct 24, 2022

Codecov Report

Merging #21895 (3af45a1) into master (4cbd70d) will decrease coverage by 0.92%.
The diff coverage is 87.50%.

❗ Current head 3af45a1 differs from pull request most recent head 582f34e. Consider uploading reports for the commit 582f34e to get more accurate results

@@            Coverage Diff             @@
##           master   #21895      +/-   ##
==========================================
- Coverage   67.03%   66.11%   -0.93%     
==========================================
  Files        1813     1813              
  Lines       69424    69426       +2     
  Branches     7448     7448              
==========================================
- Hits        46541    45898     -643     
- Misses      20963    21608     +645     
  Partials     1920     1920              
Flag Coverage Δ
hive ?
mysql ?
postgres ?
presto ?
python 79.61% <100.00%> (-1.93%) ⬇️
sqlite 76.90% <100.00%> (+<0.01%) ⬆️
unit 51.18% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
superset/views/base.py 75.70% <ø> (ø)
...s/superset-ui-core/src/components/SafeMarkdown.tsx 66.66% <66.66%> (ø)
...c/dashboard/components/gridComponents/Markdown.jsx 72.36% <100.00%> (ø)
...rset-frontend/src/explore/components/SaveModal.tsx 40.00% <100.00%> (ø)
superset/config.py 91.82% <100.00%> (+0.05%) ⬆️
superset/db_engines/hive.py 0.00% <0.00%> (-85.19%) ⬇️
superset/datasets/commands/create.py 30.61% <0.00%> (-69.39%) ⬇️
superset/datasets/commands/update.py 25.00% <0.00%> (-69.05%) ⬇️
superset/datasets/commands/bulk_delete.py 33.33% <0.00%> (-53.34%) ⬇️
superset/datasets/columns/commands/delete.py 44.11% <0.00%> (-52.95%) ⬇️
... and 40 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Copy link
Member

@geido geido left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Member

@villebro villebro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM and tested the schema overrides to work as expected. Thanks for doing this, this is a really important enhancement.

It would be nice to follow-up with a PR that adds test coverage (I'm surprised the CI coverage test for superset-ui/core didn't complain about this; @zhaoyongjie do you know?)

Comment on lines +32 to +36
function SafeMarkdown({
source,
htmlSanitization = true,
htmlSchemaOverrides = {},
}: SafeMarkdownProps) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be nice to have RTL tests that test something along the following lines:

  • the default use case where source is rendering correctly
  • sanitization is disabled -> how do things change
  • a schema override before and after

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll work on this in a follow-up 👍🏼

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@villebro superset-ui/core only guarantee 100% coverage on the js/ts file.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@villebro superset-ui/core only guarantee 100% coverage on the js/ts file.

Oh right! Thanks for following up @zhaoyongjie ! 👍

Comment on lines +37 to +50
const displayHtml = isFeatureEnabled(FeatureFlag.DISPLAY_MARKDOWN_HTML);
const escapeHtml = isFeatureEnabled(FeatureFlag.ESCAPE_MARKDOWN_HTML);

const rehypePlugins = useMemo(() => {
const rehypePlugins: any = [];
if (displayHtml && !escapeHtml) {
rehypePlugins.push(rehypeRaw);
if (htmlSanitization) {
const schema = merge(defaultSchema, htmlSchemaOverrides);
rehypePlugins.push([rehypeSanitize, schema]);
}
}
return rehypePlugins;
}, [displayHtml, escapeHtml, htmlSanitization, htmlSchemaOverrides]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kind of unrelated comment, but just thinking out loud: as the feature flags here are constant (they're populated from bootstrap data at load time), displayHtml and escapeHtml are both constants, hence won't really change and are kind of redundant in the dep array. But it would be cool to make something like

const displayHtml = useFeatureFlag(FeatureFlag.DISPLAY_MARKDOWN_HTML);

which could change dynamically, in which case they would also be more relevant in the dep array.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I was thinking the same thing here 😄

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @villebro @michael-s-molina, I thought the dynamic feature flag might not be a good approach. The reason is that

  1. There are some FF not only on the frontend but also on the backend. for example, ENABLE_TEMPLATE_PROCESSING. so it's hard to set a FF by dynamic.
  2. Some FF is designed for security, for example, ENABLE_EXPLORE_JSON_CSRF_PROTECTION,
    ENABLE_TEMPLATE_PROCESSING, and ENABLE_TEMPLATE_REMOVE_FILTERS. The system administrator does not allow users to modify these config dynamically.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think all of these problems can be addressed in a future configuration module. We can choose what properties we expose to the module and also add the necessary logic to change backend flags.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I think we should create a new concept to handle user settings.

john-bodley added a commit to airbnb/superset-fork that referenced this pull request Dec 8, 2022
john-bodley added a commit to airbnb/superset-fork that referenced this pull request Jan 17, 2023
justinpark added a commit to airbnb/superset-fork that referenced this pull request Mar 3, 2023
@mistercrunch mistercrunch added 🍒 1.5.3 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 2.1.0 and removed 🚢 2.1.3 labels Mar 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels size/L v1.5 v2.0 v2.0.1 🍒 1.5.3 🚢 2.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants