Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add support for HTML fragments #845

Merged
merged 14 commits into from
Dec 19, 2024

Conversation

JuroOravec
Copy link
Collaborator

@JuroOravec JuroOravec commented Dec 12, 2024

Adds support for rendering HTML fragments by setting type="fragent", e.g.

MyTable.render(type="fragment")
  • Plus docs for using HTML fragments with HTMX, AlpineJS, vanilla JS
  • And examples in the sampleproject for HTML fragments with HTMX, AlpineJS, vanilla JS

@@ -1,6 +1,6 @@
---
title: Authoring component libraries
weight: 7
weight: 8
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inserted the articles on HTML fragments, so had to adjust the indices

# corresponding assets
# See https://developer.mozilla.org/en-US/docs/Web/HTML/Element/script#embedding_data_in_html
exec_script = json.dumps(exec_script_data)
exec_script = f'<script type="application/json" data-djc>{exec_script}</script>'
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously, when we rendered a component, we inserted a <script> tag that contained logic to load the JS / CSS via the client-side dependency manager.

But there was a problem that browsers execute the <script> tags only when certain methods are used. And e.g. the scripts were not being triggered when for example when the script was inserted via el.innerHTML.

So I changed the approach. Instead of inserting a <script> tag with an executable JS, now we insert only a JSON data as <script type="application/json">.

And instead, our dependency manager JS now listens for changes to the DOM. And when the changes include inserting <script type="application/json"> with the data-djc attribute, we know that this is coming from us. At that point we parse the JSON, and run the same logic that we originally wanted to run in the script.

This is kinda similar to AlpineJS, where they also listen for DOM changes to know if any of the Alpine attributes (e.g. x-data) were added or removed.

observeScriptTag((script) => {
const data = JSON.parse(script.text);
_loadComponentScripts(data);
});
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is where we define the MutationObserver that watches the DOM for changes

@JuroOravec JuroOravec force-pushed the jo-feat-html-fragments branch from 04e418b to f577219 Compare December 12, 2024 18:16
@JuroOravec JuroOravec force-pushed the jo-feat-html-fragments branch 2 times, most recently from c4120db to c2c578b Compare December 12, 2024 19:05
@JuroOravec JuroOravec force-pushed the jo-feat-html-fragments branch from c2c578b to c8ea7a8 Compare December 12, 2024 19:09
@JuroOravec JuroOravec marked this pull request as ready for review December 12, 2024 19:16
@JuroOravec
Copy link
Collaborator Author

JuroOravec commented Dec 12, 2024

@EmilStenstrom Sorry for the spam with the PRs, I'm sure you have a lot going on. When you have time, could you go over this PR before the others? This feature I want to use at work.

Copy link
Owner

@EmilStenstrom EmilStenstrom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi! This is a great feature, I (finally) understand the lure of fragments (it's all about how dependencies are handled) and am really happy that django-components are supporting them fully finally. Only thing I'm unsure about are the security aspects of this, the escaping/unescaping, are we sure they are bulletproof?

sampleproject/components/fragment.py Outdated Show resolved Hide resolved
sampleproject/components/fragment.py Outdated Show resolved Hide resolved
@@ -79,8 +79,10 @@ def get_last_index(lst: List, key: Callable[[Any], bool]) -> Optional[int]:
return None


def _escape_js(js: str, eval: bool = True) -> str:
escaped_js = escape_js_string_literal(js)
def _escape_js(content: str, wrap: bool = True, eval: bool = True) -> str:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there lots of tests testing the security of this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No tests for that yet.


Only thing I'm unsure about are the security aspects of this, the escaping/unescaping, are we sure they are bulletproof?

Good catch with the escaping. It's meant to escape the sequence </script>, which would break the script otherwise. Because user could define JS scripts like so:

console.log("I'm am a <script></script> tag!");

Previously my approach was to wrap the JS script body in ` (backticks), and execute the script with eval(unescape(body)).

But I also found recently that the eval() function may be blocked in stricter environments, so ideally we'd avoid relying on it.

There's two places where there may be content inside <script> tag that could potentially contain </script> keyword:

  1. When we insert the JS from Component.js

    • Checking with ChatGPT, I don't think there's a fully 100% reliable way to escape the sequence </script> from a JS script without breaking something.

      • The problem here is that we can't know if the </script> was used as part of a string or inside a comment or elsewhere. And to know that we'd have to parse the script as AST, and that's way too much effort. Because if we had these 3 lines:

        • 'I am </script>'
        • "I am </script>"
        • // I am </script>

        Then the first two would need different ways of escaping them:

        • 'I am <' + '/script>'
        • "I am <" + "/script>"
        • // I am </script>
    • So instead, I suggest that we simply check if the body in Component.js contains literal </script>. And if it does, we raise an error. That way this issue is captured before it gets to the browser.

  2. When we insert the list of dependencies (JS / CSS) for the client-side dependency manager

    • The good thing here is that the dependencies are defined statically as JSON. But the json may still contain a literal </script>
    • But because the JSON is static and we don't need eval() to run it, I'm thinking of converting the entries inside the JSON to byte64, so any potential </script> are escaped.

So I'll fix update the escaping as outlined above and then I'll add some tests for that.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated this in f8a7424. And added checks here.

tests/test_dependency_rendering.py Outdated Show resolved Hide resolved
@JuroOravec JuroOravec force-pushed the jo-feat-html-fragments branch from e493bbd to ebb4236 Compare December 16, 2024 17:05
@@ -650,9 +663,20 @@ def _get_script_tag(
script = get_script_content(script_type, comp_cls)

if script_type == "js":
return f"<script>{_escape_js(script)}</script>"
if "</script" in script:
Copy link
Owner

@EmilStenstrom EmilStenstrom Dec 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is still too brittle. A starting <script> tag would break this too (they can't nest so the parser implicitly closes the previous one). I think you can put non-printable characters in </script to still have it execute in some browsers. Other browser tags may close this as well, like </html>.

Copy link
Collaborator Author

@JuroOravec JuroOravec Dec 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, could you point to some examples/reference? My own testing in Chrome and Safari, as well as ChatGPT suggest that:

  1. Inserting <script> inside <script> does NOT split it original script
  2. Any other end tag other than </script> is ignored inside <script>

I think you can put non-printable characters in </script

What do you mean? Do you mean that:

  1. People insert non-printable chars inside </script in order to still run the original <script>
  2. Or that they insert the non-printable chars so that </script will be picked up in only some browsers?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm testing in codepen in Firefox (double-verified in Edge):

  • <script>document.write("<script>HELLO")</script> renders nothing, because the <script> tags ends the first script after the first quote
  • </script> containing control chars doesn't seem to work.

Going back to the original problem here: Are we trying to prevent people from making accidental mistakes, or is this a security feature?

Copy link
Collaborator Author

@JuroOravec JuroOravec Dec 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going back to the original problem here: Are we trying to prevent people from making accidental mistakes, or is this a security feature?

Preventing people from accidental mistakes. In one of the future PRs, I want to add component attributess similar to template_name, which would specify Component.js/css as a separate file. But although written in a separate file, they would eventually be inserted into the HTML as <script> and <style> tags.

And I imagine it could be fairly hard-to-find bug, if I accidentally broke the JS script because I included </script> within a string or a comment.


Hm, I still can't reproduce the errors.

Yes, your example with

<script>document.write("<script>HELLO")</script>

doesn't show anything in the DOM, but that's not because of what we're discussing here. It's because you're introducing an un-terminated <script> tag into the DOM. And what the browser does is that it looks for the closest </script>. Which, if there is none, means that all the rest of the page is interpreted as contents of the <script> tag.

See the highlighted section in the screenshot below. The section contains </body>, </html>, and a comment, which were part of the original document, but are now treated as contents of the inserted <script>. The browser then reaches the end of the document without terminating the <script>, nor <body> and <html>, so it adds the terminating </script>, </body> and </html> tags, which you can see below the highlighted section.

Screenshot 2024-12-19 at 08 16 41

On the other hand, I got consistently the same results on:

  • Chrome v118 (MacOS 14.5)
  • Chrome v131 (Windows 11)
  • Firefox v109 (Mac)
  • Firefox v133 (Win)
  • Safari v17.5 (Mac)
  • Edge v131 (Win)

The consistent behaviour is that:

  1. When I insert something into the </script> tag (e.g. </\u200Bscript>), the browser does NOT treat is as proper end tag, and instead only as a string
  2. If I insert extra <script> inside another script tag, nothing happens, browser treats it as a text
  3. If I insert extra </script> inside another script tag, the script ends prematurely

Of the 3, behavior 1 and 2 is expected and desirable. It's only behaviour 3 that I want to guard against. And it seems that checking only for the presence of </script> is sufficient.

467179818_1124658752626076_646614759430361040_n

467036459_1534342680580381_1661674013320858212_n

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @JuroOravec, you have me convinced! OK to merge from me! :)

@JuroOravec JuroOravec merged commit 4dab940 into EmilStenstrom:master Dec 19, 2024
15 checks passed
@JuroOravec JuroOravec deleted the jo-feat-html-fragments branch December 19, 2024 09:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants