-
-
Notifications
You must be signed in to change notification settings - Fork 81
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add support for HTML fragments #845
feat: Add support for HTML fragments #845
Conversation
@@ -1,6 +1,6 @@ | |||
--- | |||
title: Authoring component libraries | |||
weight: 7 | |||
weight: 8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inserted the articles on HTML fragments, so had to adjust the indices
# corresponding assets | ||
# See https://developer.mozilla.org/en-US/docs/Web/HTML/Element/script#embedding_data_in_html | ||
exec_script = json.dumps(exec_script_data) | ||
exec_script = f'<script type="application/json" data-djc>{exec_script}</script>' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously, when we rendered a component, we inserted a <script>
tag that contained logic to load the JS / CSS via the client-side dependency manager.
But there was a problem that browsers execute the <script>
tags only when certain methods are used. And e.g. the scripts were not being triggered when for example when the script was inserted via el.innerHTML
.
So I changed the approach. Instead of inserting a <script>
tag with an executable JS, now we insert only a JSON data as <script type="application/json">
.
And instead, our dependency manager JS now listens for changes to the DOM. And when the changes include inserting <script type="application/json">
with the data-djc
attribute, we know that this is coming from us. At that point we parse the JSON, and run the same logic that we originally wanted to run in the script.
This is kinda similar to AlpineJS, where they also listen for DOM changes to know if any of the Alpine attributes (e.g. x-data
) were added or removed.
observeScriptTag((script) => { | ||
const data = JSON.parse(script.text); | ||
_loadComponentScripts(data); | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is where we define the MutationObserver that watches the DOM for changes
04e418b
to
f577219
Compare
for more information, see https://pre-commit.ci
c4120db
to
c2c578b
Compare
c2c578b
to
c8ea7a8
Compare
@EmilStenstrom Sorry for the spam with the PRs, I'm sure you have a lot going on. When you have time, could you go over this PR before the others? This feature I want to use at work. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi! This is a great feature, I (finally) understand the lure of fragments (it's all about how dependencies are handled) and am really happy that django-components are supporting them fully finally. Only thing I'm unsure about are the security aspects of this, the escaping/unescaping, are we sure they are bulletproof?
src/django_components/util/misc.py
Outdated
@@ -79,8 +79,10 @@ def get_last_index(lst: List, key: Callable[[Any], bool]) -> Optional[int]: | |||
return None | |||
|
|||
|
|||
def _escape_js(js: str, eval: bool = True) -> str: | |||
escaped_js = escape_js_string_literal(js) | |||
def _escape_js(content: str, wrap: bool = True, eval: bool = True) -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there lots of tests testing the security of this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No tests for that yet.
Only thing I'm unsure about are the security aspects of this, the escaping/unescaping, are we sure they are bulletproof?
Good catch with the escaping. It's meant to escape the sequence </script>
, which would break the script otherwise. Because user could define JS scripts like so:
console.log("I'm am a <script></script> tag!");
Previously my approach was to wrap the JS script body in ` (backticks), and execute the script with eval(unescape(body))
.
But I also found recently that the eval()
function may be blocked in stricter environments, so ideally we'd avoid relying on it.
There's two places where there may be content inside <script>
tag that could potentially contain </script>
keyword:
-
When we insert the JS from
Component.js
-
Checking with ChatGPT, I don't think there's a fully 100% reliable way to escape the sequence
</script>
from a JS script without breaking something.-
The problem here is that we can't know if the
</script>
was used as part of a string or inside a comment or elsewhere. And to know that we'd have to parse the script as AST, and that's way too much effort. Because if we had these 3 lines:'I am </script>'
"I am </script>"
// I am </script>
Then the first two would need different ways of escaping them:
'I am <' + '/script>'
"I am <" + "/script>"
// I am </script>
-
-
So instead, I suggest that we simply check if the body in
Component.js
contains literal</script>
. And if it does, we raise an error. That way this issue is captured before it gets to the browser.
-
-
When we insert the list of dependencies (JS / CSS) for the client-side dependency manager
- The good thing here is that the dependencies are defined statically as JSON. But the json may still contain a literal
</script>
- But because the JSON is static and we don't need
eval()
to run it, I'm thinking of converting the entries inside the JSON to byte64, so any potential</script>
are escaped.
- The good thing here is that the dependencies are defined statically as JSON. But the json may still contain a literal
So I'll fix update the escaping as outlined above and then I'll add some tests for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
e493bbd
to
ebb4236
Compare
@@ -650,9 +663,20 @@ def _get_script_tag( | |||
script = get_script_content(script_type, comp_cls) | |||
|
|||
if script_type == "js": | |||
return f"<script>{_escape_js(script)}</script>" | |||
if "</script" in script: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is still too brittle. A starting <script>
tag would break this too (they can't nest so the parser implicitly closes the previous one). I think you can put non-printable characters in </script
to still have it execute in some browsers. Other browser tags may close this as well, like </html>
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, could you point to some examples/reference? My own testing in Chrome and Safari, as well as ChatGPT suggest that:
- Inserting
<script>
inside<script>
does NOT split it original script - Any other end tag other than
</script>
is ignored inside<script>
I think you can put non-printable characters in
</script
What do you mean? Do you mean that:
- People insert non-printable chars inside
</script
in order to still run the original<script>
- Or that they insert the non-printable chars so that
</script
will be picked up in only some browsers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm testing in codepen in Firefox (double-verified in Edge):
- <script>document.write("<script>HELLO")</script> renders nothing, because the <script> tags ends the first script after the first quote
- </script> containing control chars doesn't seem to work.
Going back to the original problem here: Are we trying to prevent people from making accidental mistakes, or is this a security feature?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Going back to the original problem here: Are we trying to prevent people from making accidental mistakes, or is this a security feature?
Preventing people from accidental mistakes. In one of the future PRs, I want to add component attributess similar to template_name
, which would specify Component.js/css
as a separate file. But although written in a separate file, they would eventually be inserted into the HTML as <script>
and <style>
tags.
And I imagine it could be fairly hard-to-find bug, if I accidentally broke the JS script because I included </script>
within a string or a comment.
Hm, I still can't reproduce the errors.
Yes, your example with
<script>document.write("<script>HELLO")</script>
doesn't show anything in the DOM, but that's not because of what we're discussing here. It's because you're introducing an un-terminated <script>
tag into the DOM. And what the browser does is that it looks for the closest </script>
. Which, if there is none, means that all the rest of the page is interpreted as contents of the <script>
tag.
See the highlighted section in the screenshot below. The section contains </body>
, </html>
, and a comment, which were part of the original document, but are now treated as contents of the inserted <script>
. The browser then reaches the end of the document without terminating the <script>
, nor <body>
and <html>
, so it adds the terminating </script>
, </body>
and </html>
tags, which you can see below the highlighted section.
On the other hand, I got consistently the same results on:
- Chrome v118 (MacOS 14.5)
- Chrome v131 (Windows 11)
- Firefox v109 (Mac)
- Firefox v133 (Win)
- Safari v17.5 (Mac)
- Edge v131 (Win)
The consistent behaviour is that:
- When I insert something into the
</script>
tag (e.g.</\u200Bscript>
), the browser does NOT treat is as proper end tag, and instead only as a string - If I insert extra
<script>
inside another script tag, nothing happens, browser treats it as a text - If I insert extra
</script>
inside another script tag, the script ends prematurely
Of the 3, behavior 1 and 2 is expected and desirable. It's only behaviour 3 that I want to guard against. And it seems that checking only for the presence of </script>
is sufficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @JuroOravec, you have me convinced! OK to merge from me! :)
Adds support for rendering HTML fragments by setting
type="fragent"
, e.g.sampleproject
for HTML fragments with HTMX, AlpineJS, vanilla JS