Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUGFIX] Check existing Content-Type is in <head> #961

Merged
merged 4 commits into from
Apr 5, 2021

Conversation

JakeQZ
Copy link
Contributor

@JakeQZ JakeQZ commented Mar 23, 2021

If a valid Content-Type <meta> element is present, DOM conversion will
create a <head> element for it even without an explicit <head> tag in the
HTML.

However, to be valid, it must not be in the <body> element. As well as with
an explicit <body> start tag, the <body> element also begins whenever a
start tag for an element which cannot be in the <head> is encountered. This
is now checked.

Fixes #923.

@JakeQZ JakeQZ added the bug label Mar 23, 2021
@JakeQZ JakeQZ added this to the 5.0.1 milestone Mar 23, 2021
@JakeQZ JakeQZ requested a review from oliverklee March 23, 2021 18:25
@JakeQZ JakeQZ self-assigned this Mar 23, 2021
@JakeQZ
Copy link
Contributor Author

JakeQZ commented Mar 23, 2021

Theres a Psalm error I need to fix...

@JakeQZ JakeQZ marked this pull request as draft March 23, 2021 18:34
@JakeQZ
Copy link
Contributor Author

JakeQZ commented Mar 23, 2021

Now fixed the Psalm errors. (I've not been running it locally because I get about 50 more errors that we don't get on the build server - not sure why, maybe different OS or PHP version - am using the same PHAR.)

@JakeQZ JakeQZ marked this pull request as ready for review March 23, 2021 20:15
@JakeQZ
Copy link
Contributor Author

JakeQZ commented Mar 23, 2021

You'll probably need to edit the commit message and title before merging, as it includes your (@oliverklee) initial WIP commit with tests.

@JakeQZ
Copy link
Contributor Author

JakeQZ commented Mar 23, 2021

Last force-push was just fixing a typo in a comment, and adding a trigger_error to capture the unlikely event of preg_replace failing internally.

@JakeQZ
Copy link
Contributor Author

JakeQZ commented Mar 24, 2021

Now just moved exception handling to higher level for clarity.

psalm.baseline.xml Outdated Show resolved Hide resolved
src/HtmlProcessor/AbstractHtmlProcessor.php Outdated Show resolved Hide resolved
src/HtmlProcessor/AbstractHtmlProcessor.php Outdated Show resolved Hide resolved
src/HtmlProcessor/AbstractHtmlProcessor.php Outdated Show resolved Hide resolved
src/HtmlProcessor/AbstractHtmlProcessor.php Outdated Show resolved Hide resolved
src/HtmlProcessor/AbstractHtmlProcessor.php Outdated Show resolved Hide resolved
src/HtmlProcessor/AbstractHtmlProcessor.php Outdated Show resolved Hide resolved
src/HtmlProcessor/AbstractHtmlProcessor.php Outdated Show resolved Hide resolved
oliverklee and others added 2 commits April 5, 2021 17:54
If a valid `Content-Type` `<meta>` element is present, DOM conversion will
create a `<head>` element for it even without an explicit `<head>` tag in the
HTML.

However, to be valid, it must not be in the `<body>` element.  As well as with
an explicit `<body>` start tag, the `<body>` element also begins whenever a
start tag for an element which cannot be in the `<head>` is encountered.  This
is now checked.

Fixes #923.
src/HtmlProcessor/AbstractHtmlProcessor.php Outdated Show resolved Hide resolved
src/HtmlProcessor/AbstractHtmlProcessor.php Outdated Show resolved Hide resolved
@oliverklee oliverklee merged commit 8d737bd into main Apr 5, 2021
@oliverklee oliverklee deleted the bugfix/missing-head branch April 5, 2021 18:14
JakeQZ added a commit that referenced this pull request Apr 8, 2021
During review for #961, a regex change to capture the HTML before the
`Content-Type` tag drew attention to a couple of edge cases that weren't
explicitly tested for and would fail if there was a mistake in the regex
pattern:
- A newline before the `Content-Type` tag (would fail without the
  `PCRE_DOTALL`/`s` modifier - while caught by another test,
  `getDomDocumentWithNormalizedHtmlRepresentsTheGivenHtml`, that test was not
  specifically intended for this purpose);
- A `Content-Type` tag in both the `<head>` and `<body>` (would fail without the
  lazy/`?` quantifier after `.*`).

Tests have now been added to cover these.
JakeQZ added a commit that referenced this pull request Apr 8, 2021
During review for #961, a regex change to capture the HTML before the
`Content-Type` tag drew attention to a couple of edge cases that weren't
explicitly tested for and would fail if there was a mistake in the regex
pattern:
- A newline before the `Content-Type` tag (would fail without the
  `PCRE_DOTALL`/`s` modifier - while caught by another test,
  `getDomDocumentWithNormalizedHtmlRepresentsTheGivenHtml`, that test was not
  specifically intended for this purpose);
- A `Content-Type` tag in both the `<head>` and `<body>` (would fail without the
  lazy/`?` quantifier after `.*`).

Tests have now been added to cover these.
oliverklee pushed a commit that referenced this pull request Apr 8, 2021
During review for #961, a regex change to capture the HTML before the
`Content-Type` tag drew attention to a couple of edge cases that weren't
explicitly tested for and would fail if there was a mistake in the regex
pattern:
- A newline before the `Content-Type` tag (would fail without the
  `PCRE_DOTALL`/`s` modifier - while caught by another test,
  `getDomDocumentWithNormalizedHtmlRepresentsTheGivenHtml`, that test was not
  specifically intended for this purpose);
- A `Content-Type` tag in both the `<head>` and `<body>` (would fail without the
  lazy/`?` quantifier after `.*`).

Tests have now been added to cover these.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CssInliner getHeadElement Error (missing head)
2 participants