Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add option to detect malformed byte sequences #200

Merged

Conversation

06romix
Copy link
Contributor

@06romix 06romix commented May 15, 2021

This PR adds the option for detecting malformed byte sequences according to #24

Copy link
Collaborator

@schlessera schlessera left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the mb_check_encoding() will detect any invalid byte sequence, so the language used should move from "malformed" (which is a specific issue) to "invalid" (which covers all detected issues).

I've provided change suggestions for the code, but the tests will also needed to be adapted to match the adapted code.

src/Dom/Document.php Outdated Show resolved Hide resolved
src/Dom/Document.php Outdated Show resolved Hide resolved
src/Dom/Document.php Outdated Show resolved Hide resolved
src/Dom/Document/Option.php Outdated Show resolved Hide resolved
src/Dom/Document/Option.php Outdated Show resolved Hide resolved
src/Dom/Document.php Outdated Show resolved Hide resolved
src/Dom/Document.php Outdated Show resolved Hide resolved
private function assertNoneMalformedByteSequences($source)
{
if ($this->options[Option::ADDITIONAL_VALIDATION] && ! mb_check_encoding($source)) {
throw new \InvalidArgumentException(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should use a custom exception InvalidByteSequence with a named constructor (see src/Exception).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the exception message contain an HTML source?
If not, is there necessary to write a test for InvalidByteSequence that contains only a message?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Normally yes, but for string-based HTML, it is actually hard to produce something useful for an exception message that doesn't flood the screen/logs... Not sure how we would go about that here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The third way is to add a second argument to the named constructor and then provide a method like $invalidByteSequence->getHtmlSource() to get the html contained invalid byte sequence.

@06romix
Copy link
Contributor Author

06romix commented May 22, 2021

I'm with you on that, "invalid" is a good choice for mb_check_encoding().

Thank you for your detailed review. I appreciate it.

tests/Dom/DocumentTest.php Show resolved Hide resolved
tests/Dom/DocumentTest.php Show resolved Hide resolved
@schlessera
Copy link
Collaborator

Thanks for the PR, @06romix !

@schlessera schlessera merged commit e897e77 into ampproject:main May 26, 2021
@schlessera schlessera added the DOM label May 26, 2021
@schlessera schlessera added this to the 0.6.0 milestone May 26, 2021
@06romix 06romix deleted the add/option_check_malformed_byte_sequence branch May 26, 2021 18:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants