Add option to detect malformed byte sequences #200
I think mb_check_encoding() will detect any invalid byte sequence, so the language used should move from "malformed" (which is a specific issue) to "invalid" (which covers all detected issues).
I've provided change suggestions for the code, but the tests will also need to be adapted to match the adapted code.
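To illustrate the point about "invalid" versus "malformed", here is a minimal sketch of how mb_check_encoding() treats a few different kinds of bad UTF-8 input. The sample strings are made up for illustration, and the encoding is passed explicitly as 'UTF-8', whereas the code under review relies on the default internal encoding.

```php
<?php

// Well-formed UTF-8: "é" encoded as the two bytes C3 A9.
$valid = "<p>caf\xC3\xA9</p>";

// A lone continuation byte (0x80) that does not follow a lead byte.
$strayContinuation = "<p>caf\x80</p>";

// A two-byte lead (0xC3) with no continuation byte after it.
$truncated = "<p>caf\xC3";

// Overlong encoding of "/" (C0 AF), which UTF-8 does not allow.
$overlong = "<p>\xC0\xAF</p>";

var_dump(mb_check_encoding($valid, 'UTF-8'));             // bool(true)
var_dump(mb_check_encoding($strayContinuation, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($truncated, 'UTF-8'));         // bool(false)
var_dump(mb_check_encoding($overlong, 'UTF-8'));          // bool(false)
```

All three bad inputs are rejected the same way, without distinguishing the kind of problem, which is why "invalid" describes the check more accurately than "malformed".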
src/Dom/Document.php
private function assertNoneMalformedByteSequences($source)
{
    if ($this->options[Option::ADDITIONAL_VALIDATION] && ! mb_check_encoding($source)) {
        throw new \InvalidArgumentException(
This should use a custom exception InvalidByteSequence with a named constructor (see src/Exception).
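A rough sketch of what such an exception could look like. The class name comes from the comment above; the named constructor forHtml(), the message text, and the choice to extend \InvalidArgumentException are assumptions for illustration, not the project's actual code.

```php
<?php

// Hypothetical sketch; the real class would sit alongside the others in src/Exception.
final class InvalidByteSequence extends InvalidArgumentException
{
    /**
     * Named constructor (hypothetical name) for HTML that failed the encoding check.
     */
    public static function forHtml(): self
    {
        return new self('Provided HTML contains invalid byte sequences.');
    }
}

// Usage: callers that already catch InvalidArgumentException keep working.
try {
    throw InvalidByteSequence::forHtml();
} catch (InvalidArgumentException $exception) {
    $message = $exception->getMessage();
}
```

Extending \InvalidArgumentException is one way to keep existing catch blocks working; the named constructor keeps the message wording in one place.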
Should the exception message contain the HTML source?
If not, is it necessary to write a test for InvalidByteSequence when it contains only a message?
Normally yes, but for string-based HTML, it is actually hard to produce something useful for an exception message that doesn't flood the screen/logs... Not sure how we would go about that here.
A third option is to add a second argument to the named constructor and then provide a method like $invalidByteSequence->getHtmlSource() to get the HTML that contains the invalid byte sequence.
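One possible shape for this idea, as a sketch only. getHtmlSource() is the method name proposed above; the forHtml() constructor name, its argument list, and the message text are assumptions, and the real signature may differ.

```php
<?php

// Hypothetical sketch: the message stays short, and the source is available on request.
final class InvalidByteSequence extends InvalidArgumentException
{
    /** @var string Raw HTML that failed validation (kept out of the message). */
    private $htmlSource = '';

    public static function forHtml(string $html): self
    {
        $exception = new self('Provided HTML contains invalid byte sequences.');
        $exception->htmlSource = $html;

        return $exception;
    }

    public function getHtmlSource(): string
    {
        return $this->htmlSource;
    }
}

// Usage with a made-up input containing a stray continuation byte.
$html   = "<p>caf\x80</p>";
$caught = null;

try {
    if (! mb_check_encoding($html, 'UTF-8')) {
        throw InvalidByteSequence::forHtml($html);
    }
} catch (InvalidByteSequence $exception) {
    $caught = $exception;
}
```

This keeps large documents out of screens and logs by default, while still letting a caller inspect the offending input when debugging.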
Co-authored-by: Alain Schlesser <alain.schlesser@gmail.com>
I'm with you on that, "invalid" is a good choice for Thank you for your detailed review. I appreciate it.
Thanks for the PR, @06romix!
This PR adds the option for detecting malformed byte sequences according to #24