Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allow for line breaks when splitting xrefs for id and position #345

Merged
merged 13 commits into from
Oct 6, 2020

Conversation

Connum
Copy link
Contributor

@Connum Connum commented Sep 26, 2020

When looking at #19 and testing with the last PDF file provided there, I found out that the issue came from the raw data having a line break between the xref id and position instead of a space. That resulted in an empty string instead of a number for the position, resulting in an invalid argument for the substr() function call.

This fix splits for any regex \s character(s) instead of just a space, which resolves the issue.

Please note that the CI check will expectedly fail for test case testGetDataTmIssue336 while PR #343 is pending.

edit: fixes #19

Copy link
Collaborator

@k00ni k00ni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

  • Can you provide a simple test that outlines the problem and demonstrates the fix? That would be great.

@k00ni k00ni self-assigned this Sep 28, 2020
@k00ni k00ni added the fix label Sep 28, 2020
@k00ni k00ni linked an issue Sep 28, 2020 that may be closed by this pull request
@Connum
Copy link
Contributor Author

Connum commented Sep 28, 2020

I tried before making the PR, but it's not that trivial, unfortunately... In order to unit test parseObject(), we need to provide $structure and $document as well, so we'd need to add a test file and do the complete parsing process on it.
I tried manipulating one of the existing sample files into having a line break in the xref position, but it seems those are not in the raw data that is human-readable, but somehow encoded... I don't know how else we could test this for now, other than parsing the provided file (#19 (comment)) manually before and after the fix (which I did of course).

@k00ni
Copy link
Collaborator

k00ni commented Sep 29, 2020

You could create a sub class in the test file, which calls the protected method for you, like:

// Parser.php
class Parser {
  protected function parseObject() {
    // secret stuff, you want to test
  }
}

// in the test file
class ParserSub extends Parser {
  public function exposedParseObject(...) {
    return $this->parseObject(...);
  }
}

class ParserTest ... {
   public function testParseObject()
   {
       $fixture = new ParserSub();
       $fixture->exposedParseObject(...);
   }
}

Source: https://stackoverflow.com/a/249776/5301527

This way you could inject a prepared Document instance, so the new code is being used. What do you think?

@Connum
Copy link
Contributor Author

Connum commented Sep 29, 2020

This way you could inject a prepared Document instance, so the new code is being used. What do you think?

Amazing, thanks for the input! That way, I managed to create a test case that reproduces the error message before the fix, and passes after the fix. It was a bit tricky, and I also extended the TestCase base class to allow catching errors that are normally not catchable via a try/catch statement, like E_WARNING or E_NOTICE (the latter of which we had in this case). This might come in handy for future tests as well!

tests/TestCase.php Outdated Show resolved Hide resolved
Comment on lines 99 to 115
$errorConstants = [
1 => 'E_ERROR',
2 => 'E_WARNING',
4 => 'E_PARSE',
8 => 'E_NOTICE',
16 => 'E_CORE_ERROR',
32 => 'E_CORE_WARNING',
64 => 'E_COMPILE_ERROR',
128 => 'E_COMPILE_WARNING',
256 => 'E_USER_ERROR',
512 => 'E_USER_WARNING',
1024 => 'E_USER_NOTICE',
2048 => 'E_STRICT',
4096 => 'E_RECOVERABLE_ERROR',
8192 => 'E_DEPRECATED',
16384 => 'E_USER_DEPRECATED',
32767 => 'E_ALL'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that is a little bit out of scope. Is it really necessary to create a list of PHP error types? Why not just print the error message and available meta data.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it might be a bit over the top... I thought it could be helpful to see the type of error before the error message in a human-readable way when writing the tests. But if you think it's too much, I can remove it! Please let me know again.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove it here. But it could be worth looking into it again in another PR.

Copy link
Collaborator

@k00ni k00ni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for effort to add a test!

I made some in-code comments.

tests/TestCase.php Outdated Show resolved Hide resolved
tests/TestCase.php Outdated Show resolved Hide resolved
@Connum
Copy link
Contributor Author

Connum commented Sep 29, 2020

I think I found an easier test mechanism now, simply checking for the parsed object in the test data. I also merged the master branch so we can finally see the CI build passing.

@Connum Connum requested a review from k00ni September 29, 2020 16:39
Copy link
Collaborator

@k00ni k00ni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job! It looks so much cleaner now.

Is your work done here? If so, I will keep this open for a few days to allow our community to comment.

@Connum
Copy link
Contributor Author

Connum commented Sep 30, 2020

Yes, all done, thanks for your support!

@k00ni k00ni merged commit fdbbb5c into smalot:master Oct 6, 2020
@Connum Connum deleted the fix-19 branch October 6, 2020 09:08
partulaj pushed a commit to partulaj/pdfparser that referenced this pull request Dec 21, 2020
…t#345)

* allow for line breaks when splitting xrefs for id and position

* extend TestCase.php with functionality to "catch" E_NOTICE and E_WARNING

* added test case for this fix

* only reset error handler when the current handler is the handler we had set before

* work around for failing CI build with PHP 5.6

* added comment and link to the workaround getting the current error handler

* removed unnecessary ini_set call

* remove error level constant name before error message

* restore error from the error handler itself, to prevent PHPUnit's "THE ERROR HANDLER HAS CHANGED!" message

* reverse the changes made to the TestCase class and the code in the test case depending on it

* simplified test case, now checking if object has been parsed correctly

* code linting
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Undefined Offset in Parser.php causing FollowUp Fatal Error
2 participants