Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix #199: mime-type was incorrectly parsed from content-type when cha… #200

Merged
merged 4 commits into from
Apr 26, 2018

Conversation

dportabella
Copy link
Contributor

fix #199: mime-type was incorrectly parsed from content-type when charset param exists


GitHub issue(s):

If you are responding to an issue, please mention their numbers below.
#199

What does this Pull Request do?

Calling WarcRecordUtils.getWarcResponseMimeType with a header such as Content-Type: text/html;charset=ISO-8859-1 should return text/html. However it was returning text/html;charset=ISO-8859-1. Also, the function keepValidPages was not working properly because of this.

This pull request fixes this issue.

How should this be tested?

A unit test was added: WarcLoaderTest.testContentTypeWithCharset

Additional Notes:

Any additional information that you think would be helpful when reviewing this PR.

Example:

  • Does this change require documentation to be updated?
    No

  • Does this change add any new dependencies?
    No

  • Could this change or impact execution of existing code?
    No

Interested parties

Tag (@ mention) interested parties.

Thanks in advance for your help with the Archives Unleashed Toolkit!

@ruebot
Copy link
Member

ruebot commented Apr 23, 2018

@dportabella thanks for the PR! Love seeing this 😃

Looks like you got hit by checkstyle: https://travis-ci.org/archivesunleashed/aut/jobs/370216592#L4855-L4856

@dportabella
Copy link
Contributor Author

dportabella commented Apr 23, 2018

Line are limited to 80 characters? 😖
Ok, I fix this.
How do I test that it passes checkstyle? (I executed mvn package and it did succeed)
Then I create a new commit with --amend?
or I delete this pull-request and I create a new one?

@ruebot
Copy link
Member

ruebot commented Apr 23, 2018

@dportabella oh, you can create a new commit and push, and it'll automatically update here. I'll squash things done on merge.

@codecov
Copy link

codecov bot commented Apr 23, 2018

Codecov Report

Merging #200 into master will not change coverage.
The diff coverage is 100%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #200   +/-   ##
=======================================
  Coverage   67.29%   67.29%           
=======================================
  Files          32       32           
  Lines         636      636           
  Branches      124      124           
=======================================
  Hits          428      428           
  Misses        167      167           
  Partials       41       41
Impacted Files Coverage Δ
...ava/io/archivesunleashed/data/WarcRecordUtils.java 61.53% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 08fab5b...f4e03a6. Read the comment docs.

@ruebot ruebot requested a review from lintool April 24, 2018 02:25
@ruebot
Copy link
Member

ruebot commented Apr 24, 2018

@lintool can I get one more set of eyes on this before I hit merge?

@@ -96,8 +96,12 @@ public static WARCRecord fromBytes(final byte[] bytes) throws IOException {
*/
public static String getWarcResponseMimeType(final byte[] contents) {
// This is a somewhat janky way to get the MIME type of the response.
// Moreover the parser is not fully complaint to the specification.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you might mean "compliant"?
How about - "Moreover, this simple regular expression is not compliant with the specification."

Copy link
Member

@lintool lintool left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor typo, otherwise lgtm.

@ianmilligan1
Copy link
Member

@dportabella Thanks for this pull request! I think if you incorporate or consider @lintool's comments above (you can just push the changes to the branch and the PR will be updated) we'll be able to merge this into AUT.

@dportabella
Copy link
Contributor Author

done.

@ruebot ruebot merged commit b90c559 into archivesunleashed:master Apr 26, 2018
@greebie
Copy link
Contributor

greebie commented Apr 26, 2018

Thanks for being a contributor, @dportabella !

@dportabella
Copy link
Contributor Author

welcome :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

keepValidPages incorrectly filters out pages with mime-type text/html followed by charset
5 participants