XML signature detection does not work #681

Closed
pwinckles opened this issue Jun 30, 2021 · 0 comments · Fixed by #683
JHOVE does not identify a document as XML even when it begins with an XML declaration.

For example, suppose you have the following XML:

<?xml version="1.0" encoding="UTF-8"?>
<outer>
  <inner>
    blah
  </outer>
</inner>

And execute JHOVE on it:

❯ ./jhove -s /var/tmp/invalid.xml 
Jhove (Rel. 1.24.1, 2020-03-16)
 Date: 2021-06-30 15:02:25 CDT
 RepresentationInformation: /var/tmp/invalid.xml
  ReportingModule: ASCII-hul, Rel. 1.4.1 (2019-04-17)
  LastModified: 2021-06-29 13:21:30 CDT
  Size: 86
  Format: ASCII
  Status: Well-Formed
  SignatureMatches:
   ASCII-hul
  MIMEtype: text/plain; charset=US-ASCII

(Yes, the XML is not well-formed. That's the point.)

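I dug into why this is happening. The root cause is that a DataInputStream is being used to read the XML file here. DataInputStream is meant for binary data: its readChar() method combines two bytes into a single char rather than decoding text, so it cannot be used like this. An InputStreamReader must be used instead.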
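To make the byte pairing concrete, here is a minimal sketch (the class and method names are made up for illustration and are not part of JHOVE). It feeds the two bytes of `<?` to `DataInputStream.readChar()`, which returns one char built from both bytes instead of `<`:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class ReadCharDemo {
    // Hypothetical helper: reads one char the way DataInputStream does,
    // i.e. two bytes interpreted as a big-endian 16-bit value.
    static char firstChar(byte[] bytes) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            return in.readChar();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = "<?".getBytes(StandardCharsets.UTF_8); // 0x3C, 0x3F
        char c = firstChar(bytes); // (0x3C << 8) | 0x3F
        System.out.println(Integer.toHexString(c)); // prints 3c3f
        System.out.println(c == '<');               // prints false
    }
}
```

Any comparison against `<`, `?`, `x`, and so on will therefore fail for UTF-8 or ASCII input, because each char returned already contains two characters' worth of bytes.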

Here's a demonstration of the problem:

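import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class StreamDemo {
    public static void main(String[] args) throws IOException {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n" +
                "<outer>\n" +
                "  <inner>\n" +
                "    blah\n" +
                "  </outer>\n" +
                "</inner>\n" +
                "\n";
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);

        // readChar() consumes two bytes per call and combines them into one char.
        DataInputStream dataStream = new DataInputStream(new ByteArrayInputStream(bytes));
        System.out.print("DataInputStream: ");
        while (true) {
            try {
                System.out.print(dataStream.readChar());
            } catch (EOFException e) {
                break; // end of stream reached
            }
        }
        System.out.print("\n\n");

        // InputStreamReader decodes the bytes using the given charset.
        InputStreamReader reader =
                new InputStreamReader(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8);
        System.out.print("InputStreamReader: ");
        int c; // keep the result as an int so that -1 (end of stream) can be detected
        while ((c = reader.read()) != -1) {
            System.out.print((char) c);
        }
    }
}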

That code produces the following output when run:

DataInputStream: 㰿硭氠癥牳楯渽∱⸰∠敮捯摩湧㴢啔䘭㠢㼾਼潵瑥爾ਠ‼楮湥爾ਠ†⁢污栊†㰯潵瑥爾਼⽩湮敲㸊

InputStreamReader: <?xml version="1.0" encoding="UTF-8"?>
<outer>
  <inner>
    blah
  </outer>
</inner>
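
For comparison, a reader-based check for the XML declaration could look roughly like the sketch below. This is not JHOVE's actual signature code: the class name, the `startsWithXmlDeclaration` helper, and the assumption that the input is UTF-8 without a byte order mark are all mine, for illustration only.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class XmlDeclarationCheck {
    private static final String PREFIX = "<?xml";

    // Hypothetical helper (not a JHOVE API): returns true if the decoded
    // text starts with "<?xml". Assumes UTF-8 input with no byte order mark.
    static boolean startsWithXmlDeclaration(InputStream in) throws IOException {
        Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8);
        for (int i = 0; i < PREFIX.length(); i++) {
            int c = reader.read(); // int, so -1 (end of stream) never matches a char
            if (c != PREFIX.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        byte[] xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<outer/>\n"
                .getBytes(StandardCharsets.UTF_8);
        byte[] text = "plain text".getBytes(StandardCharsets.UTF_8);
        System.out.println(startsWithXmlDeclaration(new ByteArrayInputStream(xml)));  // prints true
        System.out.println(startsWithXmlDeclaration(new ByteArrayInputStream(text))); // prints false
    }
}
```

The check only looks at the declaration, so the malformed document above would still be recognised as XML by signature and then reported as not well-formed by the parser, which is the behaviour I would expect.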