XML should not be validated when no schema provided #680

Closed
pwinckles opened this issue Jun 24, 2021 · 2 comments · Fixed by #685

@pwinckles
Contributor

The documentation states:

Note that the concept of validity applies only to XML files that explicitly reference a DTD or XML Schema. JHOVE can determine if either of these conditions are met and if so, it will invoke automatically the SAX2 parser in a validating mode. Otherwise, the parser is invoked in a manner that only checks for well-formedness.

However, this is not what the code actually does. Instead, it validates any XML document that declares a namespace, regardless of whether the document references a schema or DTD.

Here is an example that demonstrates the problem:

<example xmlns="http://example.com">
  <text>This is an example</text>
</example>

❯ ./jhove -m XML-hul example.xml
Jhove (Rel. 1.24.1, 2020-03-16)
 Date: 2021-06-24 07:41:51 CDT
 RepresentationInformation: example.xml
  ReportingModule: XML-hul, Rel. 1.5.1 (2019-04-17)
  LastModified: 2021-06-24 07:41:36 CDT
  Size: 82
  Format: XML
  Version: 1.0
  Status: Well-Formed, but not valid
  SignatureMatches:
   XML-hul
  ErrorMessage: SaxParseException: cvc-elt.1: Cannot find the declaration of element 'example'.: Line = 1, Column = 37
   ID: XML-HUL-1
  MIMEtype: text/xml
  XMLMetadata: 
   Parser: org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser
   Encoding: UTF-8
   Schemas: 
    Schema: 
     NamespaceURI: http://example.com
     SchemaLocation: 
   Root: example
   Namespaces: 
    Namespace: 
     Prefix: 
     URI: http://example.com

The expected behavior is that JHOVE only checks for well-formedness and does not validate the XML.
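
To illustrate the rule quoted from the documentation, here is a minimal sketch, not JHOVE's actual code. A non-validating first pass records whether the document carries a DOCTYPE or an `xsi:schemaLocation` / `xsi:noNamespaceSchemaLocation` hint, and validation would only be enabled if one is found. The class name `SchemaReferenceCheck` and the method `shouldValidate` are hypothetical names chosen for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical helper, not part of JHOVE.
public class SchemaReferenceCheck {
    static final String XSI = "http://www.w3.org/2001/XMLSchema-instance";

    /** Records whether a DOCTYPE or an xsi schema hint was seen. */
    static class Probe extends DefaultHandler2 {
        boolean hasDtd = false;
        boolean hasSchemaHint = false;

        @Override
        public void startDTD(String name, String publicId, String systemId) {
            hasDtd = true;
        }

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            if (atts.getValue(XSI, "schemaLocation") != null
                    || atts.getValue(XSI, "noNamespaceSchemaLocation") != null) {
                hasSchemaHint = true;
            }
        }
    }

    /** True only if the document explicitly references a DTD or schema. */
    static boolean shouldValidate(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setValidating(false); // well-formedness pass only

        XMLReader reader = factory.newSAXParser().getXMLReader();
        Probe probe = new Probe();
        reader.setContentHandler(probe);
        // DOCTYPE events are reported through the SAX2 lexical handler.
        reader.setProperty("http://xml.org/sax/properties/lexical-handler", probe);
        // Resolve every external entity to empty content so nothing is fetched.
        reader.setEntityResolver(
            (publicId, systemId) -> new InputSource(new StringReader("")));

        reader.parse(new InputSource(new StringReader(xml)));
        return probe.hasDtd || probe.hasSchemaHint;
    }
}
```

Under this sketch, the `example.xml` document above declares a namespace but has no DOCTYPE and no schema location, so `shouldValidate` returns `false` and only well-formedness would be reported.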

@ross-spencer

Hi Peter - are you anticipating that the result should just read "well formed"? Or is there something else, e.g. "well-formed, validity not checked", i.e. not to imply validity was checked?

@pwinckles
Contributor Author

As far as I can tell, these are the possible outcomes:

  1. Well-Formed
  2. Well-Formed and valid
  3. Well-Formed, but not valid
  4. Not well-formed

I believe that covers all possible outcomes, and I do not think a new status needs to be introduced. A status of "Well-Formed" on its own already implies "Well-Formed, validity not checked".
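
As a rough sketch of how the four outcomes listed above could follow from two pieces of state (whether the document is well-formed, and a three-valued validity result), something like the following would do. The names `XmlStatus`, `Validity`, and `describe` are hypothetical and not taken from the JHOVE code base.

```java
// Hypothetical illustration, not JHOVE's implementation.
public class XmlStatus {
    /** Validity is tri-state: it may never have been checked at all. */
    enum Validity { NOT_CHECKED, VALID, INVALID }

    /** Maps parser results to one of the four status strings. */
    static String describe(boolean wellFormed, Validity validity) {
        if (!wellFormed) {
            return "Not well-formed"; // validity is irrelevant here
        }
        switch (validity) {
            case VALID:
                return "Well-Formed and valid";
            case INVALID:
                return "Well-Formed, but not valid";
            default:
                return "Well-Formed"; // no schema or DTD, so not validated
        }
    }
}
```

With this mapping, a document that has no schema or DTD reference ends up as plain "Well-Formed", and no fifth status is needed.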
