The documentation states:

Note that the concept of validity applies only to XML files that explicitly reference a DTD or XML Schema. JHOVE can determine if either of these conditions are met and if so, it will invoke automatically the SAX2 parser in a validating mode. Otherwise, the parser is invoked in a manner that only checks for well-formedness.
However, this is not what the code actually does. Instead, it validates any XML document that declares a namespace, regardless of whether the document references a schema or DTD.
Here is an example that demonstrates the problem:
<example xmlns="http://example.com">
  <text>This is an example</text>
</example>
❯ ./jhove -m XML-hul example.xml
Jhove (Rel. 1.24.1, 2020-03-16)
Date: 2021-06-24 07:41:51 CDT
RepresentationInformation: example.xml
ReportingModule: XML-hul, Rel. 1.5.1 (2019-04-17)
LastModified: 2021-06-24 07:41:36 CDT
Size: 82
Format: XML
Version: 1.0
Status: Well-Formed, but not valid
SignatureMatches:
XML-hul
ErrorMessage: SaxParseException: cvc-elt.1: Cannot find the declaration of element 'example'.: Line = 1, Column = 37
ID: XML-HUL-1
MIMEtype: text/xml
XMLMetadata:
Parser: org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser
Encoding: UTF-8
Schemas:
Schema:
NamespaceURI: http://example.com
SchemaLocation:
Root: example
Namespaces:
Namespace:
Prefix:
URI: http://example.com
The expected behavior is that JHOVE only checks this document for well-formedness and does not validate it, because it references neither a DTD nor an XML Schema.
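To illustrate the check I would expect, here is a minimal sketch in Python (not JHOVE's Java code, and not a proposed patch). It assumes that "explicitly references a DTD or XML Schema" means the document either has a DOCTYPE declaration or carries an `xsi:schemaLocation` / `xsi:noNamespaceSchemaLocation` attribute. The function name `references_dtd_or_schema` is hypothetical.

```python
import xml.parsers.expat as expat

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
# With namespace_separator=" ", expat reports qualified names as "<uri> <local>".
SCHEMA_HINTS = (
    f"{XSI_NS} schemaLocation",
    f"{XSI_NS} noNamespaceSchemaLocation",
)


def references_dtd_or_schema(xml_bytes: bytes) -> bool:
    """Return True if the document declares a DOCTYPE or carries an xsi schema hint.

    Assumption: these two signals are what "explicitly references a DTD or
    XML Schema" means. A namespace declaration alone does not count.
    """
    found = {"dtd": False, "schema": False}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        found["dtd"] = True

    def on_start_element(name, attrs):
        # Schema hints may appear on any element, so every element is checked.
        if any(hint in attrs for hint in SCHEMA_HINTS):
            found["schema"] = True

    parser = expat.ParserCreate(namespace_separator=" ")
    parser.StartDoctypeDeclHandler = on_doctype
    parser.StartElementHandler = on_start_element
    parser.Parse(xml_bytes, True)  # raises expat.ExpatError if not well-formed
    return found["dtd"] or found["schema"]


example = (
    b'<example xmlns="http://example.com">\n'
    b"  <text>This is an example</text>\n"
    b"</example>\n"
)
print(references_dtd_or_schema(example))  # False
```

For the example document above the function returns False, so under this assumption only the well-formedness check would run.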
Hi Peter - are you anticipating that the result should just read "well formed"? Or is there something else, e.g. "well-formed, validity not checked", i.e. not to imply validity was checked?
As far as I can tell, these are the possible outcomes:
Well-Formed
Well-Formed and valid
Well-Formed, but not valid
Not well-formed
I believe that covers all possible outcomes, and I do not think a new status needs to be introduced. "Well-Formed" on its own already implies "well-formed, validity not checked".
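To make the mapping explicit, here is a small Python sketch of how the four statuses could be derived. It is only an illustration, not JHOVE's implementation. The function name `describe_status` and the convention that `valid=None` means "validation was not attempted" are my own assumptions.

```python
from typing import Optional


def describe_status(well_formed: bool, valid: Optional[bool]) -> str:
    """Map a parse result to one of the four status strings listed above.

    Assumption: valid is None when no DTD or schema was referenced, so
    validation was never attempted.
    """
    if not well_formed:
        return "Not well-formed"
    if valid is None:
        return "Well-Formed"
    if valid:
        return "Well-Formed and valid"
    return "Well-Formed, but not valid"
```

Under this reading, the example document from the report would be `describe_status(True, None)`, i.e. "Well-Formed".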